

Rao-Blackwellization

The Rao-Blackwell theorem: conditioning an estimator on a sufficient statistic reduces variance without increasing bias. In MCMC, this means replacing sample averages with conditional expectations for lower-variance estimates at no extra sampling cost.


Why This Matters

You have an unbiased estimator from your MCMC sampler. The Rao-Blackwell theorem says you can often get a strictly better estimator, one with lower variance and the same bias (zero), by conditioning on a sufficient statistic. The improvement is free: no additional samples needed, no tuning parameters, no approximations.

In MCMC, Rao-Blackwellization is one of the most reliable variance reduction techniques. Whenever your sampler explores a joint space (X, Z) but you only care about a function of X, you can analytically integrate out Z to get a lower-variance estimate.

Mental Model

Imagine estimating the mean of a die roll. You could roll and report the result (high variance), or you could use the fact that conditional on knowing whether the roll is odd or even, you know the conditional expectation exactly. Rao-Blackwellization replaces a noisy estimator with its conditional expectation given some information you already have. The law of total variance guarantees this never increases variance.
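The die intuition is easy to check with a short simulation. The script below (an illustrative sketch, using only the standard library) compares the raw roll against its conditional expectation given parity:

```python
import random

random.seed(0)
N = 100_000

# Raw estimator: report the roll itself.
rolls = [random.randint(1, 6) for _ in range(N)]

# Rao-Blackwellized estimator: replace each roll by its conditional
# expectation given parity. E[roll | odd] = (1+3+5)/3 = 3,
# E[roll | even] = (2+4+6)/3 = 4.
rb = [3.0 if r % 2 == 1 else 4.0 for r in rolls]

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Both estimators target E[roll] = 3.5, but the conditional-expectation
# version has variance 0.25 instead of 35/12 ≈ 2.92.
print(mean(rolls), var(rolls))   # ≈ 3.5, ≈ 2.92
print(mean(rb), var(rb))         # ≈ 3.5, ≈ 0.25
```

Both averages estimate the same mean; the conditioned version simply carries far less noise per sample.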

Formal Setup and Notation

Let θ be a parameter, X the data, T(X) a sufficient statistic for θ, and δ(X) an unbiased estimator of g(θ).

Main Theorems

Theorem

Rao-Blackwell Theorem

Statement

Let δ(X) be an unbiased estimator of g(θ) and let T(X) be a sufficient statistic for θ. Define the Rao-Blackwellized estimator:

δ*(X) = E[δ(X) | T(X)]

Then:

  1. δ* is unbiased: E[δ*] = g(θ)
  2. δ* has no greater variance: Var(δ*) ≤ Var(δ)
  3. δ* depends on X only through T(X)

The inequality is strict unless δ is already a function of T.

Intuition

Conditioning on a sufficient statistic averages out the noise in δ that is unrelated to the parameter. The sufficient statistic captures all the information about θ, so conditioning on it preserves the mean while removing extraneous randomness.

Proof Sketch

Unbiasedness: E[δ*] = E[E[δ | T]] = E[δ] = g(θ) by the tower property. Variance reduction: by the law of total variance, Var(δ) = Var(E[δ | T]) + E[Var(δ | T)] = Var(δ*) + E[Var(δ | T)]. Since E[Var(δ | T)] ≥ 0, we get Var(δ) ≥ Var(δ*).
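The decomposition can be verified numerically on a simple case. In the sketch below (the normal model, sample size, and seed are illustrative assumptions), δ = X₁ with X₁, …, Xₙ i.i.d. N(θ, 1), and the sufficient statistic is the sample mean, so E[X₁ | X̄] = X̄ by symmetry:

```python
import random

random.seed(1)
n, reps, theta = 5, 200_000, 2.0

delta = []       # naive estimator: the first observation X1
delta_star = []  # Rao-Blackwellized: E[X1 | X̄] = X̄ (X̄ is sufficient for theta)
for _ in range(reps):
    xs = [random.gauss(theta, 1.0) for _ in range(n)]
    delta.append(xs[0])
    delta_star.append(sum(xs) / n)

def var(vs):
    m = sum(vs) / len(vs)
    return sum((v - m) ** 2 for v in vs) / len(vs)

# Law of total variance: Var(X1) = Var(X̄) + E[Var(X1 | X̄)]
#                          1     =  1/n    + (1 - 1/n)
print(var(delta))       # ≈ 1
print(var(delta_star))  # ≈ 1/n = 0.2
```

The simulated variances match the two terms of the decomposition: conditioning removed the E[Var(δ | T)] = 1 − 1/n component.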

Why It Matters

This theorem provides a systematic recipe for improving estimators. Start with any unbiased estimator, find a sufficient statistic, and condition. Combined with the Lehmann-Scheffé theorem, this path leads to uniformly minimum variance unbiased estimators (UMVUE) when a complete sufficient statistic exists.

Failure Mode

You need to be able to compute the conditional expectation E[δ | T] analytically. If this is intractable, you cannot Rao-Blackwellize. In complex models, the conditional expectation may itself require an approximation, partially defeating the purpose.

Rao-Blackwellization in MCMC

Definition

Rao-Blackwellization for MCMC

Suppose you run an MCMC sampler on a joint space (X, Z) and want to estimate E[h(X)]. The naive estimator averages h(X^(t)) over MCMC samples. The Rao-Blackwellized estimator instead uses:

μ̂_RB = (1/T) Σ_{t=1}^{T} E[h(X) | Z^(t)]

If the conditional expectation E[h(X) | Z] can be computed in closed form, this estimator typically has lower variance than the naive one. For i.i.d. draws the reduction is guaranteed by the law of total variance; with correlated MCMC draws it holds in most practical cases but is not automatic.

In Gibbs sampling, you often sample (X, Z) by alternating between the full conditionals. If you can compute E[h(X) | Z = z] analytically, use it instead of the raw X samples.

Canonical Examples

Example

Rao-Blackwellization in a normal-normal model

Suppose Z ~ N(0, 1) and X | Z ~ N(Z, 1), and you want E[X]. A Gibbs sampler produces pairs (X^(t), Z^(t)). The naive estimate is (1/T) Σ_t X^(t). The Rao-Blackwellized estimate is (1/T) Σ_t E[X | Z^(t)] = (1/T) Σ_t Z^(t). The second estimator has variance from only the Z chain, removing the conditional variance of X | Z.
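A quick simulation of this model shows the gap. For simplicity the sketch below draws exact i.i.d. samples from the joint rather than running a Gibbs chain; a Gibbs sampler targeting the same joint would give correlated versions of the same comparison:

```python
import random

random.seed(2)
T = 100_000

# Exact draws from the joint: Z ~ N(0, 1), then X | Z ~ N(Z, 1).
zs = [random.gauss(0.0, 1.0) for _ in range(T)]
xs = [random.gauss(z, 1.0) for z in zs]

naive = sum(xs) / T   # average of raw X samples
rb = sum(zs) / T      # average of E[X | Z] = Z

def var(vs):
    m = sum(vs) / len(vs)
    return sum((v - m) ** 2 for v in vs) / len(vs)

# Per-sample variances: Var(X) = Var(Z) + 1 = 2, while
# Var(E[X | Z]) = Var(Z) = 1, so Rao-Blackwellization halves
# the variance of the estimator here.
print(naive, rb)          # both ≈ 0
print(var(xs), var(zs))   # ≈ 2 and ≈ 1
```

Both estimators are unbiased for E[X] = 0, but the Rao-Blackwellized average needs half as many samples for the same precision.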

Example

Integrating out discrete latent variables

In a mixture model with latent cluster assignments Z_i, you want E[f(θ) | data]. Instead of using the sampled θ^(t) directly, compute E[f(θ) | Z^(t), data] by integrating over θ given the cluster assignments. If θ has a conjugate posterior given Z, this integral is analytic.
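As a minimal sketch of the conjugate step, consider a Gaussian mixture with known component variance sigma2 and a N(0, tau2) prior on each component mean; the model, prior values, data, and function names below are illustrative assumptions, not a specific library API. Given one sampled assignment vector z, the Rao-Blackwellized contribution is the analytic posterior mean rather than a sampled μ_k:

```python
def posterior_mean(x_assigned, sigma2=1.0, tau2=10.0):
    """E[mu_k | Z, data] under the conjugate normal model: with n points
    assigned to component k, the posterior of mu_k is normal with
    precision 1/tau2 + n/sigma2 and mean (sum of x / sigma2) / precision."""
    n = len(x_assigned)
    precision = 1.0 / tau2 + n / sigma2
    return (sum(x_assigned) / sigma2) / precision

# Inside a Gibbs sweep: after sampling assignments z, accumulate the
# analytic conditional mean instead of a sampled mu_k.
data = [-2.1, -1.9, 2.0, 2.2]
z = [0, 0, 1, 1]   # one sampled assignment vector
rb_mu0 = posterior_mean([x for x, k in zip(data, z) if k == 0])
rb_mu1 = posterior_mean([x for x, k in zip(data, z) if k == 1])
print(rb_mu0, rb_mu1)   # component means, shrunk slightly toward the prior mean 0
```

Averaging these closed-form conditional means over Gibbs sweeps gives the Rao-Blackwellized estimate of E[μ_k | data].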

Common Confusions

Watch Out

Rao-Blackwellization does not require a sufficient statistic in MCMC

The classical theorem uses sufficient statistics. In MCMC, the term is used more loosely: any analytical integration over part of the state space that reduces variance is called Rao-Blackwellization. The variance reduction follows from the law of total variance regardless of sufficiency.

Watch Out

Rao-Blackwellization is not always worth the effort

If the conditional expectation E[h(X) | Z] is expensive to compute (e.g., requires summing over exponentially many states), the computational cost may outweigh the variance reduction. The technique is most valuable when the conditional expectation has a closed form.

Summary

  • Conditioning on more information never increases variance (law of total variance)
  • The Rao-Blackwell theorem turns this into a recipe: condition on sufficient statistics
  • In MCMC: analytically integrate out part of the state for free variance reduction
  • Combined with Lehmann-Scheffé, Rao-Blackwell leads to UMVUE
  • Only works when the conditional expectation is tractable

Exercises

ExerciseCore

Problem

Let X_1, …, X_n ~ Bernoulli(p) be i.i.d. The estimator δ = X_1 is unbiased for p. Rao-Blackwellize it using the sufficient statistic T = Σ_{i=1}^n X_i and show the result is X̄.

ExerciseAdvanced

Problem

In a Gibbs sampler for a hierarchical model with latent variables Z and parameters θ, you want to estimate E[θ | data]. Explain when Rao-Blackwellizing over Z (using E[θ | Z^(t), data]) would give large variance reduction and when it would give negligible improvement.

References

Canonical:

  • Lehmann & Casella, Theory of Point Estimation (1998), Chapter 1.8
  • Robert & Casella, Monte Carlo Statistical Methods (2004), Chapter 4

Current:

  • Robert & Casella, Introducing Monte Carlo Methods with R (2010), Chapter 5
  • Gelman et al., Bayesian Data Analysis (2013), Chapters 10-12
  • Brooks et al., Handbook of MCMC (2011), Chapters 1-5

Last reviewed: April 2026
