Rao-Blackwellization
The Rao-Blackwell theorem: conditioning an estimator on a sufficient statistic reduces variance without increasing bias. In MCMC, this means replacing sample averages with conditional expectations for lower-variance estimates at no extra sampling cost.
Why This Matters
You have an unbiased estimator from your MCMC sampler. The Rao-Blackwell theorem says you can often get a strictly better estimator, one with lower variance and the same bias (zero), by conditioning on a sufficient statistic. The improvement is free: no additional samples needed, no tuning parameters, no approximations.
In MCMC, Rao-Blackwellization is one of the most reliable variance reduction techniques. Whenever your sampler explores a joint space (x, z) but you only care about a function of x, you can analytically integrate out z to get a lower-variance estimate.
Mental Model
Imagine estimating the mean of a die roll. You could roll and report the result (high variance), or you could use the fact that conditional on knowing whether the roll is odd or even, you know the conditional expectation exactly. Rao-Blackwellization replaces a noisy estimator with its conditional expectation given some information you already have. The law of total variance guarantees this always reduces variance.
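The die-roll picture can be checked with a quick simulation (illustrative, not from the text): replace each raw roll by its conditional expectation given parity, and compare the variances of the two estimators.

```python
import random

random.seed(0)
N = 100_000

# Naive estimator: the raw die rolls themselves.
rolls = [random.randint(1, 6) for _ in range(N)]
naive = [float(r) for r in rolls]

# Rao-Blackwellized estimator: condition on parity.
# E[roll | odd] = mean(1, 3, 5) = 3, E[roll | even] = mean(2, 4, 6) = 4.
rb = [3.0 if r % 2 == 1 else 4.0 for r in rolls]

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

print(mean(naive), var(naive))  # mean ~ 3.5, variance ~ 35/12 ~ 2.92
print(mean(rb), var(rb))        # mean ~ 3.5, variance ~ 0.25
```

Both estimators are unbiased for 3.5, but the conditioned version has far lower variance, exactly as the law of total variance predicts.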
Formal Setup and Notation
Let θ be the parameter of interest, X the observed data, T = T(X) a sufficient statistic for θ, and δ(X) an unbiased estimator of θ.
Main Theorems
Rao-Blackwell Theorem
Statement
Let δ(X) be an unbiased estimator of θ and let T = T(X) be a sufficient statistic for θ. Define the Rao-Blackwellized estimator:
δ*(T) = E[δ(X) | T]
Then:
- δ* is unbiased: E[δ*] = E[δ(X)] = θ
- δ* has no greater variance: Var(δ*) ≤ Var(δ)
- δ* depends on X only through T(X)
The inequality is strict unless δ is already a function of T.
Intuition
Conditioning on a sufficient statistic averages out the noise in δ that is unrelated to the parameter. The sufficient statistic captures all the information in the data about θ, so conditioning on it preserves the mean while removing extraneous randomness. Sufficiency also guarantees that E[δ | T] does not depend on θ, so δ* is a valid estimator.
Proof Sketch
Unbiasedness: E[δ*] = E[E[δ | T]] = E[δ] = θ by the tower property. Variance reduction: by the law of total variance, Var(δ) = E[Var(δ | T)] + Var(E[δ | T]) = E[Var(δ | T)] + Var(δ*). Since E[Var(δ | T)] ≥ 0, we get Var(δ*) ≤ Var(δ).
Why It Matters
This theorem provides a systematic recipe for improving estimators. Start with any unbiased estimator, find a sufficient statistic, and condition. Combined with the Lehmann-Scheffé theorem, this path leads to the uniformly minimum variance unbiased estimator (UMVUE) when a complete sufficient statistic exists.
Failure Mode
You need to be able to compute the conditional expectation analytically. If this is intractable, you cannot Rao-Blackwellize. In complex models, the conditional expectation may itself require an approximation, partially defeating the purpose.
Rao-Blackwellization in MCMC
Rao-Blackwellization for MCMC
Suppose you run an MCMC sampler on a joint space (x, z) and want to estimate E[h(x)]. The naive estimator averages h(x^(i)) over MCMC samples. The Rao-Blackwellized estimator instead uses:
(1/N) Σ_i E[h(x) | z^(i)]
If the conditional expectation can be computed in closed form, this estimator has lower variance than the naive one.
In Gibbs sampling, you often sample (x, z) by alternating between the full conditionals. If you can compute E[h(x) | z] analytically, use it instead of the raw x samples.
Canonical Examples
Rao-Blackwellization in a normal-normal model
Suppose θ | μ ~ N(μ, σ²) and μ ~ N(μ₀, τ²), and you want E[θ]. A Gibbs sampler produces pairs (θ^(i), μ^(i)). The naive estimate is (1/N) Σ_i θ^(i). The Rao-Blackwellized estimate is (1/N) Σ_i E[θ | μ^(i)] = (1/N) Σ_i μ^(i). The second estimator has variance from only the μ chain, removing the conditional variance of θ given μ.
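A minimal Gibbs sketch of this idea, assuming the normal-normal model θ | μ ~ N(μ, σ²), μ ~ N(μ₀, τ²); the hyperparameter values below are illustrative choices, not from the text:

```python
import random

random.seed(1)

# Illustrative hyperparameters: mu ~ N(mu0, tau2), theta | mu ~ N(mu, sigma2).
mu0, tau2, sigma2 = 2.0, 1.0, 4.0
N, burn = 20_000, 1_000

theta, mu = 0.0, mu0
thetas, mus = [], []
for i in range(N + burn):
    # Full conditional: theta | mu ~ N(mu, sigma2)
    theta = random.gauss(mu, sigma2 ** 0.5)
    # Full conditional: mu | theta is normal by conjugacy
    post_var = 1.0 / (1.0 / tau2 + 1.0 / sigma2)
    post_mean = post_var * (mu0 / tau2 + theta / sigma2)
    mu = random.gauss(post_mean, post_var ** 0.5)
    if i >= burn:
        thetas.append(theta)
        mus.append(mu)

naive = sum(thetas) / len(thetas)   # average of the raw theta draws
rao_black = sum(mus) / len(mus)     # average of E[theta | mu^(i)] = mu^(i)
# Both estimate E[theta] = mu0 = 2; the second has lower Monte Carlo variance.
```

The Rao-Blackwellized average inherits only the variance of the μ chain (roughly τ²) instead of the full marginal variance of θ (roughly τ² + σ²).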
Integrating out discrete latent variables
In a mixture model with latent cluster assignments z, you want E[θ | y]. Instead of using the sampled θ directly, compute E[θ | z, y] by integrating over θ given the cluster assignments. If θ has a conjugate posterior given z, this integral is analytic.
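A sketch of the conjugate computation, assuming a two-component normal mixture with known component variance and a N(m0, s0_2) prior on each component mean; all names and numbers here are illustrative assumptions, not from the text:

```python
import random

random.seed(2)

# Illustrative setup: known component variance sigma2, conjugate N(m0, s0_2)
# prior on each component mean theta_k.
sigma2, m0, s0_2 = 1.0, 0.0, 10.0
y = [random.gauss(-2, 1) for _ in range(50)] + \
    [random.gauss(3, 1) for _ in range(50)]

def posterior_mean(points):
    """E[theta_k | z, y]: conjugate normal posterior mean for one cluster."""
    n = len(points)
    precision = n / sigma2 + 1.0 / s0_2
    return (sum(points) / sigma2 + m0 / s0_2) / precision

# Given one draw of cluster assignments z (here the true split, standing in
# for a sampled assignment), Rao-Blackwellize theta analytically:
z = [0] * 50 + [1] * 50
clusters = {k: [yi for yi, zi in zip(y, z) if zi == k] for k in (0, 1)}
rb_theta = [posterior_mean(clusters[k]) for k in (0, 1)]
# In a full Gibbs run, average rb_theta over draws of z instead of
# averaging the sampled theta values.
```

The point is that once z is fixed, the integral over θ is a closed-form posterior mean, so no Monte Carlo noise from the θ draws enters the estimate.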
Common Confusions
Rao-Blackwellization does not require a sufficient statistic in MCMC
The classical theorem uses sufficient statistics. In MCMC, the term is used more loosely: any analytical integration over part of the state space that reduces variance is called Rao-Blackwellization. The variance reduction follows from the law of total variance regardless of sufficiency.
Rao-Blackwellization is not always worth the effort
If the conditional expectation is expensive to compute (e.g., requires summing over exponentially many states), the computational cost may outweigh the variance reduction. The technique is most valuable when the conditional expectation has a closed form.
Summary
- Conditioning on more information never increases variance (law of total variance)
- The Rao-Blackwell theorem turns this into a recipe: condition on sufficient statistics
- In MCMC: analytically integrate out part of the state for free variance reduction
- Combined with Lehmann-Scheffé, Rao-Blackwell leads to the UMVUE
- Only works when the conditional expectation is tractable
Exercises
Problem
Let X₁, …, Xₙ be iid Bernoulli(p). The estimator δ = X₁ is unbiased for p. Rao-Blackwellize it using the sufficient statistic T = Σᵢ Xᵢ and show the result is X̄ = T/n.
Problem
In a Gibbs sampler for a hierarchical model with latent variables z and parameters θ, you want to estimate E[θ]. Explain when Rao-Blackwellizing over z (using E[θ | z] in place of the sampled θ) would give a large variance reduction and when it would give negligible improvement.
References
Canonical:
- Lehmann & Casella, Theory of Point Estimation (1998), Chapter 1.8
- Casella & Robert, Monte Carlo Statistical Methods (2004), Chapter 4
Current:
- Robert & Casella, Introducing Monte Carlo Methods with R (2010), Chapter 5
- Gelman et al., Bayesian Data Analysis (2013), Chapters 10-12
- Brooks et al., Handbook of MCMC (2011), Chapters 1-5
Next Topics
Natural extensions from Rao-Blackwellization:
- Variance reduction techniques: control variates, antithetic variables
- Sufficient statistics and exponential families: the foundation for analytical conditional expectations
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Sufficient Statistics and Exponential Families (Layer 0B)
- Maximum Likelihood Estimation (Layer 0B)
- Common Probability Distributions (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Differentiation in Rⁿ (Layer 0A)
- Importance Sampling (Layer 2)