

Rao-Blackwellization

The Rao-Blackwell theorem: conditioning an estimator on a sufficient statistic reduces variance without increasing bias. In MCMC, this means replacing sample averages with conditional expectations for lower-variance estimates at no extra sampling cost.


Why This Matters

You have an unbiased estimator from your MCMC sampler. The Rao-Blackwell theorem says you can often get a strictly better estimator, one with lower variance and the same bias (zero), by conditioning on a sufficient statistic. The improvement is free: no additional samples needed, no tuning parameters, no approximations.

In MCMC, Rao-Blackwellization is one of the most reliable variance reduction techniques. Whenever your sampler explores a joint space (X, Z) but you only care about a function of X, you can analytically integrate out Z to get a lower-variance estimate.

Mental Model

Imagine estimating the mean of a die roll. You could roll and report the result (high variance), or you could use the fact that conditional on knowing whether the roll is odd or even, you know the conditional expectation exactly. Rao-Blackwellization replaces a noisy estimator with its conditional expectation given some information you already have. The law of total variance guarantees this never increases variance.
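The die intuition is easy to check with a short simulation. The script below (an illustrative sketch, using only the standard library) compares the raw roll against its conditional expectation given parity:

```python
import random

random.seed(0)
N = 100_000

# Raw estimator: report the roll itself.
rolls = [random.randint(1, 6) for _ in range(N)]

# Rao-Blackwellized estimator: replace each roll by its conditional
# expectation given parity. E[roll | odd] = (1+3+5)/3 = 3,
# E[roll | even] = (2+4+6)/3 = 4.
rb = [3.0 if r % 2 == 1 else 4.0 for r in rolls]

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Both estimators target E[roll] = 3.5, but the conditional-expectation
# version has variance 0.25 instead of 35/12 ≈ 2.92.
print(mean(rolls), var(rolls))   # ≈ 3.5, ≈ 2.92
print(mean(rb), var(rb))         # ≈ 3.5, ≈ 0.25
```

Both averages estimate the same mean; the conditioned version simply carries far less noise per sample.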

Formal Setup and Notation

Let θ be a parameter, X the data, T(X) a sufficient statistic for θ, and δ(X) an unbiased estimator of g(θ).

Main Theorems

Theorem

Rao-Blackwell Theorem

Statement

Let δ(X) be an unbiased estimator of g(θ) and let T(X) be a sufficient statistic for θ. Define the Rao-Blackwellized estimator:

δ*(X) = E[δ(X) | T(X)]

Then:

  1. δ* is unbiased: E[δ*] = g(θ)
  2. δ* has no greater variance: Var(δ*) ≤ Var(δ)
  3. δ* depends on X only through T(X)

The inequality is strict unless δ is already a function of T.

Intuition

Conditioning on a sufficient statistic averages out the noise in δ that is unrelated to the parameter. The sufficient statistic captures all the information about θ, so conditioning on it preserves the mean while removing extraneous randomness.

Proof Sketch

Unbiasedness: E[δ*] = E[E[δ | T]] = E[δ] = g(θ) by the tower property. Variance reduction: by the law of total variance, Var(δ) = Var(E[δ | T]) + E[Var(δ | T)] = Var(δ*) + E[Var(δ | T)]. Since E[Var(δ | T)] ≥ 0, we get Var(δ) ≥ Var(δ*).
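The decomposition can be verified numerically on a simple case. In the sketch below (the normal model, sample size, and seed are illustrative assumptions), δ = X₁ with X₁, …, Xₙ i.i.d. N(θ, 1), and the sufficient statistic is the sample mean, so E[X₁ | X̄] = X̄ by symmetry:

```python
import random

random.seed(1)
n, reps, theta = 5, 200_000, 2.0

delta = []       # naive estimator: the first observation X1
delta_star = []  # Rao-Blackwellized: E[X1 | X̄] = X̄ (X̄ is sufficient for theta)
for _ in range(reps):
    xs = [random.gauss(theta, 1.0) for _ in range(n)]
    delta.append(xs[0])
    delta_star.append(sum(xs) / n)

def var(vs):
    m = sum(vs) / len(vs)
    return sum((v - m) ** 2 for v in vs) / len(vs)

# Law of total variance: Var(X1) = Var(X̄) + E[Var(X1 | X̄)]
#                          1     =  1/n    + (1 - 1/n)
print(var(delta))       # ≈ 1
print(var(delta_star))  # ≈ 1/n = 0.2
```

The simulated variances match the two terms of the decomposition: conditioning removed the E[Var(δ | T)] = 1 − 1/n component.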

Why It Matters

This theorem provides a systematic recipe for improving estimators. Start with any unbiased estimator, find a sufficient statistic, and condition. Combined with the Lehmann-Scheffé theorem, this path leads to uniformly minimum variance unbiased estimators (UMVUE) when a complete sufficient statistic exists.

Failure Mode

You need to be able to compute the conditional expectation E[δ | T] analytically. If this is intractable, you cannot Rao-Blackwellize. In complex models, the conditional expectation may itself require an approximation, partially defeating the purpose.

Rao-Blackwellization in MCMC

Definition

Rao-Blackwellization for MCMC

Suppose you run an MCMC sampler on a joint space (X, Z) and want to estimate E[h(X)]. The naive estimator averages h(X^(t)) over MCMC samples. The Rao-Blackwellized estimator instead uses:

μ̂_RB = (1/T) Σ_{t=1}^{T} E[h(X) | Z^(t)]

If the conditional expectation E[h(X) | Z] can be computed in closed form, this estimator typically has lower variance than the naive one. For i.i.d. draws the reduction is guaranteed by the law of total variance; with correlated MCMC draws it holds in most practical cases but is not automatic.

In Gibbs sampling, you often sample (X, Z) by alternating between the full conditionals. If you can compute E[h(X) | Z = z] analytically, use it instead of the raw X samples.

Canonical Examples

Example

Rao-Blackwellization in a normal-normal model

Suppose Z ~ N(0, 1) and X | Z ~ N(Z, 1), and you want E[X]. A Gibbs sampler produces pairs (X^(t), Z^(t)). The naive estimate is (1/T) Σ_t X^(t). The Rao-Blackwellized estimate is (1/T) Σ_t E[X | Z^(t)] = (1/T) Σ_t Z^(t). The second estimator has variance from only the Z chain, removing the conditional variance of X | Z.
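A quick simulation of this model shows the gap. For simplicity the sketch below draws exact i.i.d. samples from the joint rather than running a Gibbs chain; a Gibbs sampler targeting the same joint would give correlated versions of the same comparison:

```python
import random

random.seed(2)
T = 100_000

# Exact draws from the joint: Z ~ N(0, 1), then X | Z ~ N(Z, 1).
zs = [random.gauss(0.0, 1.0) for _ in range(T)]
xs = [random.gauss(z, 1.0) for z in zs]

naive = sum(xs) / T   # average of raw X samples
rb = sum(zs) / T      # average of E[X | Z] = Z

def var(vs):
    m = sum(vs) / len(vs)
    return sum((v - m) ** 2 for v in vs) / len(vs)

# Per-sample variances: Var(X) = Var(Z) + 1 = 2, while
# Var(E[X | Z]) = Var(Z) = 1, so Rao-Blackwellization halves
# the variance of the estimator here.
print(naive, rb)          # both ≈ 0
print(var(xs), var(zs))   # ≈ 2 and ≈ 1
```

Both estimators are unbiased for E[X] = 0, but the Rao-Blackwellized average needs half as many samples for the same precision.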

Example

Integrating out discrete latent variables

In a mixture model with latent cluster assignments Z_i, you want E[f(θ) | data]. Instead of using the sampled θ^(t) directly, compute E[f(θ) | Z^(t), data] by integrating over θ given the cluster assignments. If θ has a conjugate posterior given Z, this integral is analytic.
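As a minimal sketch of the conjugate step, consider a Gaussian mixture with known component variance sigma2 and a N(0, tau2) prior on each component mean; the model, prior values, data, and function names below are illustrative assumptions, not a specific library API. Given one sampled assignment vector z, the Rao-Blackwellized contribution is the analytic posterior mean rather than a sampled μ_k:

```python
def posterior_mean(x_assigned, sigma2=1.0, tau2=10.0):
    """E[mu_k | Z, data] under the conjugate normal model: with n points
    assigned to component k, the posterior of mu_k is normal with
    precision 1/tau2 + n/sigma2 and mean (sum of x / sigma2) / precision."""
    n = len(x_assigned)
    precision = 1.0 / tau2 + n / sigma2
    return (sum(x_assigned) / sigma2) / precision

# Inside a Gibbs sweep: after sampling assignments z, accumulate the
# analytic conditional mean instead of a sampled mu_k.
data = [-2.1, -1.9, 2.0, 2.2]
z = [0, 0, 1, 1]   # one sampled assignment vector
rb_mu0 = posterior_mean([x for x, k in zip(data, z) if k == 0])
rb_mu1 = posterior_mean([x for x, k in zip(data, z) if k == 1])
print(rb_mu0, rb_mu1)   # component means, shrunk slightly toward the prior mean 0
```

Averaging these closed-form conditional means over Gibbs sweeps gives the Rao-Blackwellized estimate of E[μ_k | data].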

Common Confusions

Watch Out

Rao-Blackwellization does not require a sufficient statistic in MCMC

The classical theorem uses sufficient statistics. In MCMC, the term is used more loosely: any analytical integration over part of the state space that reduces variance is called Rao-Blackwellization. The variance reduction follows from the law of total variance regardless of sufficiency.

Watch Out

Rao-Blackwellization is not always worth the effort

If the conditional expectation E[h(X) | Z] is expensive to compute (e.g., requires summing over exponentially many states), the computational cost may outweigh the variance reduction. The technique is most valuable when the conditional expectation has a closed form.

Summary

  • Conditioning on more information never increases variance (law of total variance)
  • The Rao-Blackwell theorem turns this into a recipe: condition on sufficient statistics
  • In MCMC: analytically integrate out part of the state for free variance reduction
  • Combined with Lehmann-Scheffé, Rao-Blackwell leads to UMVUE
  • Only works when the conditional expectation is tractable

Exercises

ExerciseCore

Problem

Let X_1, …, X_n ~ Bernoulli(p) be i.i.d. The estimator δ = X_1 is unbiased for p. Rao-Blackwellize it using the sufficient statistic T = Σ_{i=1}^n X_i and show the result is X̄.

ExerciseAdvanced

Problem

In a Gibbs sampler for a hierarchical model with latent variables Z and parameters θ, you want to estimate E[θ | data]. Explain when Rao-Blackwellizing over Z (using E[θ | Z^(t), data]) would give large variance reduction and when it would give negligible improvement.

References

Canonical:

  • Lehmann & Casella, Theory of Point Estimation (1998), Chapter 1.8
  • Robert & Casella, Monte Carlo Statistical Methods (2004), Chapter 4

Current:

  • Robert & Casella, Introducing Monte Carlo Methods with R (2010), Chapter 5
  • Gelman et al., Bayesian Data Analysis (2013), Chapters 10-12
  • Brooks et al., Handbook of MCMC (2011), Chapters 1-5

Last reviewed: April 2026
