Variance Reduction Techniques
Get the same accuracy with fewer samples by exploiting correlation, known quantities, and stratification. Antithetic variates, control variates, stratification, and Rao-Blackwellization.
Why This Matters
Monte Carlo estimation has a fundamental limitation: the standard error of the mean decreases as $O(1/\sqrt{n})$. To halve the error, you need four times as many samples. Variance reduction techniques break this bottleneck by using structure in the problem to get more information per sample. In Bayesian inference, reinforcement learning, and simulation, these techniques can reduce computation by orders of magnitude.
Mental Model
Imagine estimating the average height of people in a city by random sampling. Naive: pick people at random. Smarter: sample equal numbers from each neighborhood (stratification). Even smarter: if you know the average income of each neighborhood and income correlates with height, use that information to adjust your estimate (control variates). The idea is always the same: use what you already know to reduce uncertainty in what you do not.
Formal Setup and Notation
We want to estimate $\mu = \mathbb{E}_p[f(X)]$ where $X \sim p$. The naive Monte Carlo estimator is:

$$\hat{\mu}_n = \frac{1}{n} \sum_{i=1}^{n} f(X_i)$$

with variance $\mathrm{Var}(\hat{\mu}_n) = \sigma^2 / n$ where $\sigma^2 = \mathrm{Var}(f(X))$. Variance reduction constructs an alternative estimator $\tilde{\mu}_n$ with $\mathrm{Var}(\tilde{\mu}_n) < \sigma^2 / n$ while keeping $\mathbb{E}[\tilde{\mu}_n] = \mu$.
Antithetic Variates
Generate pairs $(X_i, X_i')$ that are negatively correlated but each marginally distributed as $p$. The estimator:

$$\hat{\mu}_{\mathrm{AV}} = \frac{1}{2n} \sum_{i=1}^{n} \left[ f(X_i) + f(X_i') \right]$$

has variance $\frac{\sigma^2}{2n}\left(1 + \mathrm{Corr}(f(X_i), f(X_i'))\right)$. When $\mathrm{Corr}(f(X_i), f(X_i')) < 0$, this variance is less than $\frac{\sigma^2}{2n}$, the variance of $2n$ independent samples.
For example, if $X \sim \mathrm{Unif}(0, 1)$, set $X' = 1 - X$. Then $X'$ is also $\mathrm{Unif}(0, 1)$ but negatively correlated with $X$. If $f$ is monotone, $f(X)$ and $f(1 - X)$ are negatively correlated, so the variance drops.
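The uniform example above is easy to check numerically. A minimal sketch (the monotone choice $f(u) = u^2$ and the sample sizes are illustrative assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(u):
    # Any monotone function works; u**2 is a simple increasing choice on [0, 1].
    return u ** 2

n = 100_000
u = rng.uniform(size=n)

# Naive comparison: 2n independent draws, matching the antithetic budget.
u_extra = rng.uniform(size=n)
naive = np.concatenate([f(u), f(u_extra)])

# Antithetic: pair each U with 1 - U (same marginal, negative correlation).
pairs = 0.5 * (f(u) + f(1.0 - u))

print("true mean      :", 1 / 3)
print("naive estimate :", naive.mean())
print("antithetic est.:", pairs.mean())
# Per-pair variance vs per-sample variance; below 0.5 means antithetic
# beats 2n independent samples at the same number of f-evaluations.
print("variance ratio :", pairs.var() / naive.var())
```

The variance ratio, not the point estimate, is the quantity to watch: antithetic pairing wins exactly when each pair is less variable than two independent draws.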
Control Variates
Let $g$ be a function whose expectation $\theta = \mathbb{E}[g(X)]$ is known. The control variate estimator is:

$$\hat{\mu}_{\mathrm{CV}} = \frac{1}{n} \sum_{i=1}^{n} \left[ f(X_i) - c \left( g(X_i) - \theta \right) \right]$$

for some coefficient $c$. This is unbiased for any $c$ because $\mathbb{E}[g(X) - \theta] = 0$. Its variance is:

$$\mathrm{Var}(\hat{\mu}_{\mathrm{CV}}) = \frac{1}{n} \left[ \mathrm{Var}(f(X)) - 2c \, \mathrm{Cov}(f(X), g(X)) + c^2 \, \mathrm{Var}(g(X)) \right]$$
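A minimal sketch of the estimator with a fixed coefficient (the target $f(u) = e^u$, control $g(u) = u$, and the value of $c$ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

n = 100_000
u = rng.uniform(size=n)

f = np.exp(u)   # target: E[e^U] = e - 1 ≈ 1.718
g = u           # control: E[U] = 1/2 is known exactly
theta = 0.5
c = 1.7         # any fixed c keeps the estimator unbiased

# Control variate estimator: subtract the centered control.
cv = f - c * (g - theta)

print("naive :", f.mean(), "+/-", f.std() / np.sqrt(n))
print("cv    :", cv.mean(), "+/-", cv.std() / np.sqrt(n))
```

Because $e^u$ is nearly linear on $[0, 1]$, it correlates strongly with $u$, and the standard error shrinks substantially even with a hand-picked $c$.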
Stratified Sampling
Partition the sample space into disjoint strata $S_1, \dots, S_K$ with $w_k = P(X \in S_k)$. Sample $n_k$ points from each stratum (the conditional distribution $p(x \mid x \in S_k)$) and combine:

$$\hat{\mu}_{\mathrm{SS}} = \sum_{k=1}^{K} w_k \cdot \frac{1}{n_k} \sum_{i=1}^{n_k} f(X_{k,i})$$

With proportional allocation $n_k = n w_k$, this is always at least as good as naive Monte Carlo, and strictly better whenever the stratum means differ.
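A minimal sketch on $[0, 1]$ split into equal-width strata with proportional allocation (the integrand $e^u$ and the number of strata are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

def f(u):
    return np.exp(u)  # true mean over [0, 1] is e - 1

n, K = 100_000, 10
w = 1.0 / K                 # equal-probability strata on [0, 1]
per = n // K                # proportional allocation: n * w_k per stratum

# Sample from the conditional (uniform) distribution within each stratum
# and weight each stratum mean by its probability.
strata_means = []
for k in range(K):
    u_k = rng.uniform(k / K, (k + 1) / K, size=per)
    strata_means.append(f(u_k).mean())
stratified = w * np.sum(strata_means)

naive = f(rng.uniform(size=n)).mean()
print("true      :", np.e - 1)
print("naive     :", naive)
print("stratified:", stratified)
```

Within each narrow stratum $f$ is nearly constant, so the within-stratum variance, the only variance left after stratification, is tiny.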
Rao-Blackwellization
If $X = (Y, Z)$ and you can compute $\mathbb{E}[f(X) \mid Y]$ analytically, then replace $f(X_i)$ with $\mathbb{E}[f(X) \mid Y_i]$:

$$\hat{\mu}_{\mathrm{RB}} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}[f(X) \mid Y_i]$$

By the law of total variance, $\mathrm{Var}(\mathbb{E}[f(X) \mid Y]) \le \mathrm{Var}(f(X))$. Conditioning out part of the randomness never increases the variance.
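A minimal sketch with a toy model where the conditional expectation is available in closed form (the model $Y \sim \mathrm{Unif}(0,1)$, $Z \mid Y \sim \mathcal{N}(Y, 1)$ is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(3)

n = 100_000
y = rng.uniform(size=n)            # Y ~ Unif(0, 1)
z = rng.normal(loc=y, scale=1.0)   # Z | Y ~ N(Y, 1), so E[Z | Y] = Y

# Both estimate E[Z] = E[Y] = 1/2, but conditioning on Y removes the
# entire Var(Z | Y) = 1 term from the law of total variance.
crude = z.mean()            # per-sample variance: Var(Y) + 1 = 1/12 + 1
rao_blackwell = y.mean()    # per-sample variance: Var(Y) = 1/12

print("crude         :", crude)
print("Rao-Blackwell :", rao_blackwell)
print("sample variances (crude, RB):", z.var(), y.var())
```

Here the per-sample variance drops from about $1.083$ to about $0.083$, a factor of roughly thirteen, by averaging out $Z$ analytically instead of by simulation.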
Main Theorems
Optimal Control Variate Coefficient
Statement
The variance of the control variate estimator is minimized by:

$$c^* = \frac{\mathrm{Cov}(f(X), g(X))}{\mathrm{Var}(g(X))}$$

The minimum variance is:

$$\mathrm{Var}(\hat{\mu}_{\mathrm{CV}}) = \frac{\sigma^2}{n} \left( 1 - \rho^2 \right)$$

where $\rho$ is the correlation between $f(X)$ and $g(X)$.
Intuition
The optimal $c^*$ is the regression coefficient of $f(X)$ on $g(X)$. The variance reduction factor is $1 - \rho^2$: if $f$ and $g$ are highly correlated ($\rho^2 \approx 1$), almost all variance is eliminated. The control variate is like subtracting the "explained" part of $f$.
Proof Sketch
$\mathrm{Var}(f - c g) = \mathrm{Var}(f) - 2c \, \mathrm{Cov}(f, g) + c^2 \, \mathrm{Var}(g)$. This is a quadratic in $c$ with minimum at $c^* = \mathrm{Cov}(f, g) / \mathrm{Var}(g)$. Substitute back to get $\sigma^2 (1 - \rho^2)$.
Why It Matters
This tells you exactly how powerful a control variate will be: it depends entirely on the correlation $\rho$. With $\rho = 0.9$, you reduce variance to $1 - 0.81 = 0.19$ of its original value, equivalent to using about five times more samples. In practice, you estimate $c^*$ from data, which adds small overhead.
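The plug-in estimate of $c^*$ is just a sample regression slope. A minimal sketch (the choices $f(u) = e^u$ and $g(u) = u$ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

n = 100_000
u = rng.uniform(size=n)
f = np.exp(u)   # target with E[e^U] = e - 1
g = u           # control with known mean 1/2

# Plug-in optimal coefficient: sample Cov(f, g) / Var(g),
# i.e. the regression slope of f on g.
c_hat = np.cov(f, g)[0, 1] / np.var(g)
cv = f - c_hat * (g - 0.5)

rho = np.corrcoef(f, g)[0, 1]
print("c* estimate           :", c_hat)
print("predicted 1 - rho^2   :", 1 - rho ** 2)
print("realized variance ratio:", cv.var() / f.var())
```

The realized variance ratio matches the predicted $1 - \rho^2$, which for this nearly linear integrand is under 2%, an effective 50x sample-size boost.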
Failure Mode
If $\rho \approx 0$, the control variate does nothing. Also, estimating $c^*$ from the same samples introduces a small bias in finite samples (the product of two estimated quantities). With large $n$, this bias is negligible.
Rao-Blackwell Theorem (Variance Reduction)
Statement
Let $h(Y) = \mathbb{E}[f(X) \mid Y]$. Then:

$$\mathrm{Var}(h(Y)) \le \mathrm{Var}(f(X))$$

with equality if and only if $f(X) = h(Y)$ almost surely (i.e., $f(X)$ does not actually depend on $Z$ once $Y$ is known).
Intuition
By conditioning on $Y$, you analytically average out the randomness in $Z$. This removes the component of variance due to $Z$, leaving only the variance due to $Y$. It is like having infinitely many samples of $Z$ for each value of $Y$.
Proof Sketch
By the law of total variance: $\mathrm{Var}(f(X)) = \mathbb{E}[\mathrm{Var}(f(X) \mid Y)] + \mathrm{Var}(\mathbb{E}[f(X) \mid Y])$. Since $\mathbb{E}[\mathrm{Var}(f(X) \mid Y)] \ge 0$, we get $\mathrm{Var}(h(Y)) \le \mathrm{Var}(f(X))$.
Why It Matters
Rao-Blackwellization is the most principled variance reduction technique: it is guaranteed to help and never hurts. In Gibbs sampling, if you can compute conditional expectations for some variables analytically, always do so. It is free variance reduction.
Failure Mode
The technique requires being able to compute $\mathbb{E}[f(X) \mid Y]$ analytically, which is often intractable. It also requires choosing the partition $X = (Y, Z)$ wisely. If $Z$ contributes little variance, the reduction is small.
Canonical Examples
Control variate for option pricing
Estimating $\mathbb{E}[e^{-rT} \max(S_T - K, 0)]$ for a call option. Use $S_T$ (the terminal stock price) as a control variate. Under risk-neutral pricing, $\mathbb{E}[S_T] = S_0 e^{rT}$ is known. Since the payoff is highly correlated with $S_T$, this dramatically reduces variance.
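A minimal sketch under geometric Brownian motion (the Black-Scholes parameter values are illustrative assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative parameters: spot, strike, rate, volatility, maturity.
s0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
n = 100_000

# Terminal price under risk-neutral GBM: S_T = S0 * exp((r - s^2/2)T + s*sqrt(T)*Z).
z = rng.standard_normal(n)
s_T = s0 * np.exp((r - 0.5 * sigma ** 2) * T + sigma * np.sqrt(T) * z)
payoff = np.exp(-r * T) * np.maximum(s_T - K, 0.0)

# Control variate: S_T has known risk-neutral mean S0 * e^{rT}.
mean_sT = s0 * np.exp(r * T)
c_hat = np.cov(payoff, s_T)[0, 1] / s_T.var()
cv = payoff - c_hat * (s_T - mean_sT)

print("naive price:", payoff.mean(), "+/-", payoff.std() / np.sqrt(n))
print("cv price   :", cv.mean(), "+/-", cv.std() / np.sqrt(n))
```

Both estimators target the same price (about 10.45 for these parameters by the Black-Scholes formula); the control variate simply arrives there with a visibly smaller standard error.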
Antithetic variates for integral estimation
Estimate $I = \int_0^1 f(u) \, du = \mathbb{E}[f(U)]$ with $U \sim \mathrm{Unif}(0, 1)$. Generate $U_i$ and use pairs $(f(U_i), f(1 - U_i))$. Since $f$ is increasing, $f(U)$ and $f(1 - U)$ are negatively correlated. The estimator $\frac{1}{2}[f(U) + f(1 - U)]$ has lower variance than $f(U)$ alone.
Common Confusions
Variance reduction does not change the rate
All these techniques reduce the constant in the $\sigma / \sqrt{n}$ error but keep the $1/\sqrt{n}$ rate. You still need $4\times$ the samples to halve the error. The improvement is in the constant $\sigma$, which can be enormous in practice but does not change the asymptotic rate.
Control variates require known expectations
The control variate $g$ must have a known mean $\mathbb{E}[g(X)]$. If you have to estimate $\mathbb{E}[g(X)]$, it is no longer a control variate. It becomes an importance sampling or regression adjustment problem.
Summary
- Antithetic variates: use negative correlation between sample pairs
- Control variates: subtract a known-mean quantity correlated with the target; optimal coefficient is $c^* = \mathrm{Cov}(f, g) / \mathrm{Var}(g)$
- Stratification: partition the space and sample within strata; with proportional allocation it never hurts
- Rao-Blackwellization: condition out part of the randomness analytically; guaranteed to reduce variance by the law of total variance
- Variance reduction changes the constant, not the rate
Exercises
Problem
You want to estimate $\mathbb{E}[f(U)]$ where $U \sim \mathrm{Unif}(0, 1)$. Explain how to use antithetic variates and why it reduces variance.
Problem
Derive the optimal control variate coefficient $c^*$ and show that the variance reduction factor is $1 - \rho^2$, where $\rho$ is the correlation between $f(X)$ and $g(X)$.
References
Canonical:
- Robert & Casella, Monte Carlo Statistical Methods (2004), Chapter 4
- Ross, Simulation (2012), Chapter 9
Current:
- Owen, Monte Carlo Theory, Methods, and Examples (2013), Chapters 8-9
- Gelman et al., Bayesian Data Analysis (2013), Chapters 10-12
- Brooks et al., Handbook of MCMC (2011), Chapters 1-5
Next Topics
The natural next steps from variance reduction:
- Burn-in and convergence diagnostics: knowing when MCMC samples are usable
- Hamiltonian Monte Carlo: a sampler that naturally has low variance
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Importance Sampling (Layer 2)
- Common Probability Distributions (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)