Sampling MCMC
Perfect Sampling
Coupling from the past (Propp-Wilson): run coupled Markov chains from increasingly distant starting times in the past until all starting states have coalesced by time 0. The result is an exact sample from the stationary distribution with zero burn-in bias.
Why This Matters
Classical MCMC methods like Metropolis-Hastings and Gibbs sampling have a fundamental problem: you never know if you have run the chain long enough. The samples are only approximately from the target distribution, and the approximation error from insufficient burn-in is unknown and unknowable. Every diagnostic (Gelman-Rubin, effective sample size, trace plots) is a heuristic that can fail silently.
Perfect sampling solves this completely. The Propp-Wilson coupling from the past (CFTP) algorithm produces an exact sample from the stationary distribution. Not approximately correct. Not asymptotically correct. Exactly correct, in finite time, with probability 1.
Mental Model
Imagine starting a Markov chain from every possible initial state simultaneously, all driven by the same random numbers. At each step, some chains may land on the same state. Eventually, all chains coalesce to a single state. At that moment, the output is the same regardless of where you started, so it must be a draw from the stationary distribution.
The key insight: you cannot run chains forward from time $-\infty$, but you can work backward. Start at time $-1$, then $-2$, then $-4$, doubling backward until you find a time in the past from which all chains have coalesced by time 0.
Formal Setup and Notation
Let $(X_t)$ be a Markov chain on a finite state space $S$ with transition kernel $P$ and stationary distribution $\pi$. Let $f: S \to S$ be a random function such that $\Pr[f(x) = y] = P(x, y)$ for all $x, y \in S$; then if $X \sim \pi$, $f(X) \sim \pi$ (i.e., $f$ preserves $\pi$).
Coupling from the Past (CFTP)
The CFTP algorithm (Propp and Wilson, 1996):
- Fix an i.i.d. sequence of random maps $f_{-1}, f_{-2}, f_{-3}, \dots$
- For $T = 1, 2, 4, 8, \dots$ (doubling):
- Compute $F_T(x) = (f_{-1} \circ f_{-2} \circ \cdots \circ f_{-T})(x)$ for every $x \in S$
- If $F_T(x)$ is the same for all $x \in S$, output this common value and stop
- The output is an exact sample from $\pi$
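The loop above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not production code: the function names `random_map` and `cftp` are our own, and the coupling uses a single shared uniform per time step with inverse-CDF sampling (one valid choice of random map; it need not coalesce quickly for every chain, though it does for small examples like the one in the comments).

```python
import numpy as np

def random_map(P, rng):
    """Draw one random map f: S -> S consistent with kernel P.

    A single uniform u is shared across all states (inverse-CDF sampling),
    which couples the chains: marginally, Pr[f(x) = y] = P[x, y].
    """
    u = rng.random()
    cdf = np.cumsum(P, axis=1)
    return np.array([np.searchsorted(cdf[x], u) for x in range(P.shape[0])])

def cftp(P, seed=0):
    """Propp-Wilson CFTP: one exact sample from the stationary distribution of P."""
    rng = np.random.default_rng(seed)
    maps = []  # maps[k] is the map applied at time -(k+1); reused, never redrawn
    T = 1
    while True:
        while len(maps) < T:
            maps.append(random_map(P, rng))  # extend further into the past
        # Run every starting state from time -T to 0: apply f_{-T}, ..., f_{-1}
        state = np.arange(P.shape[0])
        for k in reversed(range(T)):
            state = maps[k][state]
        if np.all(state == state[0]):  # coalescence: one common value at time 0
            return int(state[0])
        T *= 2  # look further back, keeping all old maps

# Example: P = [[0.7, 0.3], [0.4, 0.6]] has stationary distribution (4/7, 3/7);
# the empirical frequencies of many independent cftp() draws match it.
```

Note that `maps` only grows: when the look-back window doubles, the previously drawn maps are reused exactly, which is essential for correctness (see "Common Confusions" below).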
Main Theorems
CFTP Produces Exact Samples
Statement
If the Markov chain is ergodic on a finite state space $S$ and the random maps $f_{-t}$ are i.i.d. and consistent with the transition kernel $P$, then:
- The CFTP algorithm terminates almost surely in finite time
- The output has distribution exactly $\pi$
Intuition
At the coalescence time, every initial state maps to the same output. Since the chain started from every state simultaneously, the output is invariant to the starting distribution. By stationarity of the random maps, this output must be a draw from $\pi$. It is crucial that we look backward: starting from different times in the past and running forward to time 0 with the same random numbers.
Proof Sketch
Let $T^*$ be the coalescence time (the smallest $T$ such that $F_T$ is constant on $S$). For any $T \ge T^*$, the output of CFTP is $F_T(x)$ for any $x$ (they are all the same). If we start from $\pi$: since $\pi$ is stationary, applying $F_T$ to a draw from $\pi$ gives a draw from $\pi$. But the output does not depend on the starting state, so the output has distribution $\pi$ regardless of the starting distribution. Termination follows from ergodicity: the probability of non-coalescence after $T$ steps decays geometrically in $T$.
Why It Matters
This is one of the most surprising results in computational statistics. It shows that the burn-in problem of MCMC is not inherent: there exist algorithms that produce exact samples from the stationary distribution in finite time. The practical limitation is computational cost, not theoretical correctness.
Failure Mode
Running forward from a fixed time (coupling into the future) does not give exact samples. The coupling time depends on the starting state, introducing bias. CFTP avoids this by using the same random numbers for all starting states and checking coalescence at time 0.
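A concrete demonstration of this failure mode (a toy construction of our own, not from the source): couple two chains forward with one shared uniform per step, on a kernel where state 1 jumps deterministically to state 0. Under this coupling the chains can only ever meet at state 0, so sampling at the random meeting time is maximally biased.

```python
import numpy as np

def forward_couple(P, rng):
    """Run coupled chains FORWARD from time 0 until they meet; return the
    meeting state. This is the WRONG construction: the meeting time depends
    on the trajectory, so the output is generally biased."""
    cdf = np.cumsum(P, axis=1)
    state = np.arange(P.shape[0])
    while len(set(state)) > 1:
        u = rng.random()  # one shared uniform drives every chain
        state = np.array([np.searchsorted(cdf[x], u) for x in state])
    return int(state[0])

# State 1 always moves to state 0, so the chains can only ever merge at 0,
# even though the stationary distribution is pi = (2/3, 1/3).
P = np.array([[0.5, 0.5], [1.0, 0.0]])
rng = np.random.default_rng(0)
samples = [forward_couple(P, rng) for _ in range(1000)]
print(sum(samples) / len(samples))  # fraction of 1s: 0.0, vs. pi_1 = 1/3
```

CFTP with the very same random maps would fix this, because the output is always read off at the fixed time 0 rather than at the random meeting time.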
Monotone CFTP
Statement
If the state space $S$ has a partial order $\preceq$ with maximum element $\hat{1}$ and minimum element $\hat{0}$, and the random maps preserve this order ($x \preceq y \Rightarrow f(x) \preceq f(y)$), then coalescence of all chains can be checked by tracking only the top chain (starting from $\hat{1}$) and the bottom chain (starting from $\hat{0}$).
If $F_T(\hat{0}) = F_T(\hat{1})$, then $F_T(x) = F_T(\hat{0})$ for all $x \in S$.
Intuition
If the chain is monotone, the top and bottom chains sandwich all other chains. When the top and bottom merge, everything in between has already merged. This reduces the computational cost from tracking $|S|$ chains to tracking just 2.
Proof Sketch
By monotonicity, $\hat{0} \preceq x \preceq \hat{1}$ implies $F_T(\hat{0}) \preceq F_T(x) \preceq F_T(\hat{1})$. If the bounds are equal, all values in between must be equal.
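As an illustration (our own toy construction, not an example from the source), here is monotone CFTP for a lazy random walk on $\{0, \dots, n\}$ with reflecting boundaries, whose stationary distribution is uniform. The shared-uniform update rule is monotone, so only the bottom and top chains need to be simulated:

```python
import numpy as np

def monotone_cftp(n, seed=0):
    """Monotone CFTP for a lazy reflecting random walk on {0, ..., n}.

    Update rule given uniform u: move up if u > 2/3, down if u < 1/3,
    else stay, clamping at the boundaries. Clamping preserves order, so
    x <= y implies f(x) <= f(y): it suffices to track the chain started
    at 0 (bottom) and the chain started at n (top).
    """
    rng = np.random.default_rng(seed)
    us = []  # us[k] drives the step at time -(k+1); reused across iterations
    T = 1
    while True:
        while len(us) < T:
            us.append(rng.random())
        lo, hi = 0, n  # bottom and top chains, started at time -T
        for k in reversed(range(T)):  # apply f_{-T}, ..., f_{-1}
            step = 1 if us[k] > 2 / 3 else (-1 if us[k] < 1 / 3 else 0)
            lo = min(max(lo + step, 0), n)
            hi = min(max(hi + step, 0), n)
        if lo == hi:  # sandwich has collapsed: every chain has coalesced
            return lo
        T *= 2
```

Detailed balance for this walk gives a uniform stationary distribution, so the empirical frequencies of many independent draws should be flat across the $n+1$ states.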
Why It Matters
Without monotonicity, CFTP requires tracking one chain per state, which is infeasible for large state spaces. Monotone CFTP reduces this to two chains, making perfect sampling practical for models like the Ising model, certain Bayesian networks, and attractive point processes.
Failure Mode
Not all Markov chains admit a monotone coupling. When no natural partial order exists or the transition kernel is not monotone, you must track all states or find alternative coupling strategies, which may be impractical.
Common Confusions
Forward coupling does not give exact samples
If you start all chains at time 0 and run them forward until they coalesce at some random time $\tau$, the output is not a draw from $\pi$. The coupling time $\tau$ depends on the starting states, creating a selection bias. CFTP works because it fixes time 0 as the output time and looks backward to find when coalescence occurred.
CFTP is not the same as running MCMC very long
Long MCMC runs produce approximate samples with unknown error. CFTP produces exact samples with a random (but finite) computation time. The distinction is fundamental: no diagnostic can certify that an MCMC sample is exact, but CFTP provides a certificate by construction.
You must reuse the same random numbers
When you extend the look-back window from $T$ to $2T$, you must keep the same random maps $f_{-1}, \dots, f_{-T}$ and add new ones $f_{-(T+1)}, \dots, f_{-2T}$. Regenerating all random numbers would invalidate the coupling and destroy exactness.
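One practical way to honor this requirement (an implementation pattern of our own, not part of the original algorithm statement) is to derive the randomness at time $-t$ deterministically from $t$ and a master seed, so that extending the window reuses the old values automatically. A sketch using NumPy's seed sequences:

```python
import numpy as np

def u_at(t, master_seed=12345):
    """Uniform driving the step at time -t, derived deterministically from t.

    Seeding a fresh generator with (master_seed, t) means u_at(t) returns
    the SAME value on every call, so extending the look-back window from T
    to 2T reuses u_at(1..T) unchanged and only adds u_at(T+1..2T).
    """
    return np.random.default_rng([master_seed, t]).random()

assert u_at(3) == u_at(3)  # same time index, same random number, always
```

The alternative, used in the sketches above, is simply to store the drawn maps in a list that only ever grows; either way, the randomness attached to each past time step is fixed once and for all.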
Summary
- CFTP gives exact samples from the stationary distribution, not approximations
- Run chains backward: from different past times to the fixed present (time 0)
- Coalescence = all starting states produce the same output = exact sample
- Monotone CFTP reduces tracking from $|S|$ chains to 2 chains
- Forward coupling (coupling into the future) does not work
- The same random numbers must be reused when extending the look-back window
Exercises
Problem
Explain why running a Markov chain forward from all starting states until coalescence does not produce an exact sample from $\pi$.
Problem
Consider a two-state Markov chain with states $\{0, 1\}$ and transition matrix $P$ with $P(0, 1) = p$ and $P(1, 0) = q$, where $0 < p, q < 1$. Design a monotone random map consistent with $P$ and determine the expected coalescence time for CFTP.
References
Canonical:
- Propp & Wilson, "Exact Sampling with Coupled Markov Chains and Applications to Statistical Mechanics" (1996)
- Fill, "An Interruptible Algorithm for Perfect Sampling via Markov Chains" (1998)
Current:
- Huber, Perfect Simulation (2016)
- Robert & Casella, Monte Carlo Statistical Methods (2004), Chapter 14
- Gelman et al., Bayesian Data Analysis (2013), Chapters 10-12
- Brooks et al., Handbook of MCMC (2011), Chapters 1-5
Next Topics
Natural extensions from perfect sampling:
- MCMC convergence diagnostics: the approximate alternative when perfect sampling is not feasible
- Coupling methods: broader coupling techniques for bounding mixing times
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Metropolis-Hastings Algorithm (Layer 2)
- Common Probability Distributions (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Gibbs Sampling (Layer 2)