

Perfect Sampling

Coupling from the past (Propp-Wilson): run Markov chains backward in time until all starting states coalesce. The result is an exact sample from the stationary distribution with zero burn-in bias.


Why This Matters

Classical MCMC methods like Metropolis-Hastings and Gibbs sampling have a fundamental problem: you never know if you have run the chain long enough. The samples are only approximately from the target distribution, and the approximation error from insufficient burn-in is unknown and unknowable. Every diagnostic (Gelman-Rubin, effective sample size, trace plots) is a heuristic that can fail silently.

Perfect sampling solves this completely. The Propp-Wilson coupling from the past (CFTP) algorithm produces an exact sample from the stationary distribution. Not approximately correct. Not asymptotically correct. Exactly correct, in finite time, with probability 1.

Mental Model

Imagine starting a Markov chain from every possible initial state simultaneously, all driven by the same random numbers. At each step, some chains may land on the same state. Eventually, all chains coalesce to a single state. At that moment, the output is the same regardless of where you started, so it must be a draw from the stationary distribution.

The key insight: you cannot run chains forward from time $-\infty$, but you can work backward. Start at time $-1$, then $-2$, then $-4$, doubling backward until you find a time in the past from which all chains have coalesced by time 0.

Formal Setup and Notation

Let $\{X_t\}$ be a Markov chain on a finite state space $\mathcal{S}$ with transition kernel $P$ and stationary distribution $\pi$. Let $\phi_t: \mathcal{S} \times \Omega \to \mathcal{S}$ be a random map consistent with the kernel: for each fixed $x$, $\phi_t(x) \sim P(x, \cdot)$. In particular, if $X_t \sim \pi$, then $\phi_t(X_t) \sim \pi$, i.e., the random maps preserve $\pi$.
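As a concrete sketch (the kernel and names below are illustrative, not from the text), one standard way to build such a random map on a finite state space is the inverse-CDF update: draw a single uniform and apply it to every state at once, which yields a "grand coupling" consistent with $P$.

```python
import random

# Illustrative 2-state kernel (rows sum to 1); any stochastic matrix works.
P = [[0.7, 0.3],
     [0.4, 0.6]]

def phi(x, u):
    # Inverse-CDF update: the same uniform u drives every state.
    c = 0.0
    for y, p in enumerate(P[x]):
        c += p
        if u < c:
            return y
    return len(P) - 1

# Empirical check of consistency: marginally, phi(x, U) ~ P[x].
rng = random.Random(0)
counts = [0, 0]
for _ in range(10000):
    counts[phi(0, rng.random())] += 1
print(counts[1] / 10000)  # should be close to P[0][1] = 0.3
```

Because all states share the same uniform, applying one such map to every state simultaneously is exactly the coupled update CFTP needs.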

Definition

Coupling from the Past (CFTP)

The CFTP algorithm (Propp and Wilson, 1996):

  1. Fix a sequence of random maps $\phi_0, \phi_{-1}, \phi_{-2}, \ldots$
  2. For $N = 1, 2, 4, 8, \ldots$ (doubling):
    • Compute $\Phi_{-N}^0(x) = \phi_0 \circ \phi_{-1} \circ \cdots \circ \phi_{-N+1}(x)$ for every $x \in \mathcal{S}$
    • If $\Phi_{-N}^0(x)$ is the same for all $x \in \mathcal{S}$, output this common value and stop
  3. The output is an exact sample from $\pi$
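The doubling loop can be sketched for a small finite chain by tracking one chain per state and caching each step's uniform so it is reused when the window doubles (a minimal illustration with an invented kernel; not an optimized implementation):

```python
import random

def cftp(P, rng):
    """Propp-Wilson CFTP for a finite kernel P (list of rows summing to 1)."""
    n = len(P)

    def phi(x, u):
        # Inverse-CDF update: one shared uniform drives every state.
        c = 0.0
        for y, p in enumerate(P[x]):
            c += p
            if u < c:
                return y
        return n - 1

    us = {}   # uniform for each time step, kept when the window doubles
    N = 1
    while True:
        for t in range(-N, 0):
            us.setdefault(t, rng.random())   # only new steps get fresh randomness
        chains = list(range(n))              # one chain per starting state
        for t in sorted(us):                 # compose maps from the earliest step forward
            chains = [phi(x, us[t]) for x in chains]
        if len(set(chains)) == 1:            # coalesced by time 0: exact draw
            return chains[0]
        N *= 2

# Sanity check: with P(0->1) = 0.3 and P(1->0) = 0.5, pi(1) = 0.3/0.8 = 0.375.
P = [[0.7, 0.3], [0.5, 0.5]]
rng = random.Random(0)
samples = [cftp(P, rng) for _ in range(10000)]
print(sum(samples) / 10000)  # should be close to 0.375
```

Note that the cache `us` is what implements "fix a sequence of random maps": doubling the window adds earlier maps without regenerating the ones already drawn.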

Main Theorems

Theorem

CFTP Produces Exact Samples

Statement

If the Markov chain is ergodic on a finite state space and the random maps $\phi_t$ are i.i.d. and consistent with the transition kernel $P$, then:

  1. The CFTP algorithm terminates almost surely in finite time
  2. The output has distribution exactly π\pi

Intuition

At the coalescence time, every initial state maps to the same output. Since the chains start from every state simultaneously, the output is invariant to the starting distribution. By stationarity of the random maps, this output must be a draw from $\pi$. Crucially, the coupling looks backward: chains start at successively earlier times in the past and run forward to time 0 with the same random numbers.

Proof Sketch

Let $T^*$ be the coalescence time (the smallest $N$ such that $\Phi_{-N}^0$ is constant). For any starting distribution $\mu$, the output of CFTP is $\Phi_{-T^*}^0(x)$ for any $x$ (they are all the same). If we start from $\mu = \pi$: since $\pi$ is stationary, applying $\Phi_{-T^*}^0$ to a draw from $\pi$ gives a draw from $\pi$. But the output does not depend on the starting state, so the output has distribution $\pi$ regardless of $\mu$. Termination follows from ergodicity: the probability of non-coalescence after $N$ steps decays geometrically.
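The backward construction hinges on one composition identity (stated here in the section's notation): extending the window further into the past only prepends maps,

$$\Phi_{-M}^{0} = \Phi_{-N}^{0} \circ \Phi_{-M}^{-N} \quad \text{for } M \ge N.$$

Hence if $\Phi_{-N}^{0}$ is constant, then $\Phi_{-M}^{0}$ takes that same constant value for every $M \ge N$, so the CFTP output equals $\lim_{M \to \infty} \Phi_{-M}^{0}(x)$, whose law is $\pi$ by stationarity.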

Why It Matters

This is one of the most surprising results in computational statistics. It shows that the burn-in problem of MCMC is not inherent: there exist algorithms that produce exact samples from the stationary distribution in finite time. The practical limitation is computational cost, not theoretical correctness.

Failure Mode

Running forward from a fixed time (coupling into the future) does not give exact samples. The coupling time depends on the starting state, introducing bias. CFTP avoids this by using the same random numbers for all starting states and checking coalescence at time 0.

Proposition

Monotone CFTP

Statement

If the state space $\mathcal{S}$ has a partial order with maximum element $\hat{1}$ and minimum element $\hat{0}$, and the random maps $\phi_t$ preserve this order ($x \leq y \implies \phi_t(x) \leq \phi_t(y)$), then coalescence of all chains can be checked by tracking only the top chain (starting from $\hat{1}$) and bottom chain (starting from $\hat{0}$).

If $\Phi_{-N}^0(\hat{0}) = \Phi_{-N}^0(\hat{1})$, then $\Phi_{-N}^0(x) = \Phi_{-N}^0(\hat{0})$ for all $x$.

Intuition

If the chain is monotone, the top and bottom chains sandwich all other chains. When the top and bottom merge, everything in between has already merged. This reduces the computational cost from tracking $|\mathcal{S}|$ chains to tracking just 2.

Proof Sketch

By monotonicity, $\hat{0} \leq x \leq \hat{1}$ implies $\Phi_{-N}^0(\hat{0}) \leq \Phi_{-N}^0(x) \leq \Phi_{-N}^0(\hat{1})$. If the bounds are equal, all values must be equal.
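A minimal sketch of monotone CFTP (my own example, not from the text): a lazy random walk on $\{0, \ldots, K\}$ whose shared-uniform update preserves the natural order, so only the bottom and top chains need tracking.

```python
import random

def monotone_cftp(K, p, q, rng):
    """Monotone CFTP for a lazy walk on {0,...,K}: up w.p. p, down w.p. q."""
    def phi(x, u):
        # Shared uniform u: all states take the same branch, so order is preserved.
        if u < p:
            return min(x + 1, K)
        if u < p + q:
            return max(x - 1, 0)
        return x

    us = {}   # cached uniforms, reused as the window doubles
    N = 1
    while True:
        for t in range(-N, 0):
            us.setdefault(t, rng.random())
        lo, hi = 0, K                # bottom chain from 0-hat, top chain from 1-hat
        for t in sorted(us):         # compose maps from the earliest step forward
            lo, hi = phi(lo, us[t]), phi(hi, us[t])
        if lo == hi:                 # sandwich collapsed: every chain has coalesced
            return lo
        N *= 2

# With p == q, detailed balance gives a uniform stationary distribution.
rng = random.Random(0)
samples = [monotone_cftp(3, 0.3, 0.3, rng) for _ in range(8000)]
freqs = [samples.count(s) / 8000 for s in range(4)]
print(freqs)  # each frequency should be near 0.25
```

Only two chains are updated per step regardless of $K$, which is the entire point of the proposition.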

Why It Matters

Without monotonicity, CFTP requires tracking one chain per state, which is infeasible for large state spaces. Monotone CFTP reduces this to two chains, making perfect sampling practical for models like the Ising model, certain Bayesian networks, and attractive point processes.

Failure Mode

Not all Markov chains admit a monotone coupling. When no natural partial order exists or the transition kernel is not monotone, you must track all states or find alternative coupling strategies, which may be impractical.

Common Confusions

Watch Out

Forward coupling does not give exact samples

If you start all chains at time 0 and run them forward until they coalesce at some random time $T$, the output is not a draw from $\pi$. The coupling time $T$ depends on the starting states, creating a selection bias. CFTP works because it fixes time 0 as the output time and looks backward to find when coalescence occurred.
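The bias can be made extreme with a deliberately adversarial (and non-monotone) coupling, a construction of my own for illustration: the only map that collapses the two chains sends both states to 0, so forward coupling always outputs 0, while CFTP driven by the very same maps remains exact.

```python
import random

# Two-state chain: P(0->1) = a, P(1->0) = b, so pi(1) = a/(a+b) = 0.25.
a, b = 0.2, 0.6

def phi(x, u):
    # A coupled update consistent with P, chosen adversarially:
    if u < a:
        return 1 - x   # swap both states
    if u < b:
        return 0       # the only collapsing map: everything goes to 0
    return x           # identity

def forward_coupling(rng):
    x, y = 0, 1
    while x != y:
        u = rng.random()
        x, y = phi(x, u), phi(y, u)
    return x           # coalescence can only ever happen at state 0

def cftp(rng):
    us, N = {}, 1
    while True:
        for t in range(-N, 0):
            us.setdefault(t, rng.random())
        x, y = 0, 1
        for t in sorted(us):           # compose cached maps toward time 0
            x, y = phi(x, us[t]), phi(y, us[t])
        if x == y:
            return x
        N *= 2

fwd = [forward_coupling(random.Random(i)) for i in range(2000)]
exact = [cftp(random.Random(i)) for i in range(2000)]
print(sum(fwd) / 2000, sum(exact) / 2000)  # ~0.0 (biased) vs ~0.25 (exact)
```

After a collapse to 0, the swap maps closer to time 0 can still flip the CFTP output to 1, which is exactly what forward coupling throws away by stopping at the random coalescence time.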

Watch Out

CFTP is not the same as running MCMC very long

Long MCMC runs produce approximate samples with unknown error. CFTP produces exact samples with a random (but finite) computation time. The distinction is fundamental: no diagnostic can certify that an MCMC sample is exact, but CFTP provides a certificate by construction.

Watch Out

You must reuse the same random numbers

When you extend the look-back window from $-N$ to $-2N$, you must keep the same random maps $\phi_0, \phi_{-1}, \ldots, \phi_{-N+1}$ and add new ones $\phi_{-N}, \ldots, \phi_{-2N+1}$. Regenerating all random numbers would invalidate the coupling and destroy exactness.
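This requirement is easy to satisfy by caching each step's randomness the first time it is needed (an implementation sketch with invented names):

```python
import random

rng = random.Random(42)
us = {}   # time step t -> the uniform driving the random map phi_t

def ensure_window(N):
    # Extend the look-back window to [-N, 0): only previously unseen steps
    # get fresh randomness; every cached uniform is kept exactly as it was.
    for t in range(-N, 0):
        us.setdefault(t, rng.random())

ensure_window(1)
u_old = us[-1]
ensure_window(2)          # doubles the window: only t = -2 is new
assert us[-1] == u_old    # the map at time -1 is unchanged
```

Drawing a fresh uniform for time $-1$ on the second call would be precisely the mistake this warning describes.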

Summary

  • CFTP gives exact samples from the stationary distribution, not approximations
  • Run chains backward: from different past times to the fixed present (time 0)
  • Coalescence = all starting states produce the same output = exact sample
  • Monotone CFTP reduces tracking from $|\mathcal{S}|$ chains to 2 chains
  • Forward coupling (coupling into the future) does not work
  • The same random numbers must be reused when extending the look-back window

Exercises

ExerciseCore

Problem

Explain why running a Markov chain forward from $|\mathcal{S}|$ starting states until coalescence does not produce an exact sample from $\pi$.

ExerciseAdvanced

Problem

Consider a two-state Markov chain with states $\{0, 1\}$ and transition matrix $P = \begin{pmatrix} 1-a & a \\ b & 1-b \end{pmatrix}$ where $a, b \in (0,1)$. Design a monotone random map $\phi$ consistent with $P$ and determine the expected coalescence time for CFTP.

References

Canonical:

  • Propp & Wilson, "Exact Sampling with Coupled Markov Chains and Applications to Statistical Mechanics" (1996)
  • Fill, "An Interruptible Algorithm for Perfect Sampling via Markov Chains" (1998)

Current:

  • Huber, Perfect Simulation (2016)
  • Robert & Casella, Monte Carlo Statistical Methods (2004), Chapter 14
  • Gelman et al., Bayesian Data Analysis (2013), Chapters 10-12
  • Brooks et al., Handbook of MCMC (2011), Chapters 1-5

Next Topics

Natural extensions from perfect sampling:

  • MCMC convergence diagnostics: the approximate alternative when perfect sampling is not feasible
  • Coupling methods: broader coupling techniques for bounding mixing times

Last reviewed: April 2026
