

Perfect Sampling

Coupling from the past (Propp-Wilson): run Markov chains backward in time until all starting states coalesce. The result is an exact sample from the stationary distribution with zero burn-in bias.


Why This Matters

Classical MCMC methods like Metropolis-Hastings and Gibbs sampling have a fundamental problem: you never know if you have run the chain long enough. The samples are only approximately from the target distribution, and the approximation error from insufficient burn-in is unknown and unknowable. Every diagnostic (Gelman-Rubin, effective sample size, trace plots) is a heuristic that can fail silently.

Perfect sampling solves this completely. The Propp-Wilson coupling from the past (CFTP) algorithm produces an exact sample from the stationary distribution. Not approximately correct. Not asymptotically correct. Exactly correct, in finite time, with probability 1.

Mental Model

Imagine starting a Markov chain from every possible initial state simultaneously, all driven by the same random numbers. At each step, some chains may land on the same state. Eventually, all chains coalesce to a single state. At that moment, the output is the same regardless of where you started, so it must be a draw from the stationary distribution.

The key insight: you cannot run chains forward from time $-\infty$, but you can work backward. Start at time $-1$, then $-2$, then $-4$, doubling backward until you find a time in the past from which all chains have coalesced by time 0.

Formal Setup and Notation

Let $\{X_t\}$ be a Markov chain on a finite state space $\mathcal{S}$ with transition kernel $P$ and stationary distribution $\pi$. Let $\phi_t: \mathcal{S} \times \Omega \to \mathcal{S}$ be a random map consistent with the kernel: for each fixed $x$, $\phi_t(x) \sim P(x, \cdot)$. In particular, if $X_t \sim \pi$, then $\phi_t(X_t) \sim \pi$, i.e., the random maps preserve $\pi$.
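As a concrete sketch (the kernel and names below are illustrative, not from the text), one standard way to build such a random map on a finite state space is the inverse-CDF update: draw a single uniform and apply it to every state at once, which yields a "grand coupling" consistent with $P$.

```python
import random

# Illustrative 2-state kernel (rows sum to 1); any stochastic matrix works.
P = [[0.7, 0.3],
     [0.4, 0.6]]

def phi(x, u):
    # Inverse-CDF update: the same uniform u drives every state.
    c = 0.0
    for y, p in enumerate(P[x]):
        c += p
        if u < c:
            return y
    return len(P) - 1

# Empirical check of consistency: marginally, phi(x, U) ~ P[x].
rng = random.Random(0)
counts = [0, 0]
for _ in range(10000):
    counts[phi(0, rng.random())] += 1
print(counts[1] / 10000)  # should be close to P[0][1] = 0.3
```

Because all states share the same uniform, applying one such map to every state simultaneously is exactly the coupled update CFTP needs.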

Definition

Coupling from the Past (CFTP)

The CFTP algorithm (Propp and Wilson, 1996):

  1. Fix a sequence of random maps $\phi_0, \phi_{-1}, \phi_{-2}, \ldots$
  2. For $N = 1, 2, 4, 8, \ldots$ (doubling):
    • Compute $\Phi_{-N}^0(x) = \phi_0 \circ \phi_{-1} \circ \cdots \circ \phi_{-N+1}(x)$ for every $x \in \mathcal{S}$
    • If $\Phi_{-N}^0(x)$ is the same for all $x \in \mathcal{S}$, output this common value and stop
  3. The output is an exact sample from $\pi$
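The doubling loop can be sketched for a small finite chain by tracking one chain per state and caching each step's uniform so it is reused when the window doubles (a minimal illustration with an invented kernel; not an optimized implementation):

```python
import random

def cftp(P, rng):
    """Propp-Wilson CFTP for a finite kernel P (list of rows summing to 1)."""
    n = len(P)

    def phi(x, u):
        # Inverse-CDF update: one shared uniform drives every state.
        c = 0.0
        for y, p in enumerate(P[x]):
            c += p
            if u < c:
                return y
        return n - 1

    us = {}   # uniform for each time step, kept when the window doubles
    N = 1
    while True:
        for t in range(-N, 0):
            us.setdefault(t, rng.random())   # only new steps get fresh randomness
        chains = list(range(n))              # one chain per starting state
        for t in sorted(us):                 # compose maps from the earliest step forward
            chains = [phi(x, us[t]) for x in chains]
        if len(set(chains)) == 1:            # coalesced by time 0: exact draw
            return chains[0]
        N *= 2

# Sanity check: with P(0->1) = 0.3 and P(1->0) = 0.5, pi(1) = 0.3/0.8 = 0.375.
P = [[0.7, 0.3], [0.5, 0.5]]
rng = random.Random(0)
samples = [cftp(P, rng) for _ in range(10000)]
print(sum(samples) / 10000)  # should be close to 0.375
```

Note that the cache `us` is what implements "fix a sequence of random maps": doubling the window adds earlier maps without regenerating the ones already drawn.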

Main Theorems

Theorem

CFTP Produces Exact Samples

Statement

If the Markov chain is ergodic on a finite state space and the random maps $\phi_t$ are i.i.d. and consistent with the transition kernel $P$, then:

  1. The CFTP algorithm terminates almost surely in finite time
  2. The output has distribution exactly π\pi

Intuition

At the coalescence time, every initial state maps to the same output. Since the chains start from every state simultaneously, the output is invariant to the starting distribution. By stationarity of the random maps, this output must be a draw from $\pi$. Crucially, the coupling looks backward: chains start at successively earlier times in the past and run forward to time 0 with the same random numbers.

Proof Sketch

Let $T^*$ be the coalescence time (the smallest $N$ such that $\Phi_{-N}^0$ is constant). For any starting distribution $\mu$, the output of CFTP is $\Phi_{-T^*}^0(x)$ for any $x$ (they are all the same). If we start from $\mu = \pi$: since $\pi$ is stationary, applying $\Phi_{-T^*}^0$ to a draw from $\pi$ gives a draw from $\pi$. But the output does not depend on the starting state, so the output has distribution $\pi$ regardless of $\mu$. Termination follows from ergodicity: the probability of non-coalescence after $N$ steps decays geometrically.
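The backward construction hinges on one composition identity (stated here in the section's notation): extending the window further into the past only prepends maps,

$$\Phi_{-M}^{0} = \Phi_{-N}^{0} \circ \Phi_{-M}^{-N} \quad \text{for } M \ge N.$$

Hence if $\Phi_{-N}^{0}$ is constant, then $\Phi_{-M}^{0}$ takes that same constant value for every $M \ge N$, so the CFTP output equals $\lim_{M \to \infty} \Phi_{-M}^{0}(x)$, whose law is $\pi$ by stationarity.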

Why It Matters

This is one of the most surprising results in computational statistics. It shows that the burn-in problem of MCMC is not inherent: there exist algorithms that produce exact samples from the stationary distribution in finite time. The practical limitation is computational cost, not theoretical correctness.

Failure Mode

Running forward from a fixed time (coupling into the future) does not give exact samples. The coupling time depends on the starting state, introducing bias. CFTP avoids this by using the same random numbers for all starting states and checking coalescence at time 0.

Proposition

Monotone CFTP

Statement

If the state space $\mathcal{S}$ has a partial order with maximum element $\hat{1}$ and minimum element $\hat{0}$, and the random maps $\phi_t$ preserve this order ($x \leq y \implies \phi_t(x) \leq \phi_t(y)$), then coalescence of all chains can be checked by tracking only the top chain (starting from $\hat{1}$) and bottom chain (starting from $\hat{0}$).

If $\Phi_{-N}^0(\hat{0}) = \Phi_{-N}^0(\hat{1})$, then $\Phi_{-N}^0(x) = \Phi_{-N}^0(\hat{0})$ for all $x$.

Intuition

If the chain is monotone, the top and bottom chains sandwich all other chains. When the top and bottom merge, everything in between has already merged. This reduces the computational cost from tracking $|\mathcal{S}|$ chains to tracking just 2.

Proof Sketch

By monotonicity, $\hat{0} \leq x \leq \hat{1}$ implies $\Phi_{-N}^0(\hat{0}) \leq \Phi_{-N}^0(x) \leq \Phi_{-N}^0(\hat{1})$. If the bounds are equal, all values must be equal.
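A minimal sketch of monotone CFTP (my own example, not from the text): a lazy random walk on $\{0, \ldots, K\}$ whose shared-uniform update preserves the natural order, so only the bottom and top chains need tracking.

```python
import random

def monotone_cftp(K, p, q, rng):
    """Monotone CFTP for a lazy walk on {0,...,K}: up w.p. p, down w.p. q."""
    def phi(x, u):
        # Shared uniform u: all states take the same branch, so order is preserved.
        if u < p:
            return min(x + 1, K)
        if u < p + q:
            return max(x - 1, 0)
        return x

    us = {}   # cached uniforms, reused as the window doubles
    N = 1
    while True:
        for t in range(-N, 0):
            us.setdefault(t, rng.random())
        lo, hi = 0, K                # bottom chain from 0-hat, top chain from 1-hat
        for t in sorted(us):         # compose maps from the earliest step forward
            lo, hi = phi(lo, us[t]), phi(hi, us[t])
        if lo == hi:                 # sandwich collapsed: every chain has coalesced
            return lo
        N *= 2

# With p == q, detailed balance gives a uniform stationary distribution.
rng = random.Random(0)
samples = [monotone_cftp(3, 0.3, 0.3, rng) for _ in range(8000)]
freqs = [samples.count(s) / 8000 for s in range(4)]
print(freqs)  # each frequency should be near 0.25
```

Only two chains are updated per step regardless of $K$, which is the entire point of the proposition.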

Why It Matters

Without monotonicity, CFTP requires tracking one chain per state, which is infeasible for large state spaces. Monotone CFTP reduces this to two chains, making perfect sampling practical for models like the Ising model, certain Bayesian networks, and attractive point processes.

Failure Mode

Not all Markov chains admit a monotone coupling. When no natural partial order exists or the transition kernel is not monotone, you must track all states or find alternative coupling strategies, which may be impractical.

Common Confusions

Watch Out

Forward coupling does not give exact samples

If you start all chains at time 0 and run them forward until they coalesce at some random time $T$, the output is not a draw from $\pi$. The coupling time $T$ depends on the starting states, creating a selection bias. CFTP works because it fixes time 0 as the output time and looks backward to find when coalescence occurred.
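The bias can be made extreme with a deliberately adversarial (and non-monotone) coupling, a construction of my own for illustration: the only map that collapses the two chains sends both states to 0, so forward coupling always outputs 0, while CFTP driven by the very same maps remains exact.

```python
import random

# Two-state chain: P(0->1) = a, P(1->0) = b, so pi(1) = a/(a+b) = 0.25.
a, b = 0.2, 0.6

def phi(x, u):
    # A coupled update consistent with P, chosen adversarially:
    if u < a:
        return 1 - x   # swap both states
    if u < b:
        return 0       # the only collapsing map: everything goes to 0
    return x           # identity

def forward_coupling(rng):
    x, y = 0, 1
    while x != y:
        u = rng.random()
        x, y = phi(x, u), phi(y, u)
    return x           # coalescence can only ever happen at state 0

def cftp(rng):
    us, N = {}, 1
    while True:
        for t in range(-N, 0):
            us.setdefault(t, rng.random())
        x, y = 0, 1
        for t in sorted(us):           # compose cached maps toward time 0
            x, y = phi(x, us[t]), phi(y, us[t])
        if x == y:
            return x
        N *= 2

fwd = [forward_coupling(random.Random(i)) for i in range(2000)]
exact = [cftp(random.Random(i)) for i in range(2000)]
print(sum(fwd) / 2000, sum(exact) / 2000)  # ~0.0 (biased) vs ~0.25 (exact)
```

After a collapse to 0, the swap maps closer to time 0 can still flip the CFTP output to 1, which is exactly what forward coupling throws away by stopping at the random coalescence time.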

Watch Out

CFTP is not the same as running MCMC very long

Long MCMC runs produce approximate samples with unknown error. CFTP produces exact samples with a random (but finite) computation time. The distinction is fundamental: no diagnostic can certify that an MCMC sample is exact, but CFTP provides a certificate by construction.

Watch Out

You must reuse the same random numbers

When you extend the look-back window from $-N$ to $-2N$, you must keep the same random maps $\phi_0, \phi_{-1}, \ldots, \phi_{-N+1}$ and add new ones $\phi_{-N}, \ldots, \phi_{-2N+1}$. Regenerating all random numbers would invalidate the coupling and destroy exactness.
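This requirement is easy to satisfy by caching each step's randomness the first time it is needed (an implementation sketch with invented names):

```python
import random

rng = random.Random(42)
us = {}   # time step t -> the uniform driving the random map phi_t

def ensure_window(N):
    # Extend the look-back window to [-N, 0): only previously unseen steps
    # get fresh randomness; every cached uniform is kept exactly as it was.
    for t in range(-N, 0):
        us.setdefault(t, rng.random())

ensure_window(1)
u_old = us[-1]
ensure_window(2)          # doubles the window: only t = -2 is new
assert us[-1] == u_old    # the map at time -1 is unchanged
```

Drawing a fresh uniform for time $-1$ on the second call would be precisely the mistake this warning describes.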

Summary

  • CFTP gives exact samples from the stationary distribution, not approximations
  • Run chains backward: from different past times to the fixed present (time 0)
  • Coalescence = all starting states produce the same output = exact sample
  • Monotone CFTP reduces tracking from $|\mathcal{S}|$ chains to 2 chains
  • Forward coupling (coupling into the future) does not work
  • The same random numbers must be reused when extending the look-back window

Exercises

ExerciseCore

Problem

Explain why running a Markov chain forward from $|\mathcal{S}|$ starting states until coalescence does not produce an exact sample from $\pi$.

ExerciseAdvanced

Problem

Consider a two-state Markov chain with states $\{0, 1\}$ and transition matrix $P = \begin{pmatrix} 1-a & a \\ b & 1-b \end{pmatrix}$ where $a, b \in (0,1)$. Design a monotone random map $\phi$ consistent with $P$ and determine the expected coalescence time for CFTP.

References

Canonical:

  • Propp & Wilson, "Exact Sampling with Coupled Markov Chains and Applications to Statistical Mechanics" (1996)
  • Fill, "An Interruptible Algorithm for Perfect Sampling via Markov Chains" (1998)

Current:

  • Huber, Perfect Simulation (2016)
  • Robert & Casella, Monte Carlo Statistical Methods (2004), Chapter 14
  • Gelman et al., Bayesian Data Analysis (2013), Chapters 10-12
  • Brooks et al., Handbook of MCMC (2011), Chapters 1-5

Next Topics

Natural extensions from perfect sampling:

  • MCMC convergence diagnostics: the approximate alternative when perfect sampling is not feasible
  • Coupling methods: broader coupling techniques for bounding mixing times

Last reviewed: April 2026
