Sampling MCMC
Coupling Arguments and Mixing Time
Coupling constructs two Markov chains on the same probability space so they eventually meet, bounding total variation distance and mixing time. Spectral gap and coupling inequality are the main tools for proving how fast MCMC converges to stationarity.
Prerequisites
Why This Matters
When you run MCMC, the central question is: how many steps before the chain is close to the stationary distribution? If you stop too early, your samples are biased by the initial state. If you run too long, you waste computation. Mixing time gives a rigorous answer.
Coupling arguments are the sharpest general tool for bounding mixing time. They convert the abstract question "how close is the chain's distribution to stationarity?" into a concrete one: "how long until two coupled chains meet?"
Total Variation Distance
Total Variation Distance
The total variation distance between two probability distributions and on a countable space is:
It ranges from 0 (identical distributions) to 1 (mutually singular distributions).
Mixing Time
The mixing time of an ergodic Markov chain with stationary distribution is:
where is the distribution after steps starting from . By convention, . The choice of is arbitrary; for any , .
Coupling
Coupling
A coupling of two distributions and on is a joint distribution on such that and marginally. Any two distributions can be coupled in many ways; the art is choosing a coupling that makes and meet quickly.
For Markov chains: a coupling of two copies and of a chain with transition kernel is a joint process on the same probability space such that each marginal is a valid copy of the chain. The coupling time is .
A faithful coupling ensures that once , the chains stay together: for all .
Coupling Inequality
Statement
For any coupling of distributions and :
Moreover, there exists an optimal coupling that achieves equality.
Intuition
If and are coupled so they agree with high probability, then and must be close. The coupling inequality makes this precise: the probability of disagreement upper bounds the total variation distance. You do not need to compute the distance directly; you just need to show the coupled chains meet quickly.
Proof Sketch
For any event : . Taking the max over gives . The optimal coupling (which can be constructed explicitly) places mass on the diagonal and distributes the remainder optimally.
Why It Matters
This is the fundamental bridge between coupling and mixing. To bound the mixing time, start one chain from any state and the other from stationarity . Construct a coupling where they meet by time . Then . Bounding the coupling time bounds the mixing time.
Failure Mode
The coupling inequality gives an upper bound. A bad coupling (one where the chains rarely meet) gives a loose bound, not a proof of slow mixing. To prove lower bounds on mixing time, you need different techniques (e.g., bottleneck ratio, conductance). The inequality also requires constructing an explicit coupling, which can be non-trivial for complex chains.
Spectral Gap and Mixing
Spectral Gap
For a reversible Markov chain with transition matrix and stationary distribution , the spectral gap is:
where is the second largest eigenvalue of (in absolute value, the second largest is , but for lazy chains so ). A larger spectral gap means faster mixing.
Spectral Gap and Mixing Time
Statement
For a reversible, irreducible Markov chain on a finite state space with spectral gap and minimum stationary probability :
Intuition
The spectral gap controls the rate of exponential convergence. The upper bound says mixing happens in roughly steps. The factor accounts for the worst-case starting state: if puts very small mass on some states, starting there requires more time to "diffuse" into the bulk of the distribution.
Proof Sketch
For the upper bound: after steps, by the spectral decomposition of . Setting this to and solving gives . The lower bound uses the variational characterization of and the fact that certain test functions must converge at rate .
Why It Matters
The spectral gap gives a quantitative connection between the chain's linear algebra (eigenvalues) and its statistical behavior (mixing). For random walks on graphs, the spectral gap of the graph Laplacian determines mixing: well-connected graphs mix fast, poorly connected ones mix slowly.
Failure Mode
Computing the spectral gap exactly requires knowing all eigenvalues, which is feasible only for small state spaces or chains with special structure (e.g., random walks on known graphs). For continuous state spaces, the spectral gap is replaced by the Poincare constant, and bounding it requires functional inequalities (log-Sobolev, isoperimetry) that can be difficult to establish.
Common Confusions
Coupling time is not mixing time
The coupling time is a random variable for a specific coupling construction. The mixing time is a deterministic quantity. The coupling inequality says . A good coupling gives a tight bound, but a bad coupling can overestimate the mixing time by an arbitrary amount.
Spectral gap bounds are for reversible chains
The spectral gap analysis assumes reversibility (detailed balance). Many practical MCMC algorithms (Metropolis-Hastings, Gibbs sampling) produce reversible chains. Non-reversible chains (e.g., Hamiltonian Monte Carlo, lifted Markov chains) can mix faster than any reversible chain on the same state space, and their analysis requires different tools.
Canonical Examples
Coupling for lazy random walk on the hypercube
Consider a lazy random walk on : at each step, pick a coordinate uniformly and with probability re-randomize ; otherwise stay. Coupling: run two copies using the same coordinate choice and the same coin flip. Once for some coordinate , that coordinate stays coupled. The coupling time is the time until all coordinates have been re-randomized, which is a coupon collector problem with expected time . By the coupling inequality, .
Exercises
Problem
Two coins have bias and . Construct an explicit optimal coupling where and , and compute . Verify this equals .
Problem
A lazy random walk on the cycle stays put with probability and moves left or right each with probability . Use the spectral gap to bound the mixing time. The eigenvalues of the transition matrix are for .
References
Canonical:
- Levin, Peres, Wilmer, Markov Chains and Mixing Times (2009), Chapters 4-5, 12-13
- Lindvall, Lectures on the Coupling Method (2002), Chapters 1-3
Current:
-
Montenegro & Tetali, "Mathematical Aspects of Mixing Times in Markov Chains" (2006)
-
Robert & Casella, Monte Carlo Statistical Methods (2004), Chapters 3-7
-
Gelman et al., Bayesian Data Analysis (2013), Chapters 10-12
-
Brooks et al., Handbook of MCMC (2011), Chapters 1-5
Next Topics
- Burn-in and convergence diagnostics: practical methods for deciding when MCMC has mixed
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Metropolis-Hastings AlgorithmLayer 2
- Common Probability DistributionsLayer 0A
- Sets, Functions, and RelationsLayer 0A
- Basic Logic and Proof TechniquesLayer 0A
- Martingale TheoryLayer 0B
- Measure-Theoretic ProbabilityLayer 0B