MCMC for Markov Random Fields
Gibbs sampling on undirected graphical models. The joint distribution factorizes over cliques, and each variable is sampled conditioned on its Markov blanket. Ising model, image denoising, and simulated annealing.
Why This Matters
Markov random fields (MRFs) are undirected graphical models that encode conditional independence through graph structure. They appear throughout ML and statistics: spatial statistics, image analysis, natural language processing, and physics. The joint distribution of an MRF factorizes over cliques of the graph, but computing marginals or the normalizing constant is typically intractable (it requires summing over exponentially many configurations). Gibbs sampling provides a practical approach: sample each variable conditioned on its neighbors, and iterate. The Markov blanket property of MRFs makes each conditional tractable, even when the full joint is not.
Mental Model
Think of a grid of pixels, each with a label (e.g., black or white). Neighboring pixels tend to have the same label. The MRF encodes this preference through potential functions on edges: pairs of pixels with the same label get high potential, different labels get low potential. The joint probability is proportional to the product of all pairwise potentials. To sample from this distribution, pick a pixel, look at its neighbors, and resample that pixel's label from the conditional distribution given the neighbors. Repeat for all pixels, many times. This is Gibbs sampling on an MRF.
Formal Setup
Markov Random Field
A Markov random field consists of a set of random variables $X = (X_i)_{i \in V}$ associated with the nodes of an undirected graph $G = (V, E)$, satisfying the local Markov property:

$$p(x_i \mid x_{V \setminus \{i\}}) = p(x_i \mid x_{N(i)})$$

where $N(i)$ denotes the neighbors of node $i$ in $G$. Each variable is conditionally independent of all non-neighbors given its neighbors (its Markov blanket).
Gibbs Distribution on a Graph
A Gibbs distribution with respect to graph $G$ takes the form:

$$p(x) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(x_C)$$

where $\mathcal{C}$ is the set of cliques of $G$, $\psi_C \geq 0$ are potential functions defined on clique configurations, and $Z = \sum_x \prod_{C \in \mathcal{C}} \psi_C(x_C)$ is the partition function (normalizing constant).
The Hammersley-Clifford theorem connects these two definitions.
Core Theory
Hammersley-Clifford Theorem
Statement
If $p(x) > 0$ for all $x$ and $p$ satisfies the local Markov property with respect to an undirected graph $G$, then $p$ factorizes as a Gibbs distribution over the cliques of $G$:

$$p(x) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(x_C)$$

Conversely, any Gibbs distribution over $G$ satisfies the local Markov property with respect to $G$.
Intuition
The Markov property says that the distribution has a certain conditional independence structure. The theorem says this is equivalent to the distribution factorizing into local pieces (potential functions on cliques). This equivalence is what makes MRFs computationally useful: the conditional independence structure implies that the full joint decomposes into small, manageable factors.
Proof Sketch
The reverse direction is straightforward: compute the conditional $p(x_i \mid x_{V \setminus \{i\}})$ from the clique factorization and verify that only terms involving neighbors of $i$ remain.
The forward direction is more involved. Assume the positivity condition. For each subset $A \subseteq V$, define $\phi_A$ via a Möbius inversion formula, $\phi_A(x) = \sum_{B \subseteq A} (-1)^{|A \setminus B|} \log p(x_B, x^*_{V \setminus B})$, where $x^*$ is a fixed reference configuration. The Markov property implies that $\phi_A \equiv 0$ whenever $A$ is not a clique (the conditional independence forces the alternating terms to cancel). Then $\log p(x) = \sum_{A \subseteq V} \phi_A(x) = \sum_{C \in \mathcal{C}} \phi_C(x)$, which gives the clique factorization with $\psi_C = e^{\phi_C}$.
Why It Matters
This theorem justifies the standard MRF modeling approach: specify local potentials on cliques, and the resulting distribution automatically satisfies the desired conditional independence structure. It also justifies Gibbs sampling: the factorization ensures that each full conditional depends only on the neighbors, making each Gibbs update local and cheap.
Failure Mode
The positivity condition ($p(x) > 0$ for all $x$) is necessary. Without it, the equivalence can fail: a distribution can satisfy the local Markov property but not factorize over cliques. In practice, most MRF models include a temperature parameter or smoothing that ensures strict positivity, so this condition is rarely a practical concern.
Gibbs Sampling on MRFs
Gibbs sampling on an MRF iterates:
- Select a node $i$ (systematically or randomly)
- Sample $x_i \sim p(x_i \mid x_{N(i)})$, the conditional distribution given the current values of all neighbors

The full conditional is:

$$p(x_i \mid x_{V \setminus \{i\}}) = \frac{\prod_{C \ni i} \psi_C(x_C)}{\sum_{x_i'} \prod_{C \ni i} \psi_C(x_i', x_{C \setminus \{i\}})}$$

Only potential functions involving node $i$ appear. For pairwise MRFs (where the largest cliques are edges), this is a product over edges incident to $i$.
Computational cost per update. Each Gibbs update requires computing a product over the cliques containing node $i$. For a pairwise MRF on a grid, each node has 4 neighbors, so each update involves 4 pairwise potentials. This is $O(1)$ work per update, independent of the graph size.
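As a sketch of this locality (the function and variable names here, `gibbs_update`, `psi`, and the dict-based graph representation, are illustrative choices, not from the text), a single-site update on a binary pairwise MRF with a shared symmetric edge potential touches only the edges incident to the chosen node:

```python
import random

def gibbs_update(x, i, neighbors, psi, states=(0, 1)):
    """Resample x[i] from its full conditional on a pairwise MRF.

    psi[a][b] is the (shared) edge potential for labels (a, b) on an edge.
    Only edges incident to i are touched: cost O(deg(i)), not O(|V|).
    """
    weights = []
    for s in states:
        w = 1.0
        for j in neighbors[i]:
            w *= psi[s][x[j]]
        weights.append(w)
    # Sample a state proportionally to its unnormalized conditional weight
    u = random.random() * sum(weights)
    acc = 0.0
    for s, w in zip(states, weights):
        acc += w
        if u < acc:
            x[i] = s
            break
    return x
```

Note that the partition function of the full joint never appears; only the local weights are normalized.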
The Ising Model
Ising Model
The Ising model on graph $G = (V, E)$ assigns binary variables $x_i \in \{-1, +1\}$ to each node, with joint distribution:

$$p(x) = \frac{1}{Z} \exp\Big( \beta \sum_{(i,j) \in E} x_i x_j + h \sum_{i \in V} x_i \Big)$$

where $\beta > 0$ is the inverse temperature (coupling strength) and $h$ is the external field. When $\beta$ is large, neighboring nodes strongly prefer to agree. The Ising model is the canonical pairwise binary MRF.
The Gibbs update for the Ising model at node $i$ is:

$$p(x_i = +1 \mid x_{N(i)}) = \sigma\Big( 2\beta \sum_{j \in N(i)} x_j + 2h \Big)$$

where $\sigma(u) = 1/(1 + e^{-u})$ is the logistic function. This is a logistic regression of $x_i$ on the sum of its neighbors.
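A minimal sketch of a systematic-scan sweep using this logistic update (the representation, dicts mapping node ids to spins and neighbor lists, is my own convention):

```python
import math
import random

def ising_gibbs_sweep(x, neighbors, beta, h=0.0, rng=random):
    """One systematic-scan Gibbs sweep over all nodes of an Ising model.

    x maps node -> spin in {-1, +1}. Each site is resampled from
    P(x_i = +1 | neighbors) = sigma(2*beta*sum_j x_j + 2*h).
    """
    for i in x:
        field = 2.0 * beta * sum(x[j] for j in neighbors[i]) + 2.0 * h
        p_plus = 1.0 / (1.0 + math.exp(-field))  # logistic function
        x[i] = +1 if rng.random() < p_plus else -1
    return x
```

Repeated sweeps define the Markov chain; at large $\beta$ the chain strongly favors configurations where neighbors agree.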
Phase transition. On infinite regular lattices, the Ising model exhibits a phase transition at a critical inverse temperature $\beta_c$ (for the 2D square lattice with $h = 0$, $\beta_c = \tfrac{1}{2}\ln(1 + \sqrt{2}) \approx 0.44$). For $\beta < \beta_c$, the system is disordered (roughly equal numbers of $+1$ and $-1$ spins). For $\beta > \beta_c$, spontaneous magnetization occurs (most nodes align). Near the critical temperature, Gibbs sampling mixes extremely slowly because large clusters of aligned spins form and are difficult to flip one node at a time. This is critical slowing down.
Applications
Image denoising. Model a noisy image as an MRF: each pixel $i$ has an observed noisy value $y_i$ and an unknown true value $x_i$. The data term favors $x_i$ close to $y_i$. The smoothness prior favors similar values at neighboring pixels. Gibbs sampling on the posterior $p(x \mid y)$ produces denoised samples. The MAP estimate (mode of the posterior) can be found by simulated annealing.
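One concrete instance, sketched under an assumed (and common, but not unique) choice of posterior energy $E(x) = -\beta \sum_{(i,j)} x_i x_j - \eta \sum_i x_i y_i$ for binary spins; the names `denoise_gibbs_sweep` and `eta` are illustrative:

```python
import math
import random

def denoise_gibbs_sweep(x, y, neighbors, beta, eta, rng=random):
    """One Gibbs sweep on a binary-image denoising posterior.

    Spins in {-1, +1}. Assumed energy: E(x) = -beta * sum_edges x_i x_j
    - eta * sum_i x_i y_i, so the data term pulls x_i toward the
    observation y_i and the pairwise prior smooths across edges.
    """
    for i in x:
        # Local field combines prior (neighbors) and data (y_i) terms
        field = 2.0 * (beta * sum(x[j] for j in neighbors[i]) + eta * y[i])
        p_plus = 1.0 / (1.0 + math.exp(-field))
        x[i] = +1 if rng.random() < p_plus else -1
    return x
```

With $\eta$ large the sampler trusts the data; with $\beta$ large it trusts the smoothness prior. Averaging many post-burn-in sweeps estimates posterior marginals per pixel.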
Texture synthesis. Learn MRF potentials from a texture example, then sample new textures by running Gibbs sampling with the learned potentials. The Markov property ensures that local statistics of the synthesized texture match the original.
Spatial statistics. Model disease incidence, soil composition, or vegetation patterns over a geographic region. Neighboring regions are coupled through an MRF prior, allowing spatial smoothing while respecting the observed data.
Simulated Annealing on MRFs
To find the MAP configuration $x^* = \arg\max_x p(x)$, run Gibbs sampling at decreasing temperatures. Replace $p(x)$ with $p_T(x) \propto p(x)^{1/T}$ and decrease $T$ over time. At high temperature, the sampler explores freely. At low temperature, it concentrates on high-probability configurations.
If the temperature decreases as $T_k = c / \log(1 + k)$ for sufficiently large $c$, simulated annealing converges to the global optimum with probability 1 (Geman and Geman, 1984). In practice, this cooling schedule is too slow, and faster geometric schedules ($T_k = T_0 \alpha^k$ with $\alpha < 1$) are used at the cost of losing the global-optimality guarantee.
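A minimal annealing sketch for the Ising MAP problem with a per-node field (all names, `anneal_map`, `t0`, `alpha`, are illustrative; the geometric schedule is the practical choice described above, not the theoretically guaranteed one):

```python
import math
import random

def _sigmoid(u):
    """Numerically stable logistic function (fields blow up as T -> 0)."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def anneal_map(neighbors, beta, h, n_sweeps=300, t0=5.0, alpha=0.97, rng=random):
    """Simulated annealing toward an Ising MAP estimate.

    Samples from p(x)^(1/T): each Gibbs field is simply divided by the
    current temperature T, with T following the geometric schedule
    T_k = t0 * alpha**k.
    """
    x = {i: rng.choice((-1, 1)) for i in neighbors}
    t = t0
    for _ in range(n_sweeps):
        for i in x:
            field = 2.0 * (beta * sum(x[j] for j in neighbors[i]) + h[i]) / t
            x[i] = +1 if rng.random() < _sigmoid(field) else -1
        t *= alpha
    return x
```

At high $T$ the updates are nearly uniform coin flips; as $T \to 0$ each update becomes a deterministic move to the locally best spin.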
Common Confusions
MRFs are not Bayesian networks
Bayesian networks are directed graphical models with conditional probability tables. MRFs are undirected with potential functions. The factorization structures are different: Bayesian networks factorize as products of conditionals, MRFs factorize as products of potentials with a partition function $Z$. Some independence structures can be represented by one but not the other.
The partition function Z does not need to be computed for Gibbs sampling
Gibbs sampling requires only the full conditional $p(x_i \mid x_{-i})$, which is a ratio of potentials. The partition function $Z$ cancels in the ratio. This is a major advantage: $Z$ is typically intractable (a sum over exponentially many configurations), but Gibbs sampling sidesteps it entirely.
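The cancellation can be checked numerically on a toy example (a hypothetical 3-node chain of my own construction): computing node 2's conditional through the full joint, with $Z$ made explicit, gives exactly the same answer as the local two-potential ratio in which $Z$ never appears.

```python
import itertools
import math

# Three-node chain Ising: p(x) proportional to exp(beta*(x1*x2 + x2*x3))
beta = 0.7

def joint_unnorm(x):
    return math.exp(beta * (x[0] * x[1] + x[1] * x[2]))

# Route 1: conditional via the full joint, computing Z over all 8 configs
configs = list(itertools.product((-1, 1), repeat=3))
Z = sum(joint_unnorm(c) for c in configs)

def cond_via_joint(x1, x3):
    p_plus = joint_unnorm((x1, +1, x3)) / Z
    p_minus = joint_unnorm((x1, -1, x3)) / Z
    return p_plus / (p_plus + p_minus)

# Route 2: only the two potentials touching node 2; Z never appears
def cond_local(x1, x3):
    w = {s: math.exp(beta * s * (x1 + x3)) for s in (-1, +1)}
    return w[+1] / (w[+1] + w[-1])

for x1, x3 in itertools.product((-1, 1), repeat=2):
    assert abs(cond_via_joint(x1, x3) - cond_local(x1, x3)) < 1e-12
```

Route 2 is what a Gibbs sampler actually computes; Route 1 is exponentially expensive on anything larger than a toy graph.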
Slow mixing near phase transitions is not a bug in the algorithm
When the Ising model is near its critical temperature, Gibbs sampling takes exponentially many steps to mix. This is not a failure of Gibbs sampling; it reflects the genuine difficulty of the distribution. The distribution has long-range correlations that no local sampler can resolve quickly. Cluster methods (Swendsen-Wang, Wolff) can help by proposing collective updates.
Key Takeaways
- MRFs are undirected graphical models where the joint distribution factorizes over cliques
- Hammersley-Clifford: local Markov property is equivalent to clique factorization (under positivity)
- Gibbs sampling on MRFs updates each variable conditioned on its Markov blanket (neighbors)
- The partition function $Z$ is never computed; Gibbs sampling only needs conditional ratios
- The Ising model is the canonical binary MRF, with a phase transition that causes critical slowing down
- Simulated annealing on MRFs finds MAP configurations by sampling at decreasing temperatures
Exercises
Problem
Consider a 3-node chain MRF with binary variables $x_1, x_2, x_3 \in \{-1, +1\}$ and pairwise potentials $\psi_{12}(x_1, x_2) = \exp(\beta x_1 x_2)$ and $\psi_{23}(x_2, x_3) = \exp(\beta x_2 x_3)$ with $\beta > 0$. Compute the Gibbs conditional $p(x_2 \mid x_1, x_3)$.
Problem
For the Ising model on a complete graph (all-to-all connections) with $n$ nodes and $h = 0$, show that the Gibbs conditional for node $i$ depends on the other nodes only through the sum $m_i = \sum_{j \neq i} x_j$. This is the Curie-Weiss (mean-field) model. Compute $p(x_i = +1 \mid x_{-i})$ and explain what happens as $n \to \infty$ with $\beta$ replaced by $\beta/n$.
References
Canonical:
- Geman and Geman, Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images, IEEE T-PAMI (1984)
- Kindermann and Snell, Markov Random Fields and Their Applications, AMS (1980)
Current:
- Koller and Friedman, Probabilistic Graphical Models (2009), Chapters 4, 12
- Murphy, Machine Learning: A Probabilistic Perspective (2012), Chapters 19, 24
- Robert and Casella, Monte Carlo Statistical Methods (2004), Chapters 3-7
- Gelman et al., Bayesian Data Analysis (2013), Chapters 10-12
Next Topics
- Variational inference on MRFs (belief propagation, mean-field approximation)
- Swendsen-Wang and Wolff cluster algorithms for faster mixing
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Gibbs Sampling (Layer 2)
- Metropolis-Hastings Algorithm (Layer 2)
- Common Probability Distributions (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)