Gibbs Sampling
The workhorse MCMC algorithm for Bayesian models: sample each variable from its full conditional distribution, cycling through all variables, and every proposal is automatically accepted.
Why This Matters
Gibbs sampling is a foundational MCMC algorithm with a long history in applied Bayesian statistics. It powered earlier-generation tools like BUGS and JAGS, and remains standard whenever a model has conditionally conjugate structure. Modern general-purpose Bayesian workflows more often reach for Hamiltonian Monte Carlo (Stan defaults to NUTS) because Gibbs mixes poorly in high-dimensional or strongly correlated posteriors. Gibbs still wins when the conditionals are closed-form: hierarchical models with conjugate priors, mixture models, LDA-style latent-variable models, and any setting where sampling from $p(x_i \mid x_{-i})$ is cheap.
Its appeal is simplicity: instead of designing a proposal distribution and tuning acceptance rates, you just sample each variable from its conditional distribution given the rest. No tuning parameters, no rejections.
Mental Model
Imagine you have a joint distribution $p(x_1, \dots, x_d)$ over multiple variables. Sampling from the full joint distribution directly is hard. But sampling from $p(x_i \mid x_{-i})$, the distribution of one variable with all the others held fixed, may be easy.
Gibbs sampling exploits this: cycle through the variables one at a time, updating each from its full conditional. This local updating strategy produces a Markov chain that converges to the full joint distribution.
Formal Setup and Notation
Let $p(x_1, \dots, x_d)$ be the target joint distribution. We assume we can sample from each full conditional distribution $p(x_i \mid x_{-i})$.
Full Conditional Distribution
The full conditional distribution of variable $x_i$ given all other variables is:

$$p(x_i \mid x_{-i}) = \frac{p(x_1, \dots, x_d)}{p(x_{-i})} \propto p(x_1, \dots, x_d),$$

where $x_{-i} = (x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_d)$ denotes all variables except $x_i$. In practice, to derive the full conditional for $x_i$, you write out the full joint density, treat everything that does not involve $x_i$ as a constant, and recognize the resulting kernel as a known distribution.
Gibbs Sampler (Systematic Scan)
Given target $p(x_1, \dots, x_d)$ and initial state $x^{(0)} = (x_1^{(0)}, \dots, x_d^{(0)})$:
At iteration $t$, update each component in sequence:
- Draw $x_1^{(t)} \sim p(x_1 \mid x_2^{(t-1)}, x_3^{(t-1)}, \dots, x_d^{(t-1)})$
- Draw $x_2^{(t)} \sim p(x_2 \mid x_1^{(t)}, x_3^{(t-1)}, \dots, x_d^{(t-1)})$
- Continue through all variables...
- Draw $x_d^{(t)} \sim p(x_d \mid x_1^{(t)}, x_2^{(t)}, \dots, x_{d-1}^{(t)})$
Note: each update uses the most recent values of all other variables.
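The sweep above can be sketched as a small driver function. This is a minimal sketch, not a library API: the `systematic_gibbs` name, the conditional-sampler interface, and the toy independent-normal target are all illustrative assumptions.

```python
import numpy as np

def systematic_gibbs(conditionals, x0, n_iter, rng=None):
    """Systematic-scan Gibbs sampler (illustrative sketch).

    conditionals[i](x, rng) must return a draw of x_i from its full
    conditional p(x_i | x_{-i}), reading the most recent values of the
    other coordinates from x.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    samples = np.empty((n_iter, len(x)))
    for t in range(n_iter):
        for i, draw in enumerate(conditionals):
            x[i] = draw(x, rng)  # in-place: later draws see the new value
        samples[t] = x
    return samples

# Toy target with independent N(0, 1) coordinates, so each full
# conditional is simply N(0, 1) regardless of the other coordinate.
conds = [lambda x, rng: rng.normal(), lambda x, rng: rng.normal()]
samples = systematic_gibbs(conds, x0=[5.0, -5.0], n_iter=10000,
                           rng=np.random.default_rng(0))
```

Note how the inner loop writes into `x` immediately, so the update for $x_2$ sees the freshly drawn $x_1$, exactly as the algorithm prescribes.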
Random Scan Gibbs Sampler
An alternative to systematic scan: at each iteration, choose an index $i$ uniformly at random (or with specified probabilities), and update only $x_i$ from $p(x_i \mid x_{-i})$.
Random scan has cleaner theoretical properties (it satisfies detailed balance directly), while systematic scan is more commonly used in practice because it updates all variables every iteration.
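A random-scan variant can be sketched the same way; the function name, the conditional-sampler interface, and the toy target are illustrative assumptions, not a standard API.

```python
import numpy as np

def random_scan_gibbs(conditionals, x0, n_iter, rng=None):
    """Random-scan Gibbs: one uniformly chosen coordinate per iteration."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    samples = np.empty((n_iter, len(x)))
    for t in range(n_iter):
        i = rng.integers(len(x))        # pick a coordinate at random
        x[i] = conditionals[i](x, rng)  # single-site update
        samples[t] = x
    return samples

# Same toy independent-normal target as before.
conds = [lambda x, rng: rng.normal(), lambda x, rng: rng.normal()]
out = random_scan_gibbs(conds, [5.0, -5.0], 4000, np.random.default_rng(1))
```

Each iteration now changes only one coordinate, so a fair comparison with systematic scan should count $d$ random-scan iterations per systematic sweep.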
Gibbs as a Special Case of Metropolis-Hastings
The central theoretical result about Gibbs sampling is that it is a special case of MH where the acceptance probability is always 1.
Gibbs Sampling Has Acceptance Probability 1
Statement
Consider a single Gibbs update for variable $x_i$: propose $x_i' \sim p(x_i \mid x_{-i})$. This is equivalent to a Metropolis-Hastings step with proposal $q(x_i' \mid x) = p(x_i' \mid x_{-i})$ and acceptance probability:

$$\alpha = 1.$$
Every proposed move is accepted.
Intuition
The proposal distribution is exactly the full conditional, which is the optimal distribution for $x_i$ given everything else. There is no mismatch between proposal and target to correct for, so the acceptance ratio evaluates to 1. You are proposing from exactly the right distribution.
Proof Sketch
By definition of the full conditional, the proposal is $q(x_i' \mid x) = p(x_i' \mid x_{-i})$ and the target factors as $p(x) = p(x_i \mid x_{-i})\, p(x_{-i})$.
The acceptance ratio becomes:

$$\alpha = \min\left(1,\ \frac{p(x')\, q(x_i \mid x')}{p(x)\, q(x_i' \mid x)}\right) = \min\left(1,\ \frac{p(x_i' \mid x_{-i})\, p(x_{-i})\, p(x_i \mid x_{-i})}{p(x_i \mid x_{-i})\, p(x_{-i})\, p(x_i' \mid x_{-i})}\right) = 1.$$

The $p(x_i \mid x_{-i})$, $p(x_i' \mid x_{-i})$, and $p(x_{-i})$ terms cancel perfectly.
Why It Matters
This result means Gibbs sampling inherits all the theoretical guarantees of MH (correctness via detailed balance, convergence via ergodicity) while avoiding the need to tune proposal distributions or deal with rejected samples. Every iteration produces a new state.
Failure Mode
The acceptance rate of 1 does not mean Gibbs sampling is always efficient. The chain can still mix slowly if the variables are highly correlated. When $x_1$ and $x_2$ are strongly correlated, updating one at a time while fixing the other leads to slow, diffusive exploration: the chain takes small steps along the correlation ridge. This is the main weakness of component-wise Gibbs sampling.
Convergence of the Gibbs Sampler
Statement
Under mild regularity conditions, the Gibbs sampler is ergodic: starting from any initial state $x^{(0)}$, the distribution of $x^{(t)}$ converges to $p$ in total variation:

$$\left\| P^t(x^{(0)}, \cdot) - p \right\|_{TV} \to 0 \quad \text{as } t \to \infty.$$

For random scan, the chain satisfies detailed balance with respect to $p$. For systematic scan, the chain satisfies the weaker condition of $p$-invariance (but not necessarily detailed balance).
Intuition
Each Gibbs update preserves $p$ as the stationary distribution (since it is a valid MH step). As long as the chain can reach any state (guaranteed when the support is connected and the full conditionals have positive density on it), the chain converges. Random scan gives reversibility automatically; systematic scan often converges faster in practice but sacrifices reversibility.
Proof Sketch
For random scan: the transition kernel is a mixture of MH kernels, each satisfying detailed balance, so the mixture does too. Irreducibility follows from the positivity of full conditionals on the support.
For systematic scan: the transition kernel is a composition (not mixture) of MH kernels. Each kernel preserves $p$, so the composition does too. Detailed balance may fail, but $p$-invariance plus irreducibility and aperiodicity suffice for convergence.
Why It Matters
This theorem ensures that Gibbs sampling produces valid samples from the target distribution, justifying its use in Bayesian computation. The distinction between random and systematic scan matters for theoretical analysis (e.g., proving CLTs for MCMC estimators) but rarely affects practice.
Failure Mode
Convergence can be extremely slow when variables are strongly correlated or when the posterior has multiple well-separated modes. In the latter case, the chain may get trapped in one mode for a very long time. Blocking (grouping correlated variables and updating them jointly) can help with correlations; tempering or other advanced methods are needed for multimodality.
Blocking for Correlated Variables
When variables $x_1$ and $x_2$ are highly correlated, updating them one at a time leads to slow mixing. Blocking groups correlated variables together and samples them jointly (a related strategy, collapsed Gibbs, instead integrates some variables out analytically):
Instead of updating $x_1$ then $x_2$, update the pair $(x_1, x_2)$ as a block. This requires sampling from the joint conditional $p(x_1, x_2 \mid x_{\text{rest}})$, which may itself require specialized methods, but the resulting chain mixes much faster when $x_1$ and $x_2$ are correlated.
The general principle: block together variables that are strongly correlated a posteriori.
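As a sketch of the mechanics, consider a toy trivariate normal in which the pair (x1, x2) has correlation 0.95 and x3 is independent. Because of that independence, the joint conditional of the pair is just its bivariate normal marginal, so the example isolates the key move: drawing the correlated pair jointly instead of one coordinate at a time. All numbers are illustrative.

```python
import numpy as np

rho = 0.95
cov12 = np.array([[1.0, rho], [rho, 1.0]])  # covariance of the (x1, x2) block
rng = np.random.default_rng(0)

x = np.zeros(3)
samples = np.empty((20000, 3))
for t in range(len(samples)):
    # Blocked update: draw the correlated pair jointly in one move.
    x[:2] = rng.multivariate_normal(np.zeros(2), cov12)
    # Single-site update for the remaining (independent) coordinate.
    x[2] = rng.normal()
    samples[t] = x

# The blocked chain draws (x1, x2) exactly each sweep, so successive
# samples are independent; single-site Gibbs on this pair would instead
# mix at a rate of roughly rho^2 = 0.9025 per sweep.
corr = np.corrcoef(samples[:, 0], samples[:, 1])[0, 1]
```

The recovered correlation `corr` sits near the target 0.95, confirming the block move samples the pair from the correct joint.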
Gibbs for Conjugate Models
The power of Gibbs sampling is most apparent in conjugate models, where each full conditional belongs to a standard distribution family.
Normal-Normal model (known variance)
Model: $y_i \sim N(\mu, \sigma^2)$ for $i = 1, \dots, n$ with $\sigma^2$ known, and prior $\mu \sim N(\mu_0, \tau_0^2)$.
The full conditional for $\mu$ is:

$$\mu \mid y \sim N\!\left( \frac{\mu_0/\tau_0^2 + n\bar{y}/\sigma^2}{1/\tau_0^2 + n/\sigma^2},\ \left( \frac{1}{\tau_0^2} + \frac{n}{\sigma^2} \right)^{-1} \right).$$

This is a one-block model, so one Gibbs draw gives an exact posterior sample. The power of Gibbs becomes clear in hierarchical extensions with additional latent variables.
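A quick numerical check of this conditional; the hyperparameter values (`mu0`, `tau0`, `sigma`) and the simulated data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
mu0, tau0, sigma = 0.0, 10.0, 2.0      # prior N(mu0, tau0^2), known sigma
y = rng.normal(3.0, sigma, size=50)    # simulated data with true mean 3

# Posterior precision and mean from the closed-form conditional above.
prec = 1.0 / tau0**2 + len(y) / sigma**2
post_mean = (mu0 / tau0**2 + len(y) * y.mean() / sigma**2) / prec
post_sd = prec**-0.5

# In this one-block model a "Gibbs draw" IS an exact posterior sample.
draws = rng.normal(post_mean, post_sd, size=10000)
```

With a weak prior and 50 observations, the posterior mean `post_mean` lands very close to the sample mean of `y`, as the formula predicts.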
Beta-Binomial model
Model: $y \sim \text{Binomial}(n, \theta)$, with prior $\theta \sim \text{Beta}(\alpha, \beta)$.
The full conditional is:

$$\theta \mid y \sim \text{Beta}(\alpha + y,\ \beta + n - y).$$

Again conjugacy gives a closed-form full conditional. In a hierarchical model with multiple groups sharing a common prior, the Gibbs sampler alternates between updating group-level parameters and the hyperparameters.
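In code, the conjugate update is a single Beta draw. The chosen values of alpha, beta, n, and y below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, n, y = 2.0, 2.0, 100, 63   # prior Beta(2, 2); 63 successes in 100

# The "Gibbs step" for theta is an exact draw from its full conditional.
theta = rng.beta(alpha + y, beta + n - y, size=20000)

# Posterior mean is (alpha + y) / (alpha + beta + n) = 65 / 104 = 0.625.
```
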
Gibbs for a bivariate normal
Target: $(x_1, x_2) \sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right)$.
The full conditionals are:

$$x_1 \mid x_2 \sim N(\rho x_2,\ 1 - \rho^2), \qquad x_2 \mid x_1 \sim N(\rho x_1,\ 1 - \rho^2).$$

When $\rho$ is close to 1, the chain moves in small steps along the diagonal, illustrating the slow-mixing problem for correlated variables. The lag-1 autocorrelation of the chain is approximately $\rho^2$.
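A minimal implementation of this two-variable sampler, which also checks the rho-squared autocorrelation claim numerically (rho = 0.9 is an illustrative choice):

```python
import numpy as np

rho = 0.9
sd = np.sqrt(1.0 - rho**2)   # conditional standard deviation
rng = np.random.default_rng(0)

x1, x2 = 0.0, 0.0
chain = np.empty(50000)
for t in range(len(chain)):
    x1 = rng.normal(rho * x2, sd)  # draw x1 | x2
    x2 = rng.normal(rho * x1, sd)  # draw x2 | x1 (uses the NEW x1)
    chain[t] = x1

# Lag-1 autocorrelation of the x1 chain should be near rho^2 = 0.81:
# substituting one update into the other gives x1' = rho^2 * x1 + noise.
acf1 = np.corrcoef(chain[:-1], chain[1:])[0, 1]
```

Rerunning with rho closer to 1 pushes `acf1` toward 1 and the effective sample size toward zero, which is the slow-mixing pathology in miniature.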
Common Confusions
Gibbs requires KNOWN full conditionals
Gibbs sampling requires that you can derive and sample from each full conditional distribution. If the full conditional does not have a recognizable closed form, you cannot use standard Gibbs for that variable. The solution is MH within Gibbs: use a Metropolis-Hastings step (with some proposal distribution) to update that variable, while using exact Gibbs updates for the variables where full conditionals are known. This hybrid approach is extremely common in practice.
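A sketch of MH within Gibbs on a toy two-variable target: x1 gets an exact conditional draw, while x2 is updated with a random-walk MH step as if its conditional were unrecognizable. The target density, function names, and step size are illustrative assumptions.

```python
import numpy as np

# Toy target (illustrative): p(x1, x2) proportional to
# exp(-0.5 * (x1 - x2)^2) * exp(-0.5 * x2^2), so x1 | x2 ~ N(x2, 1).

def log_cond_x2(x2, x1):
    # log p(x2 | x1) up to an additive constant -- we pretend this
    # kernel is not a recognizable distribution.
    return -0.5 * (x1 - x2) ** 2 - 0.5 * x2 ** 2

rng = np.random.default_rng(0)
x1, x2, step = 0.0, 0.0, 1.0
samples = np.empty((20000, 2))
for t in range(len(samples)):
    # Exact Gibbs step for x1: its conditional is known in closed form.
    x1 = rng.normal(x2, 1.0)
    # MH step for x2: symmetric random-walk proposal, MH accept/reject.
    prop = x2 + step * rng.normal()
    if np.log(rng.uniform()) < log_cond_x2(prop, x1) - log_cond_x2(x2, x1):
        x2 = prop
    samples[t] = (x1, x2)
```

Because the MH step targets the exact full conditional of x2, the hybrid chain still has the correct joint as its stationary distribution; only the x2 updates can be rejected.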
Gibbs is not always better than MH
The 100% acceptance rate of Gibbs does not mean it is always more efficient than MH. In high-correlation scenarios, a well-tuned MH proposal that accounts for correlations (or Hamiltonian Monte Carlo, which uses gradient information) can vastly outperform component-wise Gibbs. Gibbs excels when the model has conditionally conjugate structure and moderate correlations.
Systematic scan Gibbs does not satisfy detailed balance
A common misconception is that Gibbs sampling always satisfies detailed balance. In fact, only random scan Gibbs satisfies detailed balance. Systematic scan Gibbs, where you cycle through variables in a fixed order, satisfies the weaker property of $p$-invariance:

$$\int p(x)\, K(x, x')\, dx = p(x'),$$

where $K$ is the transition kernel. This is sufficient for convergence but means some theoretical tools (e.g., certain CLT results) require more care.
Gibbs updates use the MOST RECENT values
In systematic scan, when updating $x_i$, you condition on the values of $x_1, \dots, x_{i-1}$ that have already been updated in the current iteration, and the values of $x_{i+1}, \dots, x_d$ from the previous iteration. This is not the same as conditioning on all values from the previous iteration; that variant, sometimes called "synchronous" or "parallel" Gibbs, is a different algorithm that in general does not preserve the target joint distribution.
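The distinction can be made concrete on the bivariate normal example. The sketch below (with an illustrative rho = 0.9) runs both update schemes and compares the correlation each chain settles into: sequential updating recovers the target correlation, while conditioning every draw on the previous iteration's values does not.

```python
import numpy as np

rho = 0.9
sd = np.sqrt(1.0 - rho**2)
rng = np.random.default_rng(0)

def run(n, synchronous):
    """Run n sweeps; synchronous=True conditions both draws on old values."""
    x1 = x2 = 0.0
    out = np.empty((n, 2))
    for t in range(n):
        new_x1 = rng.normal(rho * x2, sd)
        src = x1 if synchronous else new_x1   # the only difference
        new_x2 = rng.normal(rho * src, sd)
        x1, x2 = new_x1, new_x2
        out[t] = (x1, x2)
    return out

seq = run(50000, synchronous=False)
par = run(50000, synchronous=True)

seq_corr = np.corrcoef(seq.T)[0, 1]  # near rho = 0.9: correct joint
par_corr = np.corrcoef(par.T)[0, 1]  # near 0: the joint is wrong
```

The synchronous chain keeps the correct N(0, 1) marginals here, yet its stationary correlation collapses to 0 (each sweep multiplies it by rho^2), so the joint distribution is not the target.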
Summary
- Gibbs sampling updates each variable from its full conditional, cycling through all variables
- It is a special case of MH with acceptance probability 1
- Requires known, tractable full conditional distributions
- Random scan satisfies detailed balance; systematic scan only satisfies $p$-invariance
- Excels in conditionally conjugate models (normal-normal, beta-binomial)
- Slow mixing when variables are highly correlated; use blocking
- When full conditionals are not tractable, use MH within Gibbs
Exercises
Problem
Derive the Gibbs sampler for the bivariate normal distribution:

$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \sim N\!\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} \right).$$

Find both full conditional distributions $p(x_1 \mid x_2)$ and $p(x_2 \mid x_1)$.
Problem
Prove that the Gibbs update for variable $x_i$, viewed as an MH step with proposal $q(x_i' \mid x) = p(x_i' \mid x_{-i})$, has acceptance probability exactly 1.
Problem
Consider a two-component Gaussian mixture model: $y_i \mid z_i = k \sim N(\mu_k, 1)$ with latent labels $z_i \in \{1, 2\}$, $P(z_i = k) = \pi_k$, and priors $\mu_k \sim N(0, \tau^2)$. Derive the full conditional distributions for the Gibbs sampler: $p(z_i \mid y, \mu)$ and $p(\mu_k \mid y, z)$.
References
Canonical:
- Geman & Geman (1984), "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images"
- Gelfand & Smith (1990), "Sampling-Based Approaches to Calculating Marginal Densities"
Current:
- Robert & Casella, Monte Carlo Statistical Methods (2004), Chapter 10
- Gelman, Carlin, Stern, Dunson, Vehtari, Rubin, Bayesian Data Analysis (3rd ed., 2013), Chapter 11
- Brooks et al., Handbook of MCMC (2011), Chapters 1-5
Next Topics
The natural next steps from Gibbs sampling:
- Griddy Gibbs: approximate Gibbs when full conditionals are not available in closed form
- MCMC convergence diagnostics: how to assess whether the chain has converged
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Metropolis-Hastings Algorithm (Layer 2)
- Common Probability Distributions (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
Builds on This
- Griddy Gibbs Sampling (Layer 2)
- MCMC for Markov Random Fields (Layer 3)
- Perfect Sampling (Layer 3)