
Reversible Jump MCMC

MCMC for model selection: propose moves that change the number of parameters, maintain detailed balance across dimensions via Jacobian corrections, and sample over model space and parameter space simultaneously.


Why This Matters

Standard MCMC methods (Metropolis-Hastings, Gibbs, HMC) sample within a fixed-dimensional parameter space. But many real problems require choosing the number of parameters itself: how many clusters in a mixture model, how many change points in a time series, how many hidden factors in a latent model.

Reversible jump MCMC (Green, 1995) extends Metropolis-Hastings to sample over a union of spaces with different dimensions. It is the standard Bayesian approach to model selection when the model indicator is discrete and the parameter spaces differ across models.

Mental Model

Imagine the posterior distribution lives not on a single $\mathbb{R}^d$ but on a disjoint union $\bigcup_{k} \{k\} \times \mathbb{R}^{d_k}$, where $k$ indexes models and $d_k$ is the dimension of model $k$. Reversible jump proposes moves that jump between these spaces: adding a parameter (birth), removing one (death), splitting one component into two (split), or merging two into one (merge).

The challenge is maintaining detailed balance when the "from" and "to" states live in different-dimensional spaces.

Formal Setup

Let $\{M_k : k \in \mathcal{K}\}$ be a countable collection of models. Under model $M_k$, the parameter vector $\theta_k$ lives in $\mathbb{R}^{d_k}$. The joint posterior over $(k, \theta_k)$ is:

$$\pi(k, \theta_k) \propto p(k) \cdot p(\theta_k \mid k) \cdot p(\text{data} \mid k, \theta_k)$$

where $p(k)$ is the prior on the model index, $p(\theta_k \mid k)$ is the parameter prior under model $k$, and $p(\text{data} \mid k, \theta_k)$ is the likelihood.
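To make the target concrete, here is a minimal sketch of this unnormalised log posterior for a 1D Gaussian mixture. The specific priors (a geometric-style prior on $k$, $N(0, 10^2)$ on each mean, flat priors on standard deviations and weights) are illustrative assumptions, not prescribed by the text:

```python
import math

def log_gauss(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

def log_joint(k, means, sds, weights, data):
    """Unnormalised log pi(k, theta_k) for a 1D Gaussian mixture.

    Assumed priors (hypothetical): p(k) proportional to 0.5^k,
    N(0, 10^2) on each mean, flat on sds and weights.
    """
    lp = k * math.log(0.5)                                 # log p(k), up to a constant
    lp += sum(log_gauss(m, 0.0, 10.0) for m in means)      # log p(theta_k | k)
    for x in data:                                         # log p(data | k, theta_k)
        lp += math.log(sum(w * math.exp(log_gauss(x, m, s))
                           for m, s, w in zip(means, sds, weights)))
    return lp
```

Any RJMCMC sampler for this model evaluates exactly this quantity at both the current and the proposed state.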

Definition

Dimension-Matching Condition

To propose a move from state $(k, \theta_k)$ with $\theta_k \in \mathbb{R}^{d_k}$ to state $(k', \theta_{k'})$ with $\theta_{k'} \in \mathbb{R}^{d_{k'}}$, generate auxiliary random variables $u$ of dimension $d_u$ such that:

$$d_k + d_u = d_{k'} + d_{u'}$$

where $u'$ are the auxiliary variables for the reverse move. This ensures a bijection between the "from" and "to" augmented spaces, both of dimension $d_k + d_u$.

Definition

Dimension-Changing Map

A deterministic bijection $g_{k \to k'}$ maps:

$$(\theta_k, u) \mapsto (\theta_{k'}, u') = g_{k \to k'}(\theta_k, u)$$

This map, together with its Jacobian, defines how parameters in one model translate to parameters in another.

Main Theorems

Theorem

Reversible Jump Acceptance Probability

Statement

A move from $(k, \theta_k)$ to $(k', \theta_{k'})$ is proposed by drawing $u \sim q_u(\cdot)$ and setting $(\theta_{k'}, u') = g_{k \to k'}(\theta_k, u)$. The acceptance probability is:

$$\alpha = \min\left(1, \frac{\pi(k', \theta_{k'}) \cdot j_{k' \to k} \cdot q_{u'}(u')}{\pi(k, \theta_k) \cdot j_{k \to k'} \cdot q_u(u)} \cdot \left|\det \frac{\partial g_{k \to k'}}{\partial (\theta_k, u)}\right|\right)$$

where $j_{k \to k'}$ is the probability of proposing a move from model $k$ to model $k'$ (the move-type probability).

Intuition

This is the standard Metropolis-Hastings ratio with one addition: the Jacobian determinant. When the move changes dimension, the map $g$ stretches or compresses volume, and the Jacobian corrects for this distortion, exactly as in a change of variables in integration.

Proof Sketch

Write the detailed balance condition on the augmented space $(k, \theta_k, u)$. The proposal distribution on this space is $j_{k \to k'} \cdot q_u(u)$. The target on this space is $\pi(k, \theta_k) \cdot q_u(u)$ (the auxiliary variables are independent of the target). Apply the change-of-variables formula through $g_{k \to k'}$ to express the reverse move's contribution; the Jacobian arises from this change of variables.

Why It Matters

This formula reduces transdimensional sampling to a standard accept/reject computation. Once you specify the map $g$ and the auxiliary variable distribution $q_u$, the acceptance ratio is fully determined. The art of RJMCMC is choosing $g$ and $q_u$ so that the acceptance rate is not too low.
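Once those ingredients are supplied, the acceptance probability is a single log-space computation. A minimal sketch (the function and argument names here are illustrative, not from a library):

```python
import math

def rj_accept(log_post_cur, log_post_prop,
              log_j_fwd, log_j_rev,
              log_qu_fwd, log_qu_rev,
              log_abs_det_jac):
    """Reversible-jump acceptance probability, computed in log space.

    log_post_*      : log pi(k, theta_k) at the current / proposed state
    log_j_*         : log move-type probabilities j_{k->k'} and j_{k'->k}
    log_qu_*        : log densities of the auxiliary draws u and u'
    log_abs_det_jac : log |det dg_{k->k'} / d(theta_k, u)|
    """
    log_ratio = (log_post_prop + log_j_rev + log_qu_rev
                 - log_post_cur - log_j_fwd - log_qu_fwd
                 + log_abs_det_jac)
    return math.exp(min(0.0, log_ratio))  # min(1, ratio) in log space
```

Working in log space avoids overflow when the posterior ratio or the Jacobian determinant is extreme, which is common in dimension-changing moves.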

Failure Mode

If the map $g$ is poorly designed, the proposed parameters in the new model space will have low posterior density, giving very low acceptance rates. The Jacobian determinant can also be extreme (very large or very small), causing most proposals to be rejected. Computing the Jacobian requires knowing the analytical form of $g$, which rules out implicit or iterative maps without special treatment.

Theorem

Detailed Balance Across Dimensions

Statement

Under the RJMCMC acceptance probability, the Markov chain satisfies detailed balance with respect to $\pi(k, \theta_k)$ across all pairs of models:

$$\pi(k, \theta_k) \cdot T\big((k, \theta_k) \to (k', \theta_{k'})\big) = \pi(k', \theta_{k'}) \cdot T\big((k', \theta_{k'}) \to (k, \theta_k)\big)$$

where $T$ is the transition kernel, which integrates over all possible auxiliary variable realizations. Consequently, if the chain is irreducible and aperiodic, $\pi$ is the unique stationary distribution.

Intuition

This is the same detailed balance property as standard MH, extended to a state space that is a union of spaces with different dimensions. The key is that the augmented space (original parameters plus auxiliary variables) has the same dimension on both sides of the transition, so the standard MH detailed balance argument applies.

Proof Sketch

On the augmented space $\Omega = \bigcup_k \{k\} \times \mathbb{R}^{d_k} \times \mathbb{R}^{d_u(k)}$, define the extended target $\tilde{\pi}(k, \theta_k, u) = \pi(k, \theta_k)\, q_u(u)$. The RJMCMC proposal on $\Omega$ is a standard MH proposal (since $g$ is a bijection between equal-dimensional spaces). Standard MH detailed balance applies on $\Omega$. Marginalizing out the auxiliary variables recovers detailed balance for $\pi(k, \theta_k)$.

Why It Matters

Detailed balance guarantees correctness: the chain's stationary distribution is the posterior π\pi, regardless of how poor the proposals are. Poor proposals affect efficiency (low acceptance rates, slow mixing) but not correctness. This separation of correctness from efficiency is the strength of the MCMC framework.

Failure Mode

Detailed balance alone does not guarantee convergence. The chain must also be irreducible (can reach any state from any other state) and aperiodic. In RJMCMC, irreducibility requires that the move-type probabilities $j_{k \to k'}$ allow reaching all models. If some models are only accessible through a long chain of intermediate models, mixing can be extremely slow.

Common Move Types

Birth/Death. Add or remove a component. To "birth" a new component in a $k$-component mixture: draw new parameters $(\mu_*, \sigma_*, w_*)$ from a proposal and adjust the existing weights so that all weights sum to 1. The reverse "death" move removes a component and redistributes its weight. The auxiliary variables are the proposed parameters for the new component.
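A birth proposal for a 1D Gaussian mixture might look like the sketch below. The proposal densities (normal for the mean, exponential for the standard deviation, beta for the new weight) are assumed for illustration; in practice they should be tuned to the problem:

```python
import random

def birth_move(means, sds, weights, rng=random):
    """Hypothetical birth proposal for a 1D Gaussian mixture.

    Draws parameters for a new component from simple proposal densities,
    then rescales the existing weights so that all weights still sum to 1.
    """
    mu_new = rng.gauss(0.0, 5.0)                   # proposed new mean
    sd_new = rng.expovariate(1.0)                  # proposed new std dev
    w_new = rng.betavariate(1.0, len(weights))     # proposed new weight
    scaled = [w * (1.0 - w_new) for w in weights]  # shrink old weights
    return means + [mu_new], sds + [sd_new], scaled + [w_new]
```

The weight rescaling is itself part of the map $g$ and contributes to the Jacobian in the acceptance ratio.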

Split/Merge. Split one component into two or merge two into one. This is typically more efficient than birth/death because the split components inherit information from their parent. The map $g$ takes $(\mu_j, \sigma_j, w_j, u_1, u_2, u_3)$ to two new components:

$$\mu_1 = \mu_j - u_1 \sigma_j, \qquad \mu_2 = \mu_j + u_1 \sigma_j$$

with weights $w_1 = w_j u_2$ and $w_2 = w_j(1 - u_2)$. The Jacobian of this map must be computed and included in the acceptance ratio.
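The mean and weight parts of this map, together with the inverse (merge) map that recovers the parent component and the auxiliary variables, can be sketched as follows. The variance update via $u_3$ is omitted from this minimal sketch:

```python
def split(mu_j, sd_j, w_j, u1, u2):
    """Mean/weight part of the split map: one component -> two."""
    mu1, mu2 = mu_j - u1 * sd_j, mu_j + u1 * sd_j
    w1, w2 = w_j * u2, w_j * (1.0 - u2)
    return (mu1, w1), (mu2, w2)

def merge(mu1, w1, mu2, w2, sd_j):
    """Inverse (merge) map: recover (mu_j, w_j, u1, u2) from the two halves."""
    mu_j = 0.5 * (mu1 + mu2)
    w_j = w1 + w2
    u1 = (mu2 - mu1) / (2.0 * sd_j)
    u2 = w1 / w_j
    return mu_j, w_j, u1, u2
```

The split and merge maps must be exact inverses of each other; the round trip split-then-merge should recover the parent parameters and auxiliary variables bit-for-bit (up to floating point), which is a useful unit test for any RJMCMC implementation.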

Application: Mixture Models with Unknown $k$

The canonical application is a Gaussian mixture model where the number of components $k$ is unknown. The model space is $\{M_k : k = 1, 2, 3, \ldots\}$, where $M_k$ has parameters $\theta_k = (\mu_1, \sigma_1, w_1, \ldots, \mu_k, \sigma_k, w_k)$.

RJMCMC alternates between:

  1. Within-model moves: standard MH or Gibbs updates to $\theta_k$, holding $k$ fixed
  2. Between-model moves: birth/death or split/merge moves that change $k$

The posterior samples give a distribution over $k$: the fraction of time the chain spends in model $M_k$ estimates $P(k \mid \text{data})$. This is the fully Bayesian approach to model selection, avoiding point estimates of the model order.
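Turning a trace of the model indicator into these visit frequencies is a one-liner. A minimal sketch:

```python
from collections import Counter

def model_posterior(k_trace, burn_in=0):
    """Estimate P(k | data) as the visit frequencies of the model indicator
    k over the (post-burn-in) RJMCMC trace."""
    kept = k_trace[burn_in:]
    counts = Counter(kept)
    return {k: c / len(kept) for k, c in sorted(counts.items())}
```

For example, a trace that visits model 2 half the time yields a posterior mass of 0.5 on $k = 2$.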

Application: Change-Point Detection

Consider a time series with unknown change points at positions $\tau_1 < \cdots < \tau_k$, where $k$ is unknown and each segment has its own parameter $\theta_j$. RJMCMC proposes adding a new change point (splitting a segment), removing one (merging two segments), or moving an existing change point. The dimension changes by the number of parameters per segment when a change point is added or removed.
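A birth move for this setting might propose a new change point uniformly among unused positions, as in this hypothetical sketch (the uniform choice is an assumption, not a prescribed proposal):

```python
import random

def propose_add_changepoint(taus, T, rng=random):
    """Hypothetical birth move for change-point detection: insert a new
    change point uniformly at one of the positions 1..T-1 not already used."""
    free = [t for t in range(1, T) if t not in taus]
    t_new = rng.choice(free)
    return sorted(taus + [t_new])
```

The matching death move would delete a uniformly chosen change point; the ratio of the two proposal probabilities enters the acceptance ratio as $j_{k' \to k} / j_{k \to k'}$.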

Common Confusions

Watch Out

The Jacobian is not optional

Every dimension-changing map has a Jacobian that must be included in the acceptance ratio. If the map is the identity (e.g., the new parameters are exactly the auxiliary variables), the Jacobian is 1. But for split/merge or any nonlinear map, omitting the Jacobian breaks detailed balance and produces an incorrect stationary distribution. There is no warning; the chain will simply converge to the wrong distribution.

Watch Out

RJMCMC does not require special convergence theory

Some practitioners believe RJMCMC needs different convergence diagnostics than standard MCMC. It does not. The chain is a standard Markov chain on the augmented space. Standard diagnostics (trace plots, effective sample size, $\hat{R}$) apply, though monitoring the model indicator $k$ is especially important.

Watch Out

Low acceptance rates for dimension-changing moves are normal

Acceptance rates of 1-5% for birth/death or split/merge moves are common and acceptable. The chain spends most of its time doing within-model updates. The between-model moves need only fire occasionally to explore the model space. Compare this with standard MH, where acceptance rates of 20-40% are typical.

Why This Is Hard

The difficulty is entirely in designing good proposals. The dimension-changing map $g$ must produce parameter values in the new model that are plausible under the posterior. A naive birth move that draws new parameters from the prior will almost always be rejected because the prior is typically much wider than the posterior.

Good proposals use data-informed distributions (e.g., proposing new component means near existing data points) or deterministic maps that preserve sufficient statistics. Designing these proposals requires problem-specific insight; there is no general-purpose solution.
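One common data-informed heuristic is to center a proposed component mean on a randomly chosen data point, as in this hypothetical sketch (the jitter scale is an assumed tuning parameter):

```python
import random

def data_informed_birth_mean(data, jitter=0.5, rng=random):
    """Propose a new component mean near a randomly chosen data point.

    A common heuristic for mixture birth moves; the density of this proposal
    must still appear in q_u in the acceptance ratio.
    """
    anchor = rng.choice(data)
    return anchor + rng.gauss(0.0, jitter)
```

Because the proposal now depends on the data, its density under $q_u$ is a mixture of normals centered on the data points, and that density must be evaluated exactly when computing the acceptance ratio.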

Key Takeaways

  • RJMCMC extends MH to sample over models with different numbers of parameters
  • Dimension-matching: augment both sides so the bijection maps between equal-dimensional spaces
  • The acceptance ratio includes a Jacobian for the dimension-changing map
  • Detailed balance holds on the augmented space, guaranteeing correctness
  • Birth/death and split/merge are the two standard move types
  • The hard part is designing proposals with reasonable acceptance rates

Exercises

ExerciseCore

Problem

A 2-component Gaussian mixture has parameters $(\mu_1, \sigma_1, w_1, \mu_2, \sigma_2, w_2)$ with the constraint $w_1 + w_2 = 1$, giving 5 free parameters. A 3-component mixture has 8 free parameters. For a birth move from 2 to 3 components, how many auxiliary variables $u$ are needed? What is the dimension of the augmented space on each side?

ExerciseAdvanced

Problem

Consider a split move for a 1D Gaussian mixture. Component $j$ with parameters $(\mu_j, \sigma_j^2, w_j)$ splits into two components using auxiliary variables $(u_1, u_2, u_3) \in (0,1)^3$ via:

$\mu_1 = \mu_j - u_1 \sigma_j$, $\mu_2 = \mu_j + u_1 \sigma_j$, $w_1 = w_j u_2$, $w_2 = w_j(1 - u_2)$, $\sigma_1^2 = u_3 \sigma_j^2 (1 - u_1^2)$, $\sigma_2^2 = (1 - u_3) \sigma_j^2 (1 - u_1^2)$.

Compute the Jacobian $\left| \partial(\mu_1, \mu_2, w_1, w_2, \sigma_1^2, \sigma_2^2) / \partial(\mu_j, \sigma_j^2, w_j, u_1, u_2, u_3) \right|$.

References

Canonical:

  • Green, "Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination" (Biometrika, 1995)
  • Richardson & Green, "On Bayesian Analysis of Mixtures with an Unknown Number of Components" (JRSS-B, 1997)

Current:

  • Brooks, Gelman, Jones, Meng, Handbook of Markov Chain Monte Carlo (2011), Chapter 6

  • Green & Hastie, "Reversible Jump MCMC" (2009), a tutorial review

  • Robert & Casella, Monte Carlo Statistical Methods (2004), Chapters 3-7

Next Topics

Natural extensions from reversible jump MCMC:

  • Birth-death MCMC: continuous-time formulations that avoid explicit Jacobian computations
  • Bayesian model comparison: other approaches to model selection, including Bayes factors and marginal likelihood estimation

Last reviewed: April 2026
