
Reversible Jump MCMC

MCMC for model selection: propose moves that change the number of parameters, maintain detailed balance across dimensions via Jacobian corrections, and sample over model space and parameter space simultaneously.


Why This Matters

Standard MCMC methods (Metropolis-Hastings, Gibbs, HMC) sample within a fixed-dimensional parameter space. But many real problems require choosing the number of parameters itself: how many clusters in a mixture model, how many change points in a time series, how many hidden factors in a latent model.

Reversible jump MCMC (Green, 1995) extends Metropolis-Hastings to sample over a union of spaces with different dimensions. It is the standard Bayesian approach to model selection when the model indicator is discrete and the parameter spaces differ across models.

Mental Model

Imagine the posterior distribution lives not on a single $\mathbb{R}^d$ but on a disjoint union $\bigcup_{k} \{k\} \times \mathbb{R}^{d_k}$, where $k$ indexes models and $d_k$ is the dimension of model $k$. Reversible jump proposes moves that jump between these spaces: adding a parameter (birth), removing one (death), splitting one component into two (split), or merging two into one (merge).

The challenge is maintaining detailed balance when the "from" and "to" states live in different-dimensional spaces.

Formal Setup

Let $\{M_k : k \in \mathcal{K}\}$ be a countable collection of models. Under model $M_k$, the parameter vector $\theta_k$ lives in $\mathbb{R}^{d_k}$. The joint posterior over $(k, \theta_k)$ is:

$$\pi(k, \theta_k) \propto p(k) \cdot p(\theta_k \mid k) \cdot p(\text{data} \mid k, \theta_k)$$

where $p(k)$ is the prior on the model index, $p(\theta_k \mid k)$ is the parameter prior under model $k$, and $p(\text{data} \mid k, \theta_k)$ is the likelihood.
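To make the target concrete, here is a minimal sketch of this unnormalised log posterior for a 1D Gaussian mixture. The specific priors (a geometric-style prior on $k$, $N(0, 10^2)$ on each mean, flat priors on standard deviations and weights) are illustrative assumptions, not prescribed by the text:

```python
import math

def log_gauss(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

def log_joint(k, means, sds, weights, data):
    """Unnormalised log pi(k, theta_k) for a 1D Gaussian mixture.

    Assumed priors (hypothetical): p(k) proportional to 0.5^k,
    N(0, 10^2) on each mean, flat on sds and weights.
    """
    lp = k * math.log(0.5)                                 # log p(k), up to a constant
    lp += sum(log_gauss(m, 0.0, 10.0) for m in means)      # log p(theta_k | k)
    for x in data:                                         # log p(data | k, theta_k)
        lp += math.log(sum(w * math.exp(log_gauss(x, m, s))
                           for m, s, w in zip(means, sds, weights)))
    return lp
```

Any RJMCMC sampler for this model evaluates exactly this quantity at both the current and the proposed state.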

Definition

Dimension-Matching Condition

To propose a move from state $(k, \theta_k)$ with $\theta_k \in \mathbb{R}^{d_k}$ to state $(k', \theta_{k'})$ with $\theta_{k'} \in \mathbb{R}^{d_{k'}}$, generate auxiliary random variables $u$ of dimension $d_u$ such that:

$$d_k + d_u = d_{k'} + d_{u'}$$

where $u'$ are the auxiliary variables for the reverse move. This ensures a bijection between the "from" and "to" augmented spaces, both of dimension $d_k + d_u$.

Definition

Dimension-Changing Map

A deterministic bijection $g_{k \to k'}$ maps:

$$(\theta_k, u) \mapsto (\theta_{k'}, u') = g_{k \to k'}(\theta_k, u)$$

This map, together with its Jacobian, defines how parameters in one model translate to parameters in another.

Main Theorems

Theorem

Reversible Jump Acceptance Probability

Statement

A move from $(k, \theta_k)$ to $(k', \theta_{k'})$ is proposed by drawing $u \sim q_u(\cdot)$ and setting $(\theta_{k'}, u') = g_{k \to k'}(\theta_k, u)$. The acceptance probability is:

$$\alpha = \min\left(1, \frac{\pi(k', \theta_{k'}) \cdot j_{k' \to k} \cdot q_{u'}(u')}{\pi(k, \theta_k) \cdot j_{k \to k'} \cdot q_u(u)} \cdot \left|\det \frac{\partial g_{k \to k'}}{\partial (\theta_k, u)}\right|\right)$$

where $j_{k \to k'}$ is the probability of proposing a move from model $k$ to model $k'$ (the move-type probability).

Intuition

This is the standard Metropolis-Hastings ratio with one addition: the Jacobian determinant. When the move changes dimension, the map $g$ stretches or compresses volume, and the Jacobian corrects for this distortion, exactly as in a change of variables in integration.

Proof Sketch

Write the detailed balance condition on the augmented space $(k, \theta_k, u)$. The proposal distribution on this space is $j_{k \to k'} \cdot q_u(u)$. The target on this space is $\pi(k, \theta_k) \cdot q_u(u)$ (the auxiliary variables are independent of the target). Apply the change-of-variables formula through $g_{k \to k'}$ to express the reverse move's contribution; the Jacobian arises from this change of variables.

Why It Matters

This formula reduces transdimensional sampling to a standard accept/reject computation. Once you specify the map $g$ and the auxiliary variable distribution $q_u$, the acceptance ratio is fully determined. The art of RJMCMC is choosing $g$ and $q_u$ so that the acceptance rate is not too low.
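Once those ingredients are supplied, the acceptance probability is a single log-space computation. A minimal sketch (the function and argument names here are illustrative, not from a library):

```python
import math

def rj_accept(log_post_cur, log_post_prop,
              log_j_fwd, log_j_rev,
              log_qu_fwd, log_qu_rev,
              log_abs_det_jac):
    """Reversible-jump acceptance probability, computed in log space.

    log_post_*      : log pi(k, theta_k) at the current / proposed state
    log_j_*         : log move-type probabilities j_{k->k'} and j_{k'->k}
    log_qu_*        : log densities of the auxiliary draws u and u'
    log_abs_det_jac : log |det dg_{k->k'} / d(theta_k, u)|
    """
    log_ratio = (log_post_prop + log_j_rev + log_qu_rev
                 - log_post_cur - log_j_fwd - log_qu_fwd
                 + log_abs_det_jac)
    return math.exp(min(0.0, log_ratio))  # min(1, ratio) in log space
```

Working in log space avoids overflow when the posterior ratio or the Jacobian determinant is extreme, which is common in dimension-changing moves.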

Failure Mode

If the map $g$ is poorly designed, the proposed parameters in the new model space will have low posterior density, giving very low acceptance rates. The Jacobian determinant can also be extreme (very large or very small), causing most proposals to be rejected. Computing the Jacobian requires knowing the analytical form of $g$, which rules out implicit or iterative maps without special treatment.

Theorem

Detailed Balance Across Dimensions

Statement

Under the RJMCMC acceptance probability, the Markov chain satisfies detailed balance with respect to $\pi(k, \theta_k)$ across all pairs of models:

$$\pi(k, \theta_k) \cdot T\big((k, \theta_k) \to (k', \theta_{k'})\big) = \pi(k', \theta_{k'}) \cdot T\big((k', \theta_{k'}) \to (k, \theta_k)\big)$$

where $T$ is the transition kernel, which integrates over all possible auxiliary variable realizations. Consequently, if the chain is irreducible and aperiodic, $\pi$ is the unique stationary distribution.

Intuition

This is the same detailed balance property as standard MH, extended to a state space that is a union of spaces with different dimensions. The key is that the augmented space (original parameters plus auxiliary variables) has the same dimension on both sides of the transition, so the standard MH detailed balance argument applies.

Proof Sketch

On the augmented space $\Omega = \bigcup_k \{k\} \times \mathbb{R}^{d_k} \times \mathbb{R}^{d_u(k)}$, define the extended target $\tilde{\pi}(k, \theta_k, u) = \pi(k, \theta_k)\, q_u(u)$. The RJMCMC proposal on $\Omega$ is a standard MH proposal (since $g$ is a bijection between equal-dimensional spaces). Standard MH detailed balance applies on $\Omega$. Marginalizing out the auxiliary variables recovers detailed balance for $\pi(k, \theta_k)$.

Why It Matters

Detailed balance guarantees correctness: the chain's stationary distribution is the posterior π\pi, regardless of how poor the proposals are. Poor proposals affect efficiency (low acceptance rates, slow mixing) but not correctness. This separation of correctness from efficiency is the strength of the MCMC framework.

Failure Mode

Detailed balance alone does not guarantee convergence. The chain must also be irreducible (can reach any state from any other state) and aperiodic. In RJMCMC, irreducibility requires that the move-type probabilities $j_{k \to k'}$ allow reaching all models. If some models are only accessible through a long chain of intermediate models, mixing can be extremely slow.

Common Move Types

Birth/Death. Add or remove a component. To "birth" a new component in a $k$-component mixture: draw new parameters $(\mu_*, \sigma_*, w_*)$ from a proposal and adjust the existing weights so that all weights sum to 1. The reverse "death" move removes a component and redistributes its weight. The auxiliary variables are the proposed parameters for the new component.
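A birth proposal for a 1D Gaussian mixture might look like the sketch below. The proposal densities (normal for the mean, exponential for the standard deviation, beta for the new weight) are assumed for illustration; in practice they should be tuned to the problem:

```python
import random

def birth_move(means, sds, weights, rng=random):
    """Hypothetical birth proposal for a 1D Gaussian mixture.

    Draws parameters for a new component from simple proposal densities,
    then rescales the existing weights so that all weights still sum to 1.
    """
    mu_new = rng.gauss(0.0, 5.0)                   # proposed new mean
    sd_new = rng.expovariate(1.0)                  # proposed new std dev
    w_new = rng.betavariate(1.0, len(weights))     # proposed new weight
    scaled = [w * (1.0 - w_new) for w in weights]  # shrink old weights
    return means + [mu_new], sds + [sd_new], scaled + [w_new]
```

The weight rescaling is itself part of the map $g$ and contributes to the Jacobian in the acceptance ratio.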

Split/Merge. Split one component into two or merge two into one. This is typically more efficient than birth/death because the split components inherit information from their parent. The map $g$ takes $(\mu_j, \sigma_j, w_j, u_1, u_2, u_3)$ to two new components:

$$\mu_1 = \mu_j - u_1 \sigma_j, \qquad \mu_2 = \mu_j + u_1 \sigma_j$$

with weights $w_1 = w_j u_2$ and $w_2 = w_j(1 - u_2)$. The Jacobian of this map must be computed and included in the acceptance ratio.
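The mean and weight parts of this map, together with the inverse (merge) map that recovers the parent component and the auxiliary variables, can be sketched as follows. The variance update via $u_3$ is omitted from this minimal sketch:

```python
def split(mu_j, sd_j, w_j, u1, u2):
    """Mean/weight part of the split map: one component -> two."""
    mu1, mu2 = mu_j - u1 * sd_j, mu_j + u1 * sd_j
    w1, w2 = w_j * u2, w_j * (1.0 - u2)
    return (mu1, w1), (mu2, w2)

def merge(mu1, w1, mu2, w2, sd_j):
    """Inverse (merge) map: recover (mu_j, w_j, u1, u2) from the two halves."""
    mu_j = 0.5 * (mu1 + mu2)
    w_j = w1 + w2
    u1 = (mu2 - mu1) / (2.0 * sd_j)
    u2 = w1 / w_j
    return mu_j, w_j, u1, u2
```

The split and merge maps must be exact inverses of each other; the round trip split-then-merge should recover the parent parameters and auxiliary variables bit-for-bit (up to floating point), which is a useful unit test for any RJMCMC implementation.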

Application: Mixture Models with Unknown $k$

The canonical application is a Gaussian mixture model where the number of components $k$ is unknown. The model space is $\{M_k : k = 1, 2, 3, \ldots\}$, where $M_k$ has parameters $\theta_k = (\mu_1, \sigma_1, w_1, \ldots, \mu_k, \sigma_k, w_k)$.

RJMCMC alternates between:

  1. Within-model moves: standard MH or Gibbs updates to $\theta_k$, holding $k$ fixed
  2. Between-model moves: birth/death or split/merge moves that change $k$

The posterior samples give a distribution over $k$: the fraction of time the chain spends in model $M_k$ estimates $P(k \mid \text{data})$. This is the fully Bayesian approach to model selection, avoiding point estimates of the model order.
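Turning a trace of the model indicator into these visit frequencies is a one-liner. A minimal sketch:

```python
from collections import Counter

def model_posterior(k_trace, burn_in=0):
    """Estimate P(k | data) as the visit frequencies of the model indicator
    k over the (post-burn-in) RJMCMC trace."""
    kept = k_trace[burn_in:]
    counts = Counter(kept)
    return {k: c / len(kept) for k, c in sorted(counts.items())}
```

For example, a trace that visits model 2 half the time yields a posterior mass of 0.5 on $k = 2$.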

Application: Change-Point Detection

Consider a time series with unknown change points at positions $\tau_1 < \cdots < \tau_k$, where $k$ is unknown and each segment has its own parameter $\theta_j$. RJMCMC proposes adding a new change point (splitting a segment), removing one (merging two segments), or moving an existing change point. The dimension changes by the number of parameters per segment when a change point is added or removed.
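A birth move for this setting might propose a new change point uniformly among unused positions, as in this hypothetical sketch (the uniform choice is an assumption, not a prescribed proposal):

```python
import random

def propose_add_changepoint(taus, T, rng=random):
    """Hypothetical birth move for change-point detection: insert a new
    change point uniformly at one of the positions 1..T-1 not already used."""
    free = [t for t in range(1, T) if t not in taus]
    t_new = rng.choice(free)
    return sorted(taus + [t_new])
```

The matching death move would delete a uniformly chosen change point; the ratio of the two proposal probabilities enters the acceptance ratio as $j_{k' \to k} / j_{k \to k'}$.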

Common Confusions

Watch Out

The Jacobian is not optional

Every dimension-changing map has a Jacobian that must be included in the acceptance ratio. If the map is the identity (e.g., the new parameters are exactly the auxiliary variables), the Jacobian is 1. But for split/merge or any nonlinear map, omitting the Jacobian breaks detailed balance and produces an incorrect stationary distribution. There is no warning; the chain will simply converge to the wrong distribution.

Watch Out

RJMCMC does not require special convergence theory

Some practitioners believe RJMCMC needs different convergence diagnostics than standard MCMC. It does not. The chain is a standard Markov chain on the augmented space. Standard diagnostics (trace plots, effective sample size, $\hat{R}$) apply, though monitoring the model indicator $k$ is especially important.

Watch Out

Low acceptance rates for dimension-changing moves are normal

Acceptance rates of 1-5% for birth/death or split/merge moves are common and acceptable. The chain spends most of its time doing within-model updates. The between-model moves need only fire occasionally to explore the model space. Compare this with standard MH, where acceptance rates of 20-40% are typical.

Why This Is Hard

The difficulty is entirely in designing good proposals. The dimension-changing map $g$ must produce parameter values in the new model that are plausible under the posterior. A naive birth move that draws new parameters from the prior will almost always be rejected because the prior is typically much wider than the posterior.

Good proposals use data-informed distributions (e.g., proposing new component means near existing data points) or deterministic maps that preserve sufficient statistics. Designing these proposals requires problem-specific insight; there is no general-purpose solution.
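One common data-informed heuristic is to center a proposed component mean on a randomly chosen data point, as in this hypothetical sketch (the jitter scale is an assumed tuning parameter):

```python
import random

def data_informed_birth_mean(data, jitter=0.5, rng=random):
    """Propose a new component mean near a randomly chosen data point.

    A common heuristic for mixture birth moves; the density of this proposal
    must still appear in q_u in the acceptance ratio.
    """
    anchor = rng.choice(data)
    return anchor + rng.gauss(0.0, jitter)
```

Because the proposal now depends on the data, its density under $q_u$ is a mixture of normals centered on the data points, and that density must be evaluated exactly when computing the acceptance ratio.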

Key Takeaways

  • RJMCMC extends MH to sample over models with different numbers of parameters
  • Dimension-matching: augment both sides so the bijection maps between equal-dimensional spaces
  • The acceptance ratio includes a Jacobian for the dimension-changing map
  • Detailed balance holds on the augmented space, guaranteeing correctness
  • Birth/death and split/merge are the two standard move types
  • The hard part is designing proposals with reasonable acceptance rates

Exercises

ExerciseCore

Problem

A 2-component Gaussian mixture has parameters $(\mu_1, \sigma_1, w_1, \mu_2, \sigma_2, w_2)$ with the constraint $w_1 + w_2 = 1$, giving 5 free parameters. A 3-component mixture has 8 free parameters. For a birth move from 2 to 3 components, how many auxiliary variables $u$ are needed? What is the dimension of the augmented space on each side?

ExerciseAdvanced

Problem

Consider a split move for a 1D Gaussian mixture. Component $j$ with parameters $(\mu_j, \sigma_j^2, w_j)$ splits into two components using auxiliary variables $(u_1, u_2, u_3) \in (0,1)^3$ via:

$\mu_1 = \mu_j - u_1 \sigma_j$, $\mu_2 = \mu_j + u_1 \sigma_j$, $w_1 = w_j u_2$, $w_2 = w_j(1 - u_2)$, $\sigma_1^2 = u_3 \sigma_j^2 (1 - u_1^2)$, $\sigma_2^2 = (1 - u_3) \sigma_j^2 (1 - u_1^2)$.

Compute the Jacobian $\left| \partial(\mu_1, \mu_2, w_1, w_2, \sigma_1^2, \sigma_2^2) / \partial(\mu_j, \sigma_j^2, w_j, u_1, u_2, u_3) \right|$.

References

Canonical:

  • Green, "Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination" (Biometrika, 1995)
  • Richardson & Green, "On Bayesian Analysis of Mixtures with an Unknown Number of Components" (JRSS-B, 1997)

Current:

  • Brooks, Gelman, Jones, Meng, Handbook of Markov Chain Monte Carlo (2011), Chapter 6

  • Green & Hastie, "Reversible Jump MCMC" (2009), a tutorial review

  • Robert & Casella, Monte Carlo Statistical Methods (2004), Chapters 3-7

Next Topics

Natural extensions from reversible jump MCMC:

  • Birth-death MCMC: continuous-time formulations that avoid explicit Jacobian computations
  • Bayesian model comparison: other approaches to model selection, including Bayes factors and marginal likelihood estimation

Last reviewed: April 2026
