
Griddy Gibbs Sampling

When Gibbs sampling encounters a non-conjugate full conditional, approximate it on a grid and sample from the piecewise constant or piecewise linear approximation. Simple, effective for smooth univariate conditionals.


Why This Matters

Standard Gibbs sampling requires that you can sample from each full conditional distribution $p(\theta_j \mid \theta_{-j}, y)$ directly. In conjugate models, these conditionals have known forms (normal-normal, gamma-Poisson, etc.). In non-conjugate models, the full conditional may have no closed-form sampling algorithm.

Griddy Gibbs (Ritter and Tanner, 1992) handles this by evaluating the conditional on a grid, constructing a discrete approximation, and sampling from that approximation. It is the simplest approach to non-conjugate conditionals, requiring nothing beyond the ability to evaluate the conditional density (up to a normalizing constant) at specified points.

Mental Model

Suppose you need to sample from $p(\theta_j \mid \theta_{-j}, y)$ but cannot do so directly. You can, however, evaluate $p(\theta_j \mid \theta_{-j}, y) \propto h(\theta_j)$ for any value of $\theta_j$.

Place a grid of points $\theta_j^{(1)}, \ldots, \theta_j^{(G)}$ over the support. Evaluate $h(\theta_j^{(g)})$ at each grid point. Normalize these values to form a discrete distribution. Sample a grid cell, then (optionally) sample uniformly within that cell.

The result is an approximate draw from the full conditional. If the grid is fine enough and the conditional is smooth, the approximation error is small.

Formal Setup

Definition

Griddy Gibbs Algorithm

To sample from $p(\theta_j \mid \theta_{-j}, y) \propto h(\theta_j)$ where $\theta_j$ is a scalar:

  1. Choose grid points $a = \theta_j^{(1)} < \theta_j^{(2)} < \cdots < \theta_j^{(G)} = b$ spanning the effective support of $h$.
  2. Evaluate $w_g = h(\theta_j^{(g)})$ for $g = 1, \ldots, G$.
  3. Piecewise constant: Set $p_g \propto w_g \cdot (\theta_j^{(g+1)} - \theta_j^{(g)})$ for each interval. Sample interval $g$ with probability $p_g / \sum_g p_g$, then sample uniformly within the interval.
  4. Piecewise linear: Interpolate linearly between $(w_g, w_{g+1})$ within each interval. The CDF within an interval is quadratic, so inversion sampling is exact.
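The piecewise constant variant (steps 1-3) can be sketched in a few lines of NumPy. The function name `griddy_draw` and the standard-normal target are illustrative choices, not part of the original algorithm statement:

```python
import numpy as np

def griddy_draw(log_h, grid, rng):
    """One approximate draw from p(theta) proportional to h(theta),
    via a piecewise constant approximation on `grid` (sorted, length G)."""
    logw = log_h(grid)                        # unnormalized log density at grid points
    logw = logw - logw.max()                  # stabilize before exponentiating
    w = np.exp(logw)
    widths = np.diff(grid)                    # the G-1 interval widths
    p = w[:-1] * widths                       # mass of each interval (left-endpoint rule)
    p /= p.sum()
    g = rng.choice(len(p), p=p)               # step 3a: pick an interval
    return rng.uniform(grid[g], grid[g + 1])  # step 3b: uniform draw within it

rng = np.random.default_rng(0)
grid = np.linspace(-5.0, 5.0, 200)
# Illustrative target: standard normal log density, up to a constant
draws = np.array([griddy_draw(lambda t: -0.5 * t**2, grid, rng)
                  for _ in range(5000)])
```

The piecewise linear variant (step 4) would replace the final uniform draw with inversion of the quadratic within-interval CDF.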

Main Theorems

Proposition

Convergence of Griddy Gibbs

Statement

Let $\hat{p}_G$ denote the griddy Gibbs approximation to the full conditional $p$ using $G$ grid points. If the full conditional is continuous on a compact interval $[a, b]$, then the total variation distance satisfies:

$$\| \hat{p}_G - p \|_{\text{TV}} \leq \frac{(b-a) \cdot \omega_p(\Delta)}{2}$$

where $\Delta = \max_g (\theta_j^{(g+1)} - \theta_j^{(g)})$ is the maximum grid spacing and $\omega_p(\Delta)$ is the modulus of continuity of $p$. For Lipschitz conditionals with constant $L$, this gives $\| \hat{p}_G - p \|_{\text{TV}} \leq L(b-a)\Delta / 2$.
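The Lipschitz form of the bound can be checked numerically. This sketch uses a hypothetical target, the standard normal density on $[-4, 4]$, whose pdf has Lipschitz constant $1/\sqrt{2\pi e}$ (the maximum of $|x|\,\varphi(x)$, attained at $x = \pm 1$):

```python
import numpy as np

def normal_pdf(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

a, b, G = -4.0, 4.0, 200
grid = np.linspace(a, b, G)
w = normal_pdf(grid)                       # h evaluated at the grid points
Z = np.sum(w[:-1] * np.diff(grid))         # normalizer of the piecewise constant density

# Riemann-sum estimate of the TV distance 0.5 * integral of |p_hat - p|
x = np.linspace(a, b, 20001)
idx = np.clip(np.searchsorted(grid, x, side="right") - 1, 0, G - 2)
p_hat = w[idx] / Z                         # piecewise constant approximation at x
tv = 0.5 * np.sum(np.abs(p_hat - normal_pdf(x))) * (x[1] - x[0])

L = 1.0 / np.sqrt(2.0 * np.pi * np.e)      # Lipschitz constant of the N(0,1) pdf
delta = (b - a) / (G - 1)
bound = L * (b - a) * delta / 2
```

Here `tv` comes out well below `bound`, as expected: the worst-case bound charges every interval the full modulus of continuity, while the actual error is driven by the local slope.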

Intuition

A smooth function is well approximated by a piecewise constant function when the grid is fine. The error per interval is bounded by the maximum variation of the function within that interval, times the interval width. Summing over intervals gives the total error.

Proof Sketch

On each interval $[\theta_j^{(g)}, \theta_j^{(g+1)}]$, the piecewise constant approximation differs from $p$ by at most $\omega_p(\Delta)$ pointwise. Integrating this pointwise error over the interval and summing over all $G-1$ intervals gives the total variation bound.

Why It Matters

The bound tells you how to choose the grid: for a target accuracy $\epsilon$ in total variation, you need grid spacing $\Delta \leq 2\epsilon / (L(b-a))$. For smooth conditionals, a moderate grid (50-200 points) suffices.
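As a worked instance of this grid-sizing rule, with hypothetical values $L = 2$, $[a, b] = [-4, 4]$, and target $\epsilon = 0.01$:

```python
import math

# Hypothetical numbers: Lipschitz constant L, support [a, b], target TV error eps
L_const, a, b, eps = 2.0, -4.0, 4.0, 0.01

delta_max = 2 * eps / (L_const * (b - a))      # required maximum grid spacing: 0.00125
n_points = math.ceil((b - a) / delta_max) + 1  # equally spaced points covering [a, b]
```

This gives 6,400 intervals, i.e. 6,401 grid points; a sharp spike or a large support quickly drives the count up, which is the failure mode discussed below.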

Failure Mode

If the full conditional has sharp spikes or multiple narrow modes, the grid may miss them entirely. The grid must be fine enough to resolve all features of the conditional. For multimodal conditionals, you need prior knowledge of where the modes are to place grid points appropriately.

Canonical Examples

Example

Non-conjugate logistic regression

In a Bayesian logistic regression with a normal prior on coefficients, the full conditional for each coefficient $\beta_j$ is proportional to:

$$h(\beta_j) = \exp\left(-\frac{\beta_j^2}{2\sigma^2}\right) \prod_{i=1}^n \text{logit}^{-1}(x_i^T \beta)^{y_i} \left(1 - \text{logit}^{-1}(x_i^T \beta)\right)^{1-y_i}$$

This has no closed form. Place a grid of 100 points over $[\hat{\beta}_j - 4\hat{\sigma}_j, \hat{\beta}_j + 4\hat{\sigma}_j]$, where $\hat{\beta}_j$ is the MLE and $\hat{\sigma}_j$ is the asymptotic standard error. Evaluate $h$ at each grid point, normalize, and sample. The conditional is typically unimodal and smooth, so the grid approximation is accurate.
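A sketch of this sampler on synthetic data, assuming a $N(0, \sigma^2)$ prior with $\sigma^2 = 10$. For simplicity it recenters the grid on the current value of $\beta_j$ each sweep rather than computing the fixed MLE $\pm 4\hat{\sigma}_j$ window described above; all sizes and seeds are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic logistic-regression data: n observations, d coefficients
n, d = 200, 2
X = rng.normal(size=(n, d))
true_beta = np.array([1.0, -0.5])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

sigma2 = 10.0       # prior variance on each coefficient (assumption)
G = 100             # grid points per conditional, as in the text
beta = np.zeros(d)
draws = []
for sweep in range(500):
    for j in range(d):
        grid = np.linspace(beta[j] - 3.0, beta[j] + 3.0, G)  # crude recentering
        # Linear predictor for every grid value of beta_j at once (n x G)
        eta = (X @ beta - X[:, j] * beta[j])[:, None] + np.outer(X[:, j], grid)
        # log h = log prior + Bernoulli-logit log-likelihood, computed stably
        loglik = np.sum(y[:, None] * eta - np.logaddexp(0.0, eta), axis=0)
        logw = -0.5 * grid**2 / sigma2 + loglik
        w = np.exp(logw - logw.max())                        # stabilized weights
        p = w[:-1] * np.diff(grid)
        p /= p.sum()
        g = rng.choice(G - 1, p=p)                           # pick an interval
        beta[j] = rng.uniform(grid[g], grid[g + 1])          # uniform within it
    draws.append(beta.copy())
draws = np.array(draws[100:])   # discard burn-in
```

With this setup the posterior means land near the generating coefficients, since $n = 200$ observations dominate the weak prior.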

Common Confusions

Watch Out

Griddy Gibbs is an approximation, not exact

Unlike standard Gibbs sampling, griddy Gibbs does not sample from the exact full conditional. The Markov chain targets an approximate posterior, not the true posterior. The approximation error depends on grid resolution. This error is in addition to MCMC sampling variability.

Watch Out

Griddy Gibbs is only for scalar conditionals

The method requires evaluating the conditional on a grid. In $d$ dimensions, a grid with $G$ points per dimension has $G^d$ points in total. For $d > 2$, this is impractical. Griddy Gibbs is designed for the case where each full conditional is univariate (or at most bivariate), which is the standard Gibbs setting where you cycle through one variable at a time.

Summary

  • Evaluate the non-conjugate full conditional on a grid and sample from the discrete approximation
  • Piecewise constant is simplest; piecewise linear gives better accuracy for the same grid size
  • Grid spacing controls approximation error: smoother conditionals need fewer grid points
  • Only works for scalar (or low-dimensional) conditionals
  • For log-concave conditionals, adaptive rejection sampling is often preferable because it automatically refines the approximation

Exercises

Exercise (Core)

Problem

Suppose a full conditional is Lipschitz with constant $L = 5$ on the interval $[0, 10]$. How many equally spaced grid points do you need to achieve total variation error at most $0.01$?

Exercise (Advanced)

Problem

Explain why griddy Gibbs with a piecewise linear approximation is exact (zero approximation error) when the full conditional is itself piecewise linear on the grid intervals. What class of conditionals does this characterize?

References

Canonical:

  • Ritter & Tanner, "Facilitating the Gibbs Sampler: The Gibbs Stopper and the Griddy-Gibbs Sampler" (1992), Journal of the American Statistical Association

Current:

  • Robert & Casella, Monte Carlo Statistical Methods (2004), Chapter 10.3

  • Gilks, Richardson & Spiegelhalter, Markov Chain Monte Carlo in Practice (1996), Chapter 5

  • Gelman et al., Bayesian Data Analysis (2013), Chapters 10-12

  • Brooks et al., Handbook of MCMC (2011), Chapters 1-5


Last reviewed: April 2026
