
Griddy Gibbs Sampling

When Gibbs sampling encounters a non-conjugate full conditional, approximate it on a grid and sample from the piecewise constant or piecewise linear approximation. Simple, effective for smooth univariate conditionals.


Why This Matters

Standard Gibbs sampling requires that you can sample from each full conditional distribution $p(\theta_j \mid \theta_{-j}, y)$ directly. In conjugate models, these conditionals have known forms (normal-normal, gamma-Poisson, etc.). In non-conjugate models, the full conditional may have no closed-form sampling algorithm.

Griddy Gibbs (Ritter and Tanner, 1992) handles this by evaluating the conditional on a grid, constructing a discrete approximation, and sampling from that approximation. It is the simplest approach to non-conjugate conditionals, requiring nothing beyond the ability to evaluate the conditional density (up to a normalizing constant) at specified points.

Mental Model

Suppose you need to sample from $p(\theta_j \mid \theta_{-j}, y)$ but cannot do so directly. You can, however, evaluate $p(\theta_j \mid \theta_{-j}, y) \propto h(\theta_j)$ for any value of $\theta_j$.

Place a grid of points $\theta_j^{(1)}, \ldots, \theta_j^{(G)}$ over the support. Evaluate $h(\theta_j^{(g)})$ at each grid point. Normalize these values to form a discrete distribution. Sample a grid cell, then (optionally) sample uniformly within that cell.

The result is an approximate draw from the full conditional. If the grid is fine enough and the conditional is smooth, the approximation error is small.

Formal Setup

Definition

Griddy Gibbs Algorithm

To sample from $p(\theta_j \mid \theta_{-j}, y) \propto h(\theta_j)$ where $\theta_j$ is a scalar:

  1. Choose grid points $a = \theta_j^{(1)} < \theta_j^{(2)} < \cdots < \theta_j^{(G)} = b$ spanning the effective support of $h$.
  2. Evaluate $w_g = h(\theta_j^{(g)})$ for $g = 1, \ldots, G$.
  3. Piecewise constant: Set $p_g \propto w_g \cdot (\theta_j^{(g+1)} - \theta_j^{(g)})$ for each interval. Sample interval $g$ with probability $p_g / \sum_g p_g$, then sample uniformly within the interval.
  4. Piecewise linear: Interpolate linearly between $(w_g, w_{g+1})$ within each interval. The CDF within an interval is quadratic, so inversion sampling is exact.
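The piecewise constant variant (steps 1-3) can be sketched in a few lines of NumPy. The function name `griddy_draw` and the standard-normal target are illustrative choices, not part of the original algorithm statement:

```python
import numpy as np

def griddy_draw(log_h, grid, rng):
    """One approximate draw from p(theta) proportional to h(theta),
    via a piecewise constant approximation on `grid` (sorted, length G)."""
    logw = log_h(grid)                        # unnormalized log density at grid points
    logw = logw - logw.max()                  # stabilize before exponentiating
    w = np.exp(logw)
    widths = np.diff(grid)                    # the G-1 interval widths
    p = w[:-1] * widths                       # mass of each interval (left-endpoint rule)
    p /= p.sum()
    g = rng.choice(len(p), p=p)               # step 3a: pick an interval
    return rng.uniform(grid[g], grid[g + 1])  # step 3b: uniform draw within it

rng = np.random.default_rng(0)
grid = np.linspace(-5.0, 5.0, 200)
# Illustrative target: standard normal log density, up to a constant
draws = np.array([griddy_draw(lambda t: -0.5 * t**2, grid, rng)
                  for _ in range(5000)])
```

The piecewise linear variant (step 4) would replace the final uniform draw with inversion of the quadratic within-interval CDF.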

Main Theorems

Proposition

Convergence of Griddy Gibbs

Statement

Let $\hat{p}_G$ denote the griddy Gibbs approximation to the full conditional $p$ using $G$ grid points. If the full conditional is continuous on a compact interval $[a, b]$, then the total variation distance satisfies:

$$\| \hat{p}_G - p \|_{\text{TV}} \leq \frac{(b-a) \cdot \omega_p(\Delta)}{2}$$

where $\Delta = \max_g (\theta_j^{(g+1)} - \theta_j^{(g)})$ is the maximum grid spacing and $\omega_p(\Delta)$ is the modulus of continuity of $p$. For Lipschitz conditionals with constant $L$, this gives $\| \hat{p}_G - p \|_{\text{TV}} \leq L(b-a)\Delta / 2$.
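The Lipschitz form of the bound can be checked numerically. This sketch uses a hypothetical target, the standard normal density on $[-4, 4]$, whose pdf has Lipschitz constant $1/\sqrt{2\pi e}$ (the maximum of $|x|\,\varphi(x)$, attained at $x = \pm 1$):

```python
import numpy as np

def normal_pdf(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

a, b, G = -4.0, 4.0, 200
grid = np.linspace(a, b, G)
w = normal_pdf(grid)                       # h evaluated at the grid points
Z = np.sum(w[:-1] * np.diff(grid))         # normalizer of the piecewise constant density

# Riemann-sum estimate of the TV distance 0.5 * integral of |p_hat - p|
x = np.linspace(a, b, 20001)
idx = np.clip(np.searchsorted(grid, x, side="right") - 1, 0, G - 2)
p_hat = w[idx] / Z                         # piecewise constant approximation at x
tv = 0.5 * np.sum(np.abs(p_hat - normal_pdf(x))) * (x[1] - x[0])

L = 1.0 / np.sqrt(2.0 * np.pi * np.e)      # Lipschitz constant of the N(0,1) pdf
delta = (b - a) / (G - 1)
bound = L * (b - a) * delta / 2
```

Here `tv` comes out well below `bound`, as expected: the worst-case bound charges every interval the full modulus of continuity, while the actual error is driven by the local slope.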

Intuition

A smooth function is well approximated by a piecewise constant function when the grid is fine. The error per interval is bounded by the maximum variation of the function within that interval, times the interval width. Summing over intervals gives the total error.

Proof Sketch

On each interval $[\theta_j^{(g)}, \theta_j^{(g+1)}]$, the piecewise constant approximation differs from $p$ by at most $\omega_p(\Delta)$ pointwise. Integrating this pointwise error over the interval and summing over all $G-1$ intervals gives the total variation bound.

Why It Matters

The bound tells you how to choose the grid: for a target accuracy $\epsilon$ in total variation, you need grid spacing $\Delta \leq 2\epsilon / (L(b-a))$. For smooth conditionals, a moderate grid (50-200 points) suffices.
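As a worked instance of this grid-sizing rule, with hypothetical values $L = 2$, $[a, b] = [-4, 4]$, and target $\epsilon = 0.01$:

```python
import math

# Hypothetical numbers: Lipschitz constant L, support [a, b], target TV error eps
L_const, a, b, eps = 2.0, -4.0, 4.0, 0.01

delta_max = 2 * eps / (L_const * (b - a))      # required maximum grid spacing: 0.00125
n_points = math.ceil((b - a) / delta_max) + 1  # equally spaced points covering [a, b]
```

This gives 6,400 intervals, i.e. 6,401 grid points; a sharp spike or a large support quickly drives the count up, which is the failure mode discussed below.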

Failure Mode

If the full conditional has sharp spikes or multiple narrow modes, the grid may miss them entirely. The grid must be fine enough to resolve all features of the conditional. For multimodal conditionals, you need prior knowledge of where the modes are to place grid points appropriately.

Canonical Examples

Example

Non-conjugate logistic regression

In a Bayesian logistic regression with a normal prior on coefficients, the full conditional for each coefficient $\beta_j$ is proportional to:

$$h(\beta_j) = \exp\left(-\frac{\beta_j^2}{2\sigma^2}\right) \prod_{i=1}^n \text{logit}^{-1}(x_i^T \beta)^{y_i} \left(1 - \text{logit}^{-1}(x_i^T \beta)\right)^{1-y_i}$$

This has no closed form. Place a grid of 100 points over $[\hat{\beta}_j - 4\hat{\sigma}_j, \hat{\beta}_j + 4\hat{\sigma}_j]$, where $\hat{\beta}_j$ is the MLE and $\hat{\sigma}_j$ is the asymptotic standard error. Evaluate $h$ at each grid point, normalize, and sample. The conditional is typically unimodal and smooth, so the grid approximation is accurate.
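A sketch of this sampler on synthetic data, assuming a $N(0, \sigma^2)$ prior with $\sigma^2 = 10$. For simplicity it recenters the grid on the current value of $\beta_j$ each sweep rather than computing the fixed MLE $\pm 4\hat{\sigma}_j$ window described above; all sizes and seeds are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic logistic-regression data: n observations, d coefficients
n, d = 200, 2
X = rng.normal(size=(n, d))
true_beta = np.array([1.0, -0.5])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

sigma2 = 10.0       # prior variance on each coefficient (assumption)
G = 100             # grid points per conditional, as in the text
beta = np.zeros(d)
draws = []
for sweep in range(500):
    for j in range(d):
        grid = np.linspace(beta[j] - 3.0, beta[j] + 3.0, G)  # crude recentering
        # Linear predictor for every grid value of beta_j at once (n x G)
        eta = (X @ beta - X[:, j] * beta[j])[:, None] + np.outer(X[:, j], grid)
        # log h = log prior + Bernoulli-logit log-likelihood, computed stably
        loglik = np.sum(y[:, None] * eta - np.logaddexp(0.0, eta), axis=0)
        logw = -0.5 * grid**2 / sigma2 + loglik
        w = np.exp(logw - logw.max())                        # stabilized weights
        p = w[:-1] * np.diff(grid)
        p /= p.sum()
        g = rng.choice(G - 1, p=p)                           # pick an interval
        beta[j] = rng.uniform(grid[g], grid[g + 1])          # uniform within it
    draws.append(beta.copy())
draws = np.array(draws[100:])   # discard burn-in
```

With this setup the posterior means land near the generating coefficients, since $n = 200$ observations dominate the weak prior.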

Common Confusions

Watch Out

Griddy Gibbs is an approximation, not exact

Unlike standard Gibbs sampling, griddy Gibbs does not sample from the exact full conditional. The Markov chain targets an approximate posterior, not the true posterior. The approximation error depends on grid resolution. This error is in addition to MCMC sampling variability.

Watch Out

Griddy Gibbs is only for scalar conditionals

The method requires evaluating the conditional on a grid. In $d$ dimensions, a grid with $G$ points per dimension has $G^d$ points in total. For $d > 2$, this is impractical. Griddy Gibbs is designed for the case where each full conditional is univariate (or at most bivariate), which is the standard Gibbs setting where you cycle through one variable at a time.

Summary

  • Evaluate the non-conjugate full conditional on a grid and sample from the discrete approximation
  • Piecewise constant is simplest; piecewise linear gives better accuracy for the same grid size
  • Grid spacing controls approximation error: smoother conditionals need fewer grid points
  • Only works for scalar (or low-dimensional) conditionals
  • For log-concave conditionals, adaptive rejection sampling is often preferable because it automatically refines the approximation

Exercises

Exercise (Core)

Problem

Suppose a full conditional is Lipschitz with constant $L = 5$ on the interval $[0, 10]$. How many equally spaced grid points do you need to achieve total variation error at most $0.01$?

Exercise (Advanced)

Problem

Explain why griddy Gibbs with a piecewise linear approximation is exact (zero approximation error) when the full conditional is itself piecewise linear on the grid intervals. What class of conditionals does this characterize?

References

Canonical:

  • Ritter & Tanner, "Facilitating the Gibbs Sampler: The Gibbs Stopper and the Griddy-Gibbs Sampler" (1992), Journal of the American Statistical Association

Current:

  • Robert & Casella, Monte Carlo Statistical Methods (2004), Chapter 10.3

  • Gilks, Richardson & Spiegelhalter, Markov Chain Monte Carlo in Practice (1996), Chapter 5

  • Gelman et al., Bayesian Data Analysis (2013), Chapters 10-12

  • Brooks et al., Handbook of MCMC (2011), Chapters 1-5


Last reviewed: April 2026
