
Rejection Sampling

The simplest exact sampling method: propose from an envelope distribution and accept or reject to produce exact, independent draws from a target, but doomed to fail in high dimensions.


Why This Matters

Rejection sampling is the conceptually simplest method for generating exact samples from a distribution that you can evaluate (up to a normalizing constant) but cannot directly sample from. Unlike MCMC methods, rejection sampling produces independent samples: no burn-in, no autocorrelation, no convergence diagnostics needed.

It is the first sampling algorithm you should learn, and understanding why it fails in high dimensions motivates the entire field of MCMC. Rejection sampling also appears as a building block inside more advanced algorithms: adaptive rejection sampling (ARS), slice sampling, and perfect simulation.

Mental Model

You want to throw darts at a dartboard shaped like your target distribution f(x). But the shape is complicated, so instead you throw darts uniformly at a larger, simpler region under M \cdot g(x) that completely covers f(x). Any dart that lands below f(x) is kept; any that lands above is thrown away. The kept darts are distributed exactly according to f.

The efficiency depends on how tight the envelope Mg(x) is. If the envelope is much larger than f(x), you throw away most darts.

Formal Setup and Notation

Let f(x) be the target density (possibly unnormalized). Let g(x) be a proposal density from which we can sample, and let M > 0 be a constant such that M\, g(x) \geq f(x) for all x.

Definition

Envelope Function

The envelope function M \cdot g(x) is a scaled version of the proposal density that dominates the target everywhere:

M\, g(x) \geq f(x) \quad \text{for all } x

The constant M must satisfy M \geq \sup_x \frac{f(x)}{g(x)}. The tightest envelope uses M = \sup_x \frac{f(x)}{g(x)}, which maximizes the acceptance probability.

Definition

Rejection Sampling Algorithm

To generate a sample from f(x)/\int f(x)\,dx:

  1. Propose: Draw x \sim g(\cdot)
  2. Accept/Reject: Draw u \sim \text{Uniform}(0, 1). If u \leq \frac{f(x)}{M\, g(x)}, accept x; otherwise reject and return to step 1.
  3. Output the accepted x.

The accepted samples are exact, independent draws from f(x)/\int f(x)\,dx.
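The three steps above translate directly into a short NumPy sketch. The function name and the illustrative target/proposal pair below are my choices, not from the text: a standard normal target with a standard Cauchy proposal, for which the tightest envelope constant works out to M = \sqrt{2\pi/e} \approx 1.52 (the ratio f/g is maximized at x = \pm 1).

```python
import numpy as np

rng = np.random.default_rng(0)

def rejection_sample(f, g, sample_g, M, n, rng):
    """Exact i.i.d. samples from f / ∫f using an envelope M * g(x) >= f(x)."""
    out = []
    while len(out) < n:
        x = sample_g(rng)                 # step 1: propose x ~ g
        u = rng.uniform()                 # step 2: u ~ Uniform(0, 1)
        if u <= f(x) / (M * g(x)):        # accept with probability f(x) / (M g(x))
            out.append(x)                 # step 3: keep the accepted draw
    return np.array(out)

# Illustration (my choice of example): N(0,1) target, Cauchy proposal.
f = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal density
g = lambda x: 1 / (np.pi * (1 + x**2))                 # standard Cauchy density
M = np.sqrt(2 * np.pi / np.e)                          # tightest envelope constant

draws = rejection_sample(f, g, lambda r: r.standard_cauchy(), M, 20_000, rng)
print(draws.mean(), draws.std())   # close to 0 and 1
```

With 1/M \approx 0.66, roughly two thirds of proposals are accepted; the kept draws are exact, independent standard normal samples.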

Definition

Acceptance Probability

The probability that a proposed sample is accepted is:

P(\text{accept}) = \int \frac{f(x)}{M\, g(x)}\, g(x)\, dx = \frac{1}{M}\int f(x)\, dx

If f is normalized (integrates to 1), then P(\text{accept}) = 1/M. The expected number of proposals per accepted sample is M.

Main Theorems

Theorem

Correctness of Rejection Sampling

Statement

Samples accepted by the rejection sampling algorithm are distributed exactly according to the target distribution p(x) = f(x)/\int f(x)\,dx. That is, for any measurable set A:

P(X \in A \mid \text{accepted}) = \frac{\int_A f(x)\,dx}{\int f(x)\,dx}

Intuition

The joint distribution of (x, u), where x \sim g and u \sim \text{Uniform}(0, Mg(x)), is uniform under the curve Mg(x). Accepting only points where u \leq f(x) is equivalent to restricting to the region under f(x). Points uniform under f(x) have x-marginal proportional to f(x).

Proof Sketch

For any set A:

P(X \in A \mid \text{accept}) = \frac{P(X \in A \text{ and accept})}{P(\text{accept})}

The numerator is:

\int_A P\!\left(u \leq \frac{f(x)}{Mg(x)}\right) g(x)\,dx = \int_A \frac{f(x)}{Mg(x)}\, g(x)\,dx = \frac{1}{M}\int_A f(x)\,dx

The denominator is P(\text{accept}) = \frac{1}{M}\int f(x)\,dx.

Dividing: P(X \in A \mid \text{accept}) = \frac{\int_A f(x)\,dx}{\int f(x)\,dx} = \int_A p(x)\,dx.

Why It Matters

This guarantees that rejection sampling produces exact samples from the target: no approximation, no asymptotic arguments, no burn-in. Each accepted sample is an independent draw from p. This is a stronger guarantee than any MCMC method provides (MCMC only converges to the target asymptotically).

Failure Mode

The theorem guarantees correctness but says nothing about efficiency. If M is very large, the acceptance rate 1/M is tiny and the algorithm is impractical. The fundamental challenge is finding a tight envelope.

Theorem

Dimensional Scaling of Rejection Sampling

Statement

For "comparable" d-dimensional distributions (e.g., target and proposal are both Gaussian with different means or covariances), the envelope constant M grows exponentially in d:

M = \Omega(e^{cd})

for some constant c > 0. Consequently, the acceptance rate 1/M = O(e^{-cd}) decreases exponentially with dimension.

Intuition

In high dimensions, probability mass concentrates on thin shells. Even if two distributions are similar in shape, their mass is concentrated on slightly different shells. The envelope must cover both shells, but the overlap between the two shells shrinks exponentially with dimension. The result is that M must be exponentially large to maintain the dominance condition Mg(x) \geq f(x) everywhere.

Proof Sketch

Consider f = \mathcal{N}(0, I_d) and g = \mathcal{N}(0, \sigma^2 I_d) with \sigma > 1. Then:

M = \sup_x \frac{f(x)}{g(x)} = \sup_x \sigma^d \exp\!\left(-\frac{\|x\|^2}{2}\left(1 - \frac{1}{\sigma^2}\right)\right)

The supremum is achieved at x = 0, giving M = \sigma^d. Since \sigma > 1, this is exponential in d. The acceptance rate is \sigma^{-d}.

Even for \sigma = 1.01, in d = 1000 dimensions: M = 1.01^{1000} \approx e^{10} \approx 22{,}000.
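The arithmetic in the proof sketch is easy to reproduce; this short snippet (my choice of dimensions) tabulates M = \sigma^d and the acceptance rate \sigma^{-d} for \sigma = 1.01:

```python
# Envelope constant and acceptance rate for target N(0, I_d),
# proposal N(0, sigma^2 I_d), using the tightest M = sigma^d.
sigma = 1.01
for d in (1, 10, 100, 1000):
    M = sigma ** d
    print(f"d = {d:4d}   M = sigma^d = {M:12.2f}   acceptance 1/M = {1/M:.2e}")
```

Even a 1% mismatch per coordinate compounds into a four-order-of-magnitude penalty by d = 1000.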

Why It Matters

This is the reason rejection sampling is impractical in high dimensions and the motivation for MCMC methods. MCMC avoids the need for a global envelope by making local moves that are always in roughly the right neighborhood. Understanding this dimensional failure is essential background for appreciating why MCMC was such a breakthrough.

Failure Mode

The exponential scaling is not a consequence of poor proposal choice. It is inherent to the accept-reject paradigm in high dimensions. No clever choice of g can avoid it when the target lives in many dimensions. This is why rejection sampling is useful primarily for low-dimensional problems (say, d \leq 5).

Squeezed Rejection Sampling

When evaluating f(x) is expensive, you can add a lower bound (squeezing function) L(x) \leq f(x) that is cheap to evaluate:

  1. Propose x \sim g, draw u \sim \text{Uniform}(0, Mg(x))
  2. If u \leq L(x): accept immediately (cheap; f was never evaluated)
  3. Otherwise evaluate f(x): if u \leq f(x), accept; if u > f(x), reject

The squeeze L(x) saves expensive evaluations of f for samples that would clearly be accepted. This is useful when f involves a complex likelihood calculation.
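A minimal sketch of the squeeze idea. The setup here is my own illustrative assumption: the target is the unnormalized normal density e^{-x^2/2} restricted to [-3, 3], the proposal is Uniform(-3, 3) with M = 6 (so Mg(x) = 1 \geq f(x)), and the squeeze uses the elementary bound e^t \geq 1 + t, giving 1 - x^2/2 \leq e^{-x^2/2}. A counter records how often the "expensive" f actually had to be evaluated.

```python
import numpy as np

rng = np.random.default_rng(1)
f_calls = 0

def f(x):
    """Stand-in for an expensive target: unnormalized N(0,1) density."""
    global f_calls
    f_calls += 1
    return np.exp(-x**2 / 2)

def squeeze(x):
    """Cheap lower bound on f, from e^t >= 1 + t with t = -x^2/2."""
    return max(0.0, 1 - x**2 / 2)

def squeezed_sample(n):
    out = []
    while len(out) < n:
        x = rng.uniform(-3, 3)    # proposal g = Uniform(-3, 3); M = 6, so M*g(x) = 1
        u = rng.uniform(0, 1)     # u ~ Uniform(0, M*g(x))
        if u <= squeeze(x):       # fast accept: no evaluation of f needed
            out.append(x)
        elif u <= f(x):           # squeeze inconclusive: pay for one f evaluation
            out.append(x)
    return np.array(out)

draws = squeezed_sample(5_000)
print(f"accepted {len(draws)} samples using only {f_calls} evaluations of f")
```

A sizeable fraction of accepted proposals clears the squeeze test and never touches f at all.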

Adaptive Rejection Sampling (ARS)

For log-concave densities, there is a powerful refinement that avoids the problem of choosing M manually.

Definition

Adaptive Rejection Sampling

When \log f(x) is concave (i.e., f is log-concave), the tangent lines to \log f at a set of evaluation points form a piecewise-linear upper bound on \log f. Exponentiating gives an envelope for f that is piecewise exponential and can be sampled exactly.

Algorithm:

  1. Start with a set of abscissae \{x_1, \ldots, x_k\}
  2. Construct the piecewise-linear upper hull of \log f using tangent lines at each x_j
  3. Exponentiate to get the envelope M \cdot g(x)
  4. Sample from the piecewise-exponential envelope
  5. Accept or reject as usual
  6. If rejected, add the rejected point to the abscissae and update the hull

Key property: the envelope gets tighter with each rejection, so the acceptance rate improves over time, converging to 1.

Log-concavity holds for many standard distributions: normal, exponential, gamma (with \alpha \geq 1), beta (with \alpha, \beta \geq 1), and logistic. ARS is the method of choice for sampling from univariate log-concave distributions.
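The following is not a full ARS implementation, but it checks numerically the geometric fact the hull construction relies on: for concave \log f, every tangent line lies above the function everywhere, so the pointwise minimum of any set of tangents is a valid upper hull. The standard normal is my choice of example here.

```python
import numpy as np

logf = lambda x: -x**2 / 2    # log of the unnormalized N(0,1) density: concave
dlogf = lambda x: -x          # its derivative

xs = np.linspace(-5, 5, 1001)
for x0 in (-2.0, -0.5, 0.0, 1.0, 3.0):
    tangent = logf(x0) + dlogf(x0) * (xs - x0)       # tangent line to log f at x0
    assert np.all(tangent >= logf(xs) - 1e-12), x0   # concavity: tangent dominates
print("every tangent upper-bounds log f, so their pointwise min is a valid hull")
```

Each rejection in ARS adds one more tangent, pulling this hull down toward \log f and driving the acceptance rate toward 1.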

Canonical Examples

Example

Sampling Beta(2,5) using a uniform envelope

Target: f(\theta) = \theta(1-\theta)^{4} on [0,1] (unnormalized Beta(2,5) density).

Proposal: g(\theta) = 1 (Uniform(0,1)).

Find M: We need M \geq \max_\theta f(\theta). Taking the derivative: f'(\theta) = (1-\theta)^4 - 4\theta(1-\theta)^3 = (1-\theta)^3(1 - 5\theta) = 0 at \theta^* = 1/5.

M = f(1/5) = (1/5)(4/5)^4 = 256/3125 \approx 0.082.

Acceptance rate: \frac{1}{M}\int_0^1 f(\theta)\,d\theta = B(2,5)/M = (1/30)/0.082 \approx 0.407.

So about 40.7% of proposals are accepted. The algorithm:

  1. Draw \theta \sim \text{Uniform}(0,1)
  2. Draw u \sim \text{Uniform}(0,1)
  3. If u \leq \theta(1-\theta)^4 / M, accept \theta; else reject

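The algorithm above is a few lines of NumPy; this sketch also estimates the acceptance rate empirically, which should land near the 0.407 computed above (the Beta(2,5) mean is 2/7 \approx 0.286):

```python
import numpy as np

rng = np.random.default_rng(2)
M = 256 / 3125                      # max of theta * (1 - theta)^4, attained at theta = 1/5

def sample_beta25(n):
    accepted, proposals = [], 0
    while len(accepted) < n:
        theta = rng.uniform()       # step 1: propose from Uniform(0, 1)
        u = rng.uniform()           # step 2: uniform threshold
        proposals += 1
        if u <= theta * (1 - theta)**4 / M:   # step 3: accept/reject
            accepted.append(theta)
    return np.array(accepted), proposals

draws, n_prop = sample_beta25(20_000)
print(f"empirical acceptance rate: {len(draws) / n_prop:.3f}")   # near 0.407
print(f"sample mean: {draws.mean():.3f}")                        # near 2/7
```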
Example

Sampling from a truncated normal

Target: f(x) = e^{-x^2/2} for x \in [2, \infty) (right tail of the standard normal, unnormalized).

Proposal: g(x) = 2e^{-2(x-2)} for x \geq 2 (Exponential with rate 2, shifted to start at 2).

Envelope constant: M = \sup_{x \geq 2} \frac{f(x)}{g(x)} = \sup_{x \geq 2} \tfrac{1}{2} e^{-x^2/2 + 2(x-2)}.

The exponent is -x^2/2 + 2x - 4 = -(x-2)^2/2 - 2, maximized at x = 2 where it equals -2. So M = \tfrac{1}{2}e^{-2} \approx 0.068 (an M below 1 is fine here because f is unnormalized).

Acceptance probability at each point: f(x)/(Mg(x)) = e^{-(x-2)^2/2}.

Average acceptance rate: \int_2^\infty e^{-(x-2)^2/2} \cdot 2e^{-2(x-2)}\,dx = 2\int_0^\infty e^{-t^2/2 - 2t}\,dt \approx 0.84.

This is reasonably efficient because the exponential proposal matches the tail behavior of the normal well.
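A sketch of this sampler in NumPy: the shifted Exponential(rate 2) proposal is drawn as `2 + rng.exponential(scale=0.5)`, and the pointwise acceptance probability e^{-(x-2)^2/2} from the example is used directly.

```python
import numpy as np

rng = np.random.default_rng(3)

def truncated_normal_tail(n):
    """Sample N(0,1) conditioned on x >= 2 via a shifted Exponential proposal."""
    out = []
    while len(out) < n:
        x = 2 + rng.exponential(scale=0.5)            # x ~ g: Exp(rate 2), shifted to 2
        if rng.uniform() <= np.exp(-(x - 2)**2 / 2):  # accept with prob f(x)/(M g(x))
            out.append(x)
    return np.array(out)

draws = truncated_normal_tail(20_000)
print(draws.min(), draws.mean())   # min >= 2; mean near phi(2)/(1 - Phi(2)) ≈ 2.37
```

Naive rejection (drawing N(0,1) variates and keeping those above 2) would accept only about 2.3% of draws; the tail-matched proposal accepts about 84%.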

Example

Why rejection sampling fails for 100-dimensional Gaussian

Target: \mathcal{N}(0, I_{100}). Proposal: \mathcal{N}(0, 1.1^2\, I_{100}).

The proposal is only 10% wider in each dimension, seemingly a good match. But M = (1.1)^{100} \approx 1.38 \times 10^{4}. The acceptance rate is 1/M \approx 7 \times 10^{-5}: you would need about 14,000 proposals for each accepted sample.

With \sigma = 1.5: M = 1.5^{100} \approx 4 \times 10^{17}. Completely impractical.

This is not a failure of the proposal choice. It is inherent to rejection sampling in high dimensions.
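The collapse is already visible at moderate dimension. This sketch (the dimensions and sample size are my choices) compares the theoretical acceptance rate \sigma^{-d} against a Monte Carlo estimate for \sigma = 1.1, using the pointwise acceptance probability e^{-\|x\|^2 (1 - 1/\sigma^2)/2} derived in the scaling theorem:

```python
import numpy as np

rng = np.random.default_rng(4)
sigma = 1.1
rates = {}
for d in (1, 5, 10, 20):
    x = sigma * rng.standard_normal((100_000, d))   # proposals from N(0, sigma^2 I_d)
    # f(x) / (M g(x)) = exp(-||x||^2 (1 - 1/sigma^2) / 2) with the tightest M = sigma^d
    p = np.exp(-np.sum(x**2, axis=1) * (1 - 1 / sigma**2) / 2)
    rates[d] = np.mean(rng.uniform(size=len(p)) <= p)
    print(f"d = {d:2d}   theory 1/M = {sigma**-d:.4f}   empirical = {rates[d]:.4f}")
```

Already at d = 20 the acceptance rate has fallen below 15%, on track for the exponential decay that makes d = 100 hopeless.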

Common Confusions

Watch Out

Rejection sampling requires a dominating envelope, not just overlap

A common error is choosing g such that g(x) > 0 on the support of f but failing to ensure Mg(x) \geq f(x) everywhere. If the envelope does not dominate, the algorithm produces biased samples: regions where f(x) > Mg(x) are under-represented. You must verify the dominance condition mathematically or numerically before running the algorithm.

Watch Out

Rejected samples tell you nothing about the target

Unlike in Metropolis-Hastings (where the current state is repeated on rejection), in rejection sampling, rejected proposals are simply discarded. They do not contribute to the sample in any way. Only accepted samples are used. This means the efficiency is entirely determined by the acceptance rate.

Watch Out

Rejection sampling produces independent samples

This is a feature, not a bug, and a key distinction from MCMC. Each accepted sample is independent of all others. There is no autocorrelation, no burn-in, no need for convergence diagnostics. If you can afford the acceptance rate, rejection sampling is strictly preferable to MCMC for producing independent samples.

Watch Out

You do NOT need the normalizing constant of f

Just like MH, rejection sampling works with unnormalized targets. If f(x) = c \cdot p(x) for unknown c, the acceptance ratio f(x)/(Mg(x)) still produces samples from p. The constant c is absorbed into M: you find M such that Mg(x) \geq f(x), and the acceptance condition u \leq f(x)/(Mg(x)) does not require knowing c separately.

Summary

  • Rejection sampling produces exact, independent samples from the target
  • Requires an envelope: Mg(x) \geq f(x) for all x
  • Acceptance rate = \int f(x)\,dx / M (or 1/M if f is normalized)
  • Squeezed rejection adds a lower bound to avoid expensive f evaluations
  • ARS exploits log-concavity for automatic, improving envelopes
  • Acceptance rate decays exponentially with dimension; rejection sampling fails in high dimensions
  • This dimensional failure is the primary motivation for MCMC methods

Exercises

Exercise (Core)

Problem

Design a rejection sampler for the Beta(2, 5) distribution using a Uniform(0, 1) envelope. Find the optimal M, write out the algorithm, and compute the acceptance rate.

Exercise (Core)

Problem

Prove that the acceptance rate of rejection sampling equals \int f(x)\,dx / M when f is unnormalized, or 1/M when f is a proper density.

Exercise (Advanced)

Problem

Consider rejection sampling from f(x) = e^{-x} on [0, \infty) (standard Exponential) using proposal g(x) = \frac{1}{(1+x)^2} on [0, \infty) (a Pareto-like distribution). Find the optimal M and the acceptance rate. Is this a good choice of proposal?

References

Canonical:

  • von Neumann (1951), "Various Techniques Used in Connection with Random Digits"
  • Gilks & Wild (1992), "Adaptive Rejection Sampling for Gibbs Sampling"

Current:

  • Robert & Casella, Monte Carlo Statistical Methods (2004), Chapter 2
  • Devroye, Non-Uniform Random Variate Generation (1986), Chapters 2-3
  • Gelman et al., Bayesian Data Analysis (2013), Chapters 10-12
  • Brooks et al., Handbook of MCMC (2011), Chapters 1-5

Next Topics

The natural next steps from rejection sampling:

  • Adaptive rejection sampling: automatic envelope construction for log-concave densities
  • Importance sampling: reweighting instead of rejecting, avoiding the need for a dominating envelope

Last reviewed: April 2026
