
Rejection Sampling

The simplest exact sampling method: propose from an envelope distribution and accept or reject to produce exact, independent draws from a target, but doomed to fail in high dimensions.


Why This Matters

Rejection sampling is the conceptually simplest method for generating exact samples from a distribution that you can evaluate (up to a normalizing constant) but cannot directly sample from. Unlike MCMC methods, rejection sampling produces independent samples: no burn-in, no autocorrelation, no convergence diagnostics needed.

It is the first sampling algorithm you should learn, and understanding why it fails in high dimensions motivates the entire field of MCMC. Rejection sampling also appears as a building block inside more advanced algorithms: adaptive rejection sampling (ARS), slice sampling, and perfect simulation.

Mental Model

You want to throw darts at a dartboard shaped like your target distribution f(x). But the shape is complicated, so instead you throw darts uniformly at a larger, simpler region under M \cdot g(x) that completely covers f(x). Any dart that lands below f(x) is kept; any that lands above is thrown away. The kept darts are distributed exactly according to f.

The efficiency depends on how tight the envelope Mg(x) is. If the envelope is much larger than f(x), you throw away most darts.

Formal Setup and Notation

Let f(x) be the target density (possibly unnormalized). Let g(x) be a proposal density from which we can sample, and let M > 0 be a constant such that M\, g(x) \geq f(x) for all x.

Definition

Envelope Function

The envelope function M \cdot g(x) is a scaled version of the proposal density that dominates the target everywhere:

M\, g(x) \geq f(x) \quad \text{for all } x

The constant M must satisfy M \geq \sup_x \frac{f(x)}{g(x)}. The tightest envelope uses M = \sup_x \frac{f(x)}{g(x)}, which maximizes the acceptance probability.

Definition

Rejection Sampling Algorithm

To generate a sample from f(x)/\int f(x)\,dx:

  1. Propose: Draw x \sim g(\cdot)
  2. Accept/Reject: Draw u \sim \text{Uniform}(0, 1). If u \leq \frac{f(x)}{M\, g(x)}, accept x; otherwise reject and return to step 1.
  3. Output the accepted x.

The accepted samples are exact, independent draws from f(x)/\int f(x)\,dx.
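The three steps above translate directly into a short NumPy sketch. The function name and the illustrative target/proposal pair below are my choices, not from the text: a standard normal target with a standard Cauchy proposal, for which the tightest envelope constant works out to M = \sqrt{2\pi/e} \approx 1.52 (the ratio f/g is maximized at x = \pm 1).

```python
import numpy as np

rng = np.random.default_rng(0)

def rejection_sample(f, g, sample_g, M, n, rng):
    """Exact i.i.d. samples from f / ∫f using an envelope M * g(x) >= f(x)."""
    out = []
    while len(out) < n:
        x = sample_g(rng)                 # step 1: propose x ~ g
        u = rng.uniform()                 # step 2: u ~ Uniform(0, 1)
        if u <= f(x) / (M * g(x)):        # accept with probability f(x) / (M g(x))
            out.append(x)                 # step 3: keep the accepted draw
    return np.array(out)

# Illustration (my choice of example): N(0,1) target, Cauchy proposal.
f = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal density
g = lambda x: 1 / (np.pi * (1 + x**2))                 # standard Cauchy density
M = np.sqrt(2 * np.pi / np.e)                          # tightest envelope constant

draws = rejection_sample(f, g, lambda r: r.standard_cauchy(), M, 20_000, rng)
print(draws.mean(), draws.std())   # close to 0 and 1
```

With 1/M \approx 0.66, roughly two thirds of proposals are accepted; the kept draws are exact, independent standard normal samples.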

Definition

Acceptance Probability

The probability that a proposed sample is accepted is:

P(\text{accept}) = \int \frac{f(x)}{M\, g(x)}\, g(x)\, dx = \frac{1}{M}\int f(x)\, dx

If f is normalized (integrates to 1), then P(\text{accept}) = 1/M. The expected number of proposals per accepted sample is M.

Main Theorems

Theorem

Correctness of Rejection Sampling

Statement

Samples accepted by the rejection sampling algorithm are distributed exactly according to the target distribution p(x) = f(x)/\int f(x)\,dx. That is, for any measurable set A:

P(X \in A \mid \text{accepted}) = \frac{\int_A f(x)\,dx}{\int f(x)\,dx}

Intuition

The joint distribution of (x, u), where x \sim g and u \sim \text{Uniform}(0, Mg(x)), is uniform under the curve Mg(x). Accepting only points where u \leq f(x) is equivalent to restricting to the region under f(x). Points uniform under f(x) have x-marginal proportional to f(x).

Proof Sketch

For any set A:

P(X \in A \mid \text{accept}) = \frac{P(X \in A \text{ and accept})}{P(\text{accept})}

The numerator is:

\int_A P\!\left(u \leq \frac{f(x)}{Mg(x)}\right) g(x)\,dx = \int_A \frac{f(x)}{Mg(x)}\, g(x)\,dx = \frac{1}{M}\int_A f(x)\,dx

The denominator is P(\text{accept}) = \frac{1}{M}\int f(x)\,dx.

Dividing: P(X \in A \mid \text{accept}) = \frac{\int_A f(x)\,dx}{\int f(x)\,dx} = \int_A p(x)\,dx.

Why It Matters

This guarantees that rejection sampling produces exact samples from the target: no approximation, no asymptotic arguments, no burn-in. Each accepted sample is an independent draw from p. This is a stronger guarantee than any MCMC method provides (MCMC only converges to the target asymptotically).

Failure Mode

The theorem guarantees correctness but says nothing about efficiency. If M is very large, the acceptance rate 1/M is tiny and the algorithm is impractical. The fundamental challenge is finding a tight envelope.

Theorem

Dimensional Scaling of Rejection Sampling

Statement

For "comparable" d-dimensional distributions (e.g., target and proposal are both Gaussian with different means or covariances), the envelope constant M grows exponentially in d:

M = \Omega(e^{cd})

for some constant c > 0. Consequently, the acceptance rate 1/M = O(e^{-cd}) decreases exponentially with dimension.

Intuition

In high dimensions, probability mass concentrates on thin shells. Even if two distributions are similar in shape, their mass is concentrated on slightly different shells. The envelope must cover both shells, but the overlap between the two shells shrinks exponentially with dimension. The result is that M must be exponentially large to maintain the dominance condition Mg(x) \geq f(x) everywhere.

Proof Sketch

Consider f = \mathcal{N}(0, I_d) and g = \mathcal{N}(0, \sigma^2 I_d) with \sigma > 1. Then:

M = \sup_x \frac{f(x)}{g(x)} = \sup_x \sigma^d \exp\!\left(-\frac{\|x\|^2}{2}\left(1 - \frac{1}{\sigma^2}\right)\right)

The supremum is achieved at x = 0, giving M = \sigma^d. Since \sigma > 1, this is exponential in d. The acceptance rate is \sigma^{-d}.

Even for \sigma = 1.01, in d = 1000 dimensions: M = 1.01^{1000} \approx e^{10} \approx 22{,}000.
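The arithmetic in the proof sketch is easy to reproduce; this short snippet (my choice of dimensions) tabulates M = \sigma^d and the acceptance rate \sigma^{-d} for \sigma = 1.01:

```python
# Envelope constant and acceptance rate for target N(0, I_d),
# proposal N(0, sigma^2 I_d), using the tightest M = sigma^d.
sigma = 1.01
for d in (1, 10, 100, 1000):
    M = sigma ** d
    print(f"d = {d:4d}   M = sigma^d = {M:12.2f}   acceptance 1/M = {1/M:.2e}")
```

Even a 1% mismatch per coordinate compounds into a four-order-of-magnitude penalty by d = 1000.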

Why It Matters

This is the reason rejection sampling is impractical in high dimensions and the motivation for MCMC methods. MCMC avoids the need for a global envelope by making local moves that are always in roughly the right neighborhood. Understanding this dimensional failure is essential background for appreciating why MCMC was such a breakthrough.

Failure Mode

The exponential scaling is not a consequence of poor proposal choice. It is inherent to the accept-reject paradigm in high dimensions. No clever choice of g can avoid it when the target lives in many dimensions. This is why rejection sampling is useful primarily for low-dimensional problems (say, d \leq 5).

Squeezed Rejection Sampling

When evaluating f(x) is expensive, you can add a lower bound (squeezing function) L(x) \leq f(x) that is cheap to evaluate:

  1. Propose x \sim g, draw u \sim \text{Uniform}(0, Mg(x))
  2. If u \leq L(x): accept immediately (cheap; f was never evaluated)
  3. Otherwise evaluate f(x): if u \leq f(x), accept; if u > f(x), reject

The squeeze L(x) saves expensive evaluations of f for samples that would clearly be accepted. This is useful when f involves a complex likelihood calculation.
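A minimal sketch of the squeeze idea. The setup here is my own illustrative assumption: the target is the unnormalized normal density e^{-x^2/2} restricted to [-3, 3], the proposal is Uniform(-3, 3) with M = 6 (so Mg(x) = 1 \geq f(x)), and the squeeze uses the elementary bound e^t \geq 1 + t, giving 1 - x^2/2 \leq e^{-x^2/2}. A counter records how often the "expensive" f actually had to be evaluated.

```python
import numpy as np

rng = np.random.default_rng(1)
f_calls = 0

def f(x):
    """Stand-in for an expensive target: unnormalized N(0,1) density."""
    global f_calls
    f_calls += 1
    return np.exp(-x**2 / 2)

def squeeze(x):
    """Cheap lower bound on f, from e^t >= 1 + t with t = -x^2/2."""
    return max(0.0, 1 - x**2 / 2)

def squeezed_sample(n):
    out = []
    while len(out) < n:
        x = rng.uniform(-3, 3)    # proposal g = Uniform(-3, 3); M = 6, so M*g(x) = 1
        u = rng.uniform(0, 1)     # u ~ Uniform(0, M*g(x))
        if u <= squeeze(x):       # fast accept: no evaluation of f needed
            out.append(x)
        elif u <= f(x):           # squeeze inconclusive: pay for one f evaluation
            out.append(x)
    return np.array(out)

draws = squeezed_sample(5_000)
print(f"accepted {len(draws)} samples using only {f_calls} evaluations of f")
```

A sizeable fraction of accepted proposals clears the squeeze test and never touches f at all.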

Adaptive Rejection Sampling (ARS)

For log-concave densities, there is a powerful refinement that avoids the problem of choosing M manually.

Definition

Adaptive Rejection Sampling

When \log f(x) is concave (i.e., f is log-concave), the tangent lines to \log f at a set of evaluation points form a piecewise-linear upper bound on \log f. Exponentiating gives an envelope for f that is piecewise exponential and can be sampled exactly.

Algorithm:

  1. Start with a set of abscissae \{x_1, \ldots, x_k\}
  2. Construct the piecewise-linear upper hull of \log f using tangent lines at each x_j
  3. Exponentiate to get the envelope M \cdot g(x)
  4. Sample from the piecewise-exponential envelope
  5. Accept or reject as usual
  6. If rejected, add the rejected point to the abscissae and update the hull

Key property: the envelope gets tighter with each rejection, so the acceptance rate improves over time, converging to 1.

Log-concavity holds for many standard distributions: normal, exponential, gamma (with \alpha \geq 1), beta (with \alpha, \beta \geq 1), and logistic. ARS is the method of choice for sampling from univariate log-concave distributions.
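The following is not a full ARS implementation, but it checks numerically the geometric fact the hull construction relies on: for concave \log f, every tangent line lies above the function everywhere, so the pointwise minimum of any set of tangents is a valid upper hull. The standard normal is my choice of example here.

```python
import numpy as np

logf = lambda x: -x**2 / 2    # log of the unnormalized N(0,1) density: concave
dlogf = lambda x: -x          # its derivative

xs = np.linspace(-5, 5, 1001)
for x0 in (-2.0, -0.5, 0.0, 1.0, 3.0):
    tangent = logf(x0) + dlogf(x0) * (xs - x0)       # tangent line to log f at x0
    assert np.all(tangent >= logf(xs) - 1e-12), x0   # concavity: tangent dominates
print("every tangent upper-bounds log f, so their pointwise min is a valid hull")
```

Each rejection in ARS adds one more tangent, pulling this hull down toward \log f and driving the acceptance rate toward 1.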

Canonical Examples

Example

Sampling Beta(2,5) using a uniform envelope

Target: f(\theta) = \theta(1-\theta)^{4} on [0,1] (unnormalized Beta(2,5) density).

Proposal: g(\theta) = 1 (Uniform(0,1)).

Find M: We need M \geq \max_\theta f(\theta). Taking the derivative: f'(\theta) = (1-\theta)^4 - 4\theta(1-\theta)^3 = (1-\theta)^3(1 - 5\theta) = 0 at \theta^* = 1/5.

M = f(1/5) = (1/5)(4/5)^4 = 256/3125 \approx 0.082.

Acceptance rate: \frac{1}{M}\int_0^1 f(\theta)\,d\theta = B(2,5)/M = (1/30)/0.082 \approx 0.407.

So about 40.7% of proposals are accepted. The algorithm:

  1. Draw \theta \sim \text{Uniform}(0,1)
  2. Draw u \sim \text{Uniform}(0,1)
  3. If u \leq \theta(1-\theta)^4 / M, accept \theta; else reject

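The algorithm above is a few lines of NumPy; this sketch also estimates the acceptance rate empirically, which should land near the 0.407 computed above (the Beta(2,5) mean is 2/7 \approx 0.286):

```python
import numpy as np

rng = np.random.default_rng(2)
M = 256 / 3125                      # max of theta * (1 - theta)^4, attained at theta = 1/5

def sample_beta25(n):
    accepted, proposals = [], 0
    while len(accepted) < n:
        theta = rng.uniform()       # step 1: propose from Uniform(0, 1)
        u = rng.uniform()           # step 2: uniform threshold
        proposals += 1
        if u <= theta * (1 - theta)**4 / M:   # step 3: accept/reject
            accepted.append(theta)
    return np.array(accepted), proposals

draws, n_prop = sample_beta25(20_000)
print(f"empirical acceptance rate: {len(draws) / n_prop:.3f}")   # near 0.407
print(f"sample mean: {draws.mean():.3f}")                        # near 2/7
```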
Example

Sampling from a truncated normal

Target: f(x) = e^{-x^2/2} for x \in [2, \infty) (right tail of the standard normal, unnormalized).

Proposal: g(x) = 2e^{-2(x-2)} for x \geq 2 (Exponential with rate 2, shifted to start at 2).

Envelope constant: M = \sup_{x \geq 2} \frac{f(x)}{g(x)} = \sup_{x \geq 2} \tfrac{1}{2} e^{-x^2/2 + 2(x-2)}.

The exponent is -x^2/2 + 2x - 4 = -(x-2)^2/2 - 2, maximized at x = 2 where it equals -2. So M = \tfrac{1}{2}e^{-2} \approx 0.068 (an M below 1 is fine here because f is unnormalized).

Acceptance probability at each point: f(x)/(Mg(x)) = e^{-(x-2)^2/2}.

Average acceptance rate: \int_2^\infty e^{-(x-2)^2/2} \cdot 2e^{-2(x-2)}\,dx = 2\int_0^\infty e^{-t^2/2 - 2t}\,dt \approx 0.84.

This is reasonably efficient because the exponential proposal matches the tail behavior of the normal well.
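A sketch of this sampler in NumPy: the shifted Exponential(rate 2) proposal is drawn as `2 + rng.exponential(scale=0.5)`, and the pointwise acceptance probability e^{-(x-2)^2/2} from the example is used directly.

```python
import numpy as np

rng = np.random.default_rng(3)

def truncated_normal_tail(n):
    """Sample N(0,1) conditioned on x >= 2 via a shifted Exponential proposal."""
    out = []
    while len(out) < n:
        x = 2 + rng.exponential(scale=0.5)            # x ~ g: Exp(rate 2), shifted to 2
        if rng.uniform() <= np.exp(-(x - 2)**2 / 2):  # accept with prob f(x)/(M g(x))
            out.append(x)
    return np.array(out)

draws = truncated_normal_tail(20_000)
print(draws.min(), draws.mean())   # min >= 2; mean near phi(2)/(1 - Phi(2)) ≈ 2.37
```

Naive rejection (drawing N(0,1) variates and keeping those above 2) would accept only about 2.3% of draws; the tail-matched proposal accepts about 84%.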

Example

Why rejection sampling fails for 100-dimensional Gaussian

Target: \mathcal{N}(0, I_{100}). Proposal: \mathcal{N}(0, 1.1^2\, I_{100}).

The proposal is only 10% wider in each dimension, seemingly a good match. But M = (1.1)^{100} \approx 1.38 \times 10^{4}. The acceptance rate is 1/M \approx 7 \times 10^{-5}: you would need about 14,000 proposals for each accepted sample.

With \sigma = 1.5: M = 1.5^{100} \approx 4 \times 10^{17}. Completely impractical.

This is not a failure of the proposal choice. It is inherent to rejection sampling in high dimensions.
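The collapse is already visible at moderate dimension. This sketch (the dimensions and sample size are my choices) compares the theoretical acceptance rate \sigma^{-d} against a Monte Carlo estimate for \sigma = 1.1, using the pointwise acceptance probability e^{-\|x\|^2 (1 - 1/\sigma^2)/2} derived in the scaling theorem:

```python
import numpy as np

rng = np.random.default_rng(4)
sigma = 1.1
rates = {}
for d in (1, 5, 10, 20):
    x = sigma * rng.standard_normal((100_000, d))   # proposals from N(0, sigma^2 I_d)
    # f(x) / (M g(x)) = exp(-||x||^2 (1 - 1/sigma^2) / 2) with the tightest M = sigma^d
    p = np.exp(-np.sum(x**2, axis=1) * (1 - 1 / sigma**2) / 2)
    rates[d] = np.mean(rng.uniform(size=len(p)) <= p)
    print(f"d = {d:2d}   theory 1/M = {sigma**-d:.4f}   empirical = {rates[d]:.4f}")
```

Already at d = 20 the acceptance rate has fallen below 15%, on track for the exponential decay that makes d = 100 hopeless.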

Common Confusions

Watch Out

Rejection sampling requires a dominating envelope, not just overlap

A common error is choosing g such that g(x) > 0 on the support of f but failing to ensure Mg(x) \geq f(x) everywhere. If the envelope does not dominate, the algorithm produces biased samples: regions where f(x) > Mg(x) are under-represented. You must verify the dominance condition mathematically or numerically before running the algorithm.

Watch Out

Rejected samples tell you nothing about the target

Unlike in Metropolis-Hastings (where the current state is repeated on rejection), in rejection sampling, rejected proposals are simply discarded. They do not contribute to the sample in any way. Only accepted samples are used. This means the efficiency is entirely determined by the acceptance rate.

Watch Out

Rejection sampling produces independent samples

This is a feature, not a bug, and a key distinction from MCMC. Each accepted sample is independent of all others. There is no autocorrelation, no burn-in, no need for convergence diagnostics. If you can afford the acceptance rate, rejection sampling is strictly preferable to MCMC for producing independent samples.

Watch Out

You do NOT need the normalizing constant of f

Just like MH, rejection sampling works with unnormalized targets. If f(x) = c \cdot p(x) for unknown c, the acceptance ratio f(x)/(Mg(x)) still produces samples from p. The constant c is absorbed into M: you find M such that Mg(x) \geq f(x), and the acceptance condition u \leq f(x)/(Mg(x)) does not require knowing c separately.

Summary

  • Rejection sampling produces exact, independent samples from the target
  • Requires an envelope: Mg(x) \geq f(x) for all x
  • Acceptance rate = \int f(x)\,dx / M (or 1/M if f is normalized)
  • Squeezed rejection adds a lower bound to avoid expensive f evaluations
  • ARS exploits log-concavity for automatic, improving envelopes
  • Acceptance rate decays exponentially with dimension; rejection sampling fails in high dimensions
  • This dimensional failure is the primary motivation for MCMC methods

Exercises

Exercise (Core)

Problem

Design a rejection sampler for the Beta(2, 5) distribution using a Uniform(0, 1) envelope. Find the optimal M, write out the algorithm, and compute the acceptance rate.

Exercise (Core)

Problem

Prove that the acceptance rate of rejection sampling equals \int f(x)\,dx / M when f is unnormalized, or 1/M when f is a proper density.

Exercise (Advanced)

Problem

Consider rejection sampling from f(x) = e^{-x} on [0, \infty) (standard Exponential) using proposal g(x) = \frac{1}{(1+x)^2} on [0, \infty) (a Pareto-like distribution). Find the optimal M and the acceptance rate. Is this a good choice of proposal?

References

Canonical:

  • von Neumann (1951), "Various Techniques Used in Connection with Random Digits"
  • Gilks & Wild (1992), "Adaptive Rejection Sampling for Gibbs Sampling"

Current:

  • Robert & Casella, Monte Carlo Statistical Methods (2004), Chapter 2
  • Devroye, Non-Uniform Random Variate Generation (1986), Chapters 2-3
  • Gelman et al., Bayesian Data Analysis (2013), Chapters 10-12
  • Brooks et al., Handbook of MCMC (2011), Chapters 1-5

Next Topics

The natural next steps from rejection sampling:

  • Adaptive rejection sampling: automatic envelope construction for log-concave densities
  • Importance sampling: reweighting instead of rejecting, avoiding the need for a dominating envelope

Last reviewed: April 2026
