

Poisson Limit Theorem and Le Cam's Bound

Bin(n, lambda/n) converges to Pois(lambda) as n grows. The classical product-of-PMFs proof, then Le Cam's total-variation bound that makes the approximation quantitative. When to use Poisson vs Normal approximation. Disambiguation: Le Cam published multiple famous theorems.

Important · Core · Tier 2 · Stable · Supporting · ~30 min
For: Stats, Actuarial

Why This Matters

The Poisson distribution is what you get when many independent things each have a small chance of happening, and you count how many actually do. Rare typos in a long document, photons hitting a detector in a fixed time window, insurance claims in a year, defects on a manufactured wafer, mutations in a stretch of DNA, requests arriving at a web server. All of these have the same mathematical structure: a sum of many Bernoulli trials with small success probabilities.

The Poisson limit theorem makes the connection precise. As the number of trials $n$ grows and the per-trial probability $p$ shrinks so that the expected count $np$ stays finite, the binomial distribution converges to Poisson. The result is older than the modern CLT and predates the formal probability axioms; Poisson published the special case in 1837.

Le Cam's 1960 sharpening gives a quantitative version: the total-variation distance between $\mathrm{Bin}(n, p)$ and $\mathrm{Pois}(np)$ is at most $np^2$. This is sharper than the standard Berry-Esseen rate for the normal approximation in the rare-event regime and explains why practitioners reach for Poisson rather than Normal when $p$ is small.

Quick Version

| Object | Approximation |
| --- | --- |
| $X \sim \mathrm{Bin}(n, p)$ with $np = \lambda$ fixed, $n \to \infty$ | $X \xrightarrow{d} \mathrm{Pois}(\lambda)$ |
| Finite-$n$ error (Le Cam, 1960) | $\lVert \mathrm{Bin}(n, p) - \mathrm{Pois}(np) \rVert_{\mathrm{TV}} \leq np^2$ |
| Sum of non-identical Bernoullis | $\lVert \mathrm{Bin}(p_1, \ldots, p_n) - \mathrm{Pois}(\sum_i p_i) \rVert_{\mathrm{TV}} \leq \sum_i p_i^2$ |
| Rule of thumb | $n \geq 100$, $p \leq 0.05$ |

The non-identical case is the form most worth remembering, since it covers Bernoulli trials with different success probabilities (different risk classes, different exposure levels). Le Cam's bound is the cleanest finite-sample approximation result in elementary probability.

Statement

Theorem

Poisson Limit Theorem

Statement

Let $X_n \sim \mathrm{Bin}(n, p_n)$ with $n p_n \to \lambda > 0$ as $n \to \infty$. Then $X_n$ converges in distribution to $\mathrm{Pois}(\lambda)$:
$$\Pr[X_n = k] \to e^{-\lambda} \frac{\lambda^k}{k!}, \quad k = 0, 1, 2, \ldots$$
The convergence is also in total-variation distance, not only weakly.

Intuition

The Poisson distribution is the law of rare events: many independent chances, each one tiny, with a finite expected count. The binomial PMF $\binom{n}{k} p^k (1-p)^{n-k}$ pulls toward $e^{-\lambda} \lambda^k / k!$ when $p$ is small and $n$ is large, because $(1-p)^n \to e^{-\lambda}$ and $\binom{n}{k} p^k \to \lambda^k / k!$. The Poisson PMF emerges from the product.

Why It Matters

The limit explains why the same Poisson distribution appears in radioactive decay, queue arrivals, mutation counts, and insurance claims. None of these systems involve an integer parameter $n$ of trials in any visible sense, yet they all sit at the Poisson endpoint of the binomial family. The Poisson is the universal limit law for sums of rare independent events.

Practically, the result lets you replace $\binom{n}{k} p^k (1-p)^{n-k}$ with $e^{-np}(np)^k / k!$, which has no $n$ dependence inside the combinatorial coefficient. For $n = 10^6$ and $p = 10^{-5}$, the binomial is computationally inconvenient; the Poisson approximation is one line.
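As a quick numerical check, here is a minimal sketch in Python (assuming `scipy` is available) that evaluates both PMFs in exactly this regime; by Le Cam's bound the total error across all $k$ is at most $np^2 = 10^{-4}$.

```python
from scipy import stats

n, p = 10**6, 1e-5      # rare-event regime: np = 10
lam = n * p

# Exact binomial PMF vs the one-line Poisson approximation, first few counts.
for k in range(5):
    b = stats.binom.pmf(k, n, p)
    q = stats.poisson.pmf(k, lam)
    print(f"k={k}  binomial={b:.10f}  poisson={q:.10f}  diff={abs(b - q):.1e}")
```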

Failure Mode

The approximation degrades when $p$ is not small. At $p = 0.1$ the Le Cam bound gives $np^2 = 0.01n$, which grows with $n$ rather than shrinking. For moderate $p$ the Normal approximation (De Moivre-Laplace) is the right tool instead. The Poisson limit also fails when the underlying trials are dependent: a sum of dependent Bernoullis with the same marginals can have a different limit law, and quantifying the dependence requires the Stein-Chen method or a coupling argument.

Optional Proof: Classical product-of-PMFs proof

Fix $k$ and let $p_n = \lambda/n$, so $n p_n = \lambda$ exactly. The binomial PMF is
$$\Pr[X_n = k] = \binom{n}{k} p_n^k (1 - p_n)^{n-k} = \frac{n!}{k!(n-k)!} \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n-k}.$$

Group the factors:
$$\Pr[X_n = k] = \frac{\lambda^k}{k!} \cdot \underbrace{\frac{n(n-1)\cdots(n-k+1)}{n^k}}_{\text{(A)}} \cdot \underbrace{\left(1 - \frac{\lambda}{n}\right)^{n}}_{\text{(B)}} \cdot \underbrace{\left(1 - \frac{\lambda}{n}\right)^{-k}}_{\text{(C)}}.$$

Factor (A): a product of $k$ terms, each of the form $(n - j)/n = 1 - j/n$, all converging to $1$. So (A) $\to 1$.

Factor (B): a defining limit, $(1 - \lambda/n)^n \to e^{-\lambda}$.

Factor (C): converges to $1$ since $k$ is fixed and $\lambda/n \to 0$.

Multiplying: $\Pr[X_n = k] \to (\lambda^k / k!) \cdot e^{-\lambda}$, which is the Poisson PMF. The same argument with $n p_n \to \lambda$ (not exactly $\lambda/n$) goes through with negligible adjustments, because $n p_n \to \lambda$ controls the rate of all three factors.
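To watch the three factors converge, here is a short numeric sketch (Python, standard library only); the choices $\lambda = 3$, $k = 2$, and the grid of $n$ values are arbitrary illustrations, not part of the proof.

```python
import math

lam, k = 3.0, 2

for n in [10, 100, 1_000, 100_000]:
    A = math.prod((n - j) / n for j in range(k))   # n(n-1).../n^k  -> 1
    B = (1 - lam / n) ** n                          # -> e^{-lam}
    C = (1 - lam / n) ** (-k)                       # -> 1
    pmf = lam**k / math.factorial(k) * A * B * C    # exact Bin(n, lam/n) pmf at k
    print(f"n={n:>7}  A={A:.4f}  B={B:.4f}  C={C:.4f}  pmf={pmf:.6f}")

print("Poisson limit:", math.exp(-lam) * lam**k / math.factorial(k))
```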

Le Cam's Total-Variation Bound

The convergence statement above is qualitative. Le Cam (1960) proved a quantitative form that bounds the approximation error at finite nn.

Theorem

Le Cam Total-Variation Bound

Statement

Let $X_1, \ldots, X_n$ be independent with $X_i \sim \mathrm{Bern}(p_i)$, and let $S = \sum_{i=1}^n X_i$. Then for $Y \sim \mathrm{Pois}\!\left(\sum_{i=1}^n p_i\right)$,
$$\lVert \mathrm{Law}(S) - \mathrm{Law}(Y) \rVert_{\mathrm{TV}} \;\leq\; \sum_{i=1}^n p_i^2.$$
In the homogeneous case $p_i = p$ for all $i$, this specializes to $\lVert \mathrm{Bin}(n, p) - \mathrm{Pois}(np) \rVert_{\mathrm{TV}} \leq np^2$.

Intuition

The total-variation distance between two distributions $P$ and $Q$ is $\sup_A \lvert P(A) - Q(A) \rvert$, the worst-case difference in probabilities across all events. Le Cam's bound says the binomial and Poisson assign nearly the same probability to every event when the success probabilities $p_i$ are all small, with explicit error proportional to the sum of squared probabilities. Squaring is the right scaling because the first-moment match $\mathbb{E}[S] = \mathbb{E}[Y]$ is exact; the error is driven by second-moment mismatch.

Why It Matters

The bound is non-asymptotic: it holds at finite $n$ and gives an explicit error. This matters for insurance and reliability applications where $n$ is concrete and $p$ varies across risk classes. It also matters in theoretical computer science (sums of rare events in randomized algorithms) and in epidemiology (counting cases across heterogeneous populations). The classical limit theorem says "convergence happens"; Le Cam's bound says "and here is how close you already are at $n = 50$".
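To see how tight the certificate is, here is a sketch (Python, `numpy` and `scipy` assumed) that computes the exact total-variation distance $\tfrac{1}{2}\sum_k \lvert P(k) - Q(k) \rvert$ and compares it to $np^2$ at a few concrete $(n, p)$ pairs:

```python
import numpy as np
from scipy import stats

def tv_binom_poisson(n, p):
    """Exact TV distance between Bin(n, p) and Pois(np)."""
    ks = np.arange(n + 1)
    binom = stats.binom.pmf(ks, n, p)
    pois = stats.poisson.pmf(ks, n * p)
    tail = stats.poisson.sf(n, n * p)   # Poisson mass above n, where Bin(n, p) has none
    return 0.5 * (np.abs(binom - pois).sum() + tail)

for n, p in [(50, 0.02), (100, 0.01), (500, 0.002)]:
    print(f"n={n:>4}, p={p}:  TV = {tv_binom_poisson(n, p):.5f}   "
          f"Le Cam bound = {n * p * p:.5f}")
```

The exact distance typically comes in well under the bound, which is what makes $np^2$ a usable certificate rather than just an order-of-magnitude statement.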

Failure Mode

The bound is tight in the rare-event regime where $\sum p_i^2$ is small. When the $p_i$ are not small the bound is useless: at $p = 0.5$, $n = 10$, the bound gives $2.5$, far above the maximum possible total variation of $1$. The useful regime is roughly $\sum p_i^2 \leq 0.1$, where the bound certifies non-trivial approximation. The bound also requires independence; with dependent Bernoullis the right tool is the Stein-Chen method, which extends the bound by adding a coupling-error term.

Optional Proof: Coupling proof of Le Cam's bound

The slickest proof constructs $X_i$ and $Y_i$ on the same probability space so they agree as often as possible.

For each $i$, build a maximal coupling $(X_i, Y_i)$ with $X_i \sim \mathrm{Bern}(p_i)$ and $Y_i \sim \mathrm{Pois}(p_i)$ such that $\Pr[X_i \neq Y_i] = p_i(1 - e^{-p_i})$. The elementary inequality $1 - e^{-p} \leq p$ then gives $\Pr[X_i \neq Y_i] \leq p_i^2$ as the per-trial bound (see Lindvall §1 for the explicit construction).

Let $S = \sum X_i$ and $T = \sum Y_i$. The Poisson family is closed under convolution of independent variables, so $T \sim \mathrm{Pois}(\sum_i p_i)$. By the coupling inequality:
$$\lVert \mathrm{Law}(S) - \mathrm{Law}(T) \rVert_{\mathrm{TV}} \;\leq\; \Pr[S \neq T] \;\leq\; \sum_i \Pr[X_i \neq Y_i] \;\leq\; \sum_i p_i^2.$$

The first inequality is the standard coupling inequality (the TV distance is the infimum over couplings of $\Pr[S \neq T]$). The second is a union bound. The third is the per-trial coupling estimate. This is one of the classical applications of the Stein-Chen method in its coupling form; see Barbour, Holst, and Janson (1992) for the full development.
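The per-trial coupling is concrete enough to simulate. Below is a sketch in Python (the function name is my own, and the construction follows the standard maximal-coupling recipe): the two PMFs share mass $1 - p$ at $0$ and $p e^{-p}$ at $1$, and the pair is forced to disagree only on the residual mass $p(1 - e^{-p}) \leq p^2$.

```python
import math
import numpy as np

def coupled_pair(p, rng):
    """One draw from a maximal coupling of Bern(p) and Pois(p)."""
    u = rng.uniform()
    if u < 1 - p:                          # shared mass at 0
        return 0, 0
    if u < (1 - p) + p * math.exp(-p):     # shared mass at 1
        return 1, 1
    # Mismatch branch, probability p(1 - e^{-p}): X keeps its leftover
    # value 1; Y is drawn from the residual Poisson mass, which sits at
    # 0 (amount e^{-p} - (1 - p)) and at k >= 2 (the full pmf there).
    r = rng.uniform() * p * (1 - math.exp(-p))
    if r < math.exp(-p) - (1 - p):
        return 1, 0
    r -= math.exp(-p) - (1 - p)
    k = 2
    while True:
        mass = math.exp(-p) * p**k / math.factorial(k)
        if r < mass:
            return 1, k
        r -= mass
        k += 1

rng = np.random.default_rng(0)
p, trials = 0.1, 200_000
mismatches = sum(x != y for x, y in (coupled_pair(p, rng) for _ in range(trials)))
print(f"empirical P(X != Y): {mismatches / trials:.5f}")
print(f"exact p(1 - e^-p):   {p * (1 - math.exp(-p)):.5f}  <=  p^2 = {p * p:.5f}")
```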

Le Cam Theorem Disambiguation

Lucien Le Cam (1924-2000) published several theorems that bear his name, and conflating them is a real source of confusion in the literature. The result on this page is the Poisson approximation theorem (Le Cam, 1960). The other major Le Cam results worth knowing about:

  • Le Cam's first lemma and third lemma on contiguity of probability measures (Le Cam, 1960). Used in asymptotic statistics for local asymptotic normality (LAN). Unrelated to Poisson approximation.
  • Le Cam's bound on minimax risk (Le Cam, 1973). A general technique for lower-bounding statistical estimation risk via two-point or multi-point reductions. Unrelated to either of the above.
  • Le Cam's theorem on quadratic mean differentiability in asymptotic statistics. Used in establishing LAN and efficient-estimator theory.

When a textbook says "Le Cam's theorem" without context, check whether the statement involves total-variation distance (this page), local likelihood ratios (contiguity), or estimation lower bounds (minimax). Three different beasts.

When to Use Poisson vs Normal Approximation

The general rule: use Poisson when $p$ is small and $np$ is moderate (rare events). Use Normal (De Moivre-Laplace) when $np$ is large and $p$ is not extreme. Both approximations agree in the intermediate regime $np \gtrsim 30$, $p \in [0.1, 0.9]$.

| Regime | Poisson error (Le Cam) | Normal error (Berry-Esseen) | Recommended |
| --- | --- | --- | --- |
| $n = 100$, $p = 0.01$ | $np^2 = 0.01$ | $\sim 0.4/\sqrt{1} = 0.4$ | Poisson |
| $n = 1000$, $p = 0.02$ | $0.4$ | $\sim 0.4/\sqrt{20} \approx 0.09$ | Normal (with continuity correction) |
| $n = 100$, $p = 0.5$ | $25$ (useless) | $\sim 0.05$ | Normal |
| $n = 10$, $p = 0.5$ | $2.5$ (useless) | $\sim 0.15$ | Exact binomial |

The Le Cam bound gives a finite-$n$ certificate. The Berry-Esseen bound likewise gives a finite-$n$ certificate for the Normal approximation. Comparing the two tells you which approximation is better at your particular $n$ and $p$.
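A sketch of that comparison in Python (standard library only). The Berry-Esseen constant $C \approx 0.4748$ used here is an assumption (the best published value I am aware of), and note the caveat that Le Cam bounds total-variation distance while Berry-Esseen bounds Kolmogorov distance, so the comparison is indicative rather than exact.

```python
import math

def le_cam_bound(n, p):
    """TV-distance certificate for the Poisson approximation."""
    return n * p * p

def berry_esseen_bound(n, p, C=0.4748):
    """Kolmogorov-distance certificate for the Normal approximation:
    C * rho / (sigma^3 * sqrt(n)), with Bernoulli moments
    sigma^2 = p(1-p) and rho = E|X - p|^3 = p(1-p)(p^2 + (1-p)^2)."""
    sigma3 = (p * (1 - p)) ** 1.5
    rho = p * (1 - p) * (p**2 + (1 - p) ** 2)
    return C * rho / (sigma3 * math.sqrt(n))

for n, p in [(100, 0.01), (1000, 0.02), (100, 0.5), (10, 0.5)]:
    lc, be = le_cam_bound(n, p), berry_esseen_bound(n, p)
    pick = "Poisson" if lc < be else "Normal"
    # When both certificates are large (e.g. n=10, p=0.5), use the exact binomial.
    print(f"n={n:>5} p={p:<5} Le Cam={lc:.3f}  Berry-Esseen={be:.3f}  -> {pick}")
```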

Common Confusions

Watch Out

Poisson is not Normal with small mean

$\mathrm{Pois}(\lambda)$ is not the same as $\mathcal{N}(\lambda, \lambda)$ even though they share the same mean and variance. Poisson is supported on non-negative integers and is right-skewed for small $\lambda$. At $\lambda = 2$, the Poisson skewness is $1/\sqrt{2} \approx 0.71$, far from Gaussian. For $\lambda \geq 30$ the Poisson is close enough to Gaussian that you can use a Normal approximation on a Poisson, but for small $\lambda$ you cannot.
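A quick look at the mismatch at $\lambda = 2$, as a sketch assuming `scipy`: lower-tail probabilities under $\mathrm{Pois}(2)$ and a continuity-corrected $\mathcal{N}(2, 2)$ differ visibly.

```python
from scipy import stats

lam = 2.0
for k in [0, 1, 2, 4, 6]:
    pois = stats.poisson.cdf(k, lam)
    # Continuity-corrected Normal with matching mean and variance.
    norm = stats.norm.cdf(k + 0.5, loc=lam, scale=lam**0.5)
    print(f"P(X <= {k}):  Poisson={pois:.4f}  Normal={norm:.4f}")
```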

Watch Out

The Poisson process is not the Poisson distribution

Poisson distribution: the count of rare events in a fixed-size window. Poisson process: a random collection of points in time (or space) with the property that counts on disjoint windows are independent Poissons. The distribution is one marginal of the process. Confusing the two leads to errors when independence across windows matters (e.g., reasoning about inter-arrival times).

Watch Out

Le Cam's bound requires independence

The Le Cam $\sum p_i^2$ bound is for independent Bernoulli summands. For dependent Bernoullis the bound fails: $n$ perfectly correlated Bernoullis sum to either $0$ or $n$, never anywhere in between, and no Poisson approximation makes sense. The Stein-Chen method handles weakly dependent variables with an extra term that measures the dependence.
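To make the failure concrete, here is a minimal simulation (Python, `numpy` and `scipy` assumed; the parameter choices are illustrative) of $n$ perfectly correlated Bernoullis against the mean-matched Poisson:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p, trials = 50, 0.05, 100_000
lam = n * p                        # mean-matched Poisson parameter: 2.5

flip = rng.random(trials) < p      # one shared coin per trial
s = n * flip.astype(int)           # perfectly correlated sum: only 0 or n

middle = stats.poisson.cdf(n - 1, lam) - stats.poisson.cdf(0, lam)
print(f"P(S = 0):         empirical {(s == 0).mean():.4f}  Poisson {stats.poisson.pmf(0, lam):.4f}")
print(f"P(1 <= S <= n-1): empirical {((s > 0) & (s < n)).mean():.4f}  Poisson {middle:.4f}")
```

The correlated sum puts no mass at all where the Poisson puts over 90% of its mass, so no choice of rate rescues the approximation.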

Exercises

ExerciseCore

Problem

A web server receives $n = 10^6$ requests per day, each of which independently triggers a rare bug with probability $p = 5 \times 10^{-6}$. Approximate the probability that the bug is triggered at least 3 times, and bound the error of the approximation using Le Cam's inequality.

ExerciseAdvanced

Problem

For a sum of $n = 100$ independent Bernoullis with probabilities $p_i = i/5000$ for $i = 1, \ldots, 100$, compute the Le Cam total-variation upper bound on the distance to the matching Poisson. Compare to the homogeneous case $p_i = \bar{p}$ with $\bar{p}$ the average.

References

Canonical:

  • Feller, An Introduction to Probability Theory and Its Applications, Vol I (3rd ed., 1968), Chapter VI (Poisson limit and the law of small numbers).
  • Barbour, Holst, and Janson, Poisson Approximation (1992), Chapter 1 (the Stein-Chen method, modern derivation of Le Cam's bound).
  • Le Cam, "An approximation theorem for the Poisson binomial distribution", Pacific Journal of Mathematics 10 (1960), pp. 1181-1197.

Current:

  • Blitzstein and Hwang, Introduction to Probability (2nd ed., 2019), Chapter 4 (Poisson distribution and Poisson approximation, applied perspective).
  • Lindvall, Lectures on the Coupling Method (1992; Dover ed. 2002), Section 1 (coupling proof of the Le Cam bound).


Last reviewed: May 12, 2026
