

Poisson Limit Theorem and Le Cam's Bound

Bin(n, lambda/n) converges to Pois(lambda) as n grows. The classical product-of-PMFs proof, then Le Cam's total-variation bound that makes the approximation quantitative. When to use Poisson vs Normal approximation. Disambiguation: Le Cam published multiple famous theorems.

Important · Core · Tier 2 · Stable · Supporting · ~30 min
For: Stats, Actuarial

Why This Matters

The Poisson distribution is what you get when many independent things each have a small chance of happening, and you count how many actually do. Rare typos in a long document, photons hitting a detector in a fixed time window, insurance claims in a year, defects on a manufactured wafer, mutations in a stretch of DNA, requests arriving at a web server. All of these have the same mathematical structure: a sum of many Bernoulli trials with small success probabilities.

The Poisson limit theorem makes the connection precise. As the number of trials $n$ grows and the per-trial probability $p$ shrinks so that the expected count $np$ stays finite, the binomial distribution converges to Poisson. The result is older than the modern CLT and predates the formal probability axioms; Poisson published the special case in 1837.

Le Cam's 1960 sharpening gives a quantitative version: the total-variation distance between $\mathrm{Bin}(n, p)$ and $\mathrm{Pois}(np)$ is at most $np^2$. This is sharper than the standard Berry-Esseen rate for the normal approximation in the rare-event regime and explains why practitioners reach for Poisson rather than Normal when $p$ is small.

Quick Version

| Object | Approximation |
| --- | --- |
| $X \sim \mathrm{Bin}(n, p)$ with $np = \lambda$ fixed, $n \to \infty$ | $X \xrightarrow{d} \mathrm{Pois}(\lambda)$ |
| Finite-$n$ error (Le Cam, 1960) | $\lVert \mathrm{Bin}(n, p) - \mathrm{Pois}(np) \rVert_{\mathrm{TV}} \leq np^2$ |
| Sum of non-identical Bernoullis | $\lVert \mathrm{Bin}(p_1, \ldots, p_n) - \mathrm{Pois}(\sum_i p_i) \rVert_{\mathrm{TV}} \leq \sum_i p_i^2$ |
| Rule of thumb | $n \geq 100$, $p \leq 0.05$ |

The non-identical case is the form most worth remembering, since it covers Bernoulli trials with different success probabilities (different risk classes, different exposure levels). Le Cam's bound is the cleanest finite-sample approximation result in elementary probability.

Statement

Theorem

Poisson Limit Theorem

Statement

Let $X_n \sim \mathrm{Bin}(n, p_n)$ with $n p_n \to \lambda > 0$ as $n \to \infty$. Then $X_n$ converges in distribution to $\mathrm{Pois}(\lambda)$:
$$\Pr[X_n = k] \to e^{-\lambda} \frac{\lambda^k}{k!}, \quad k = 0, 1, 2, \ldots$$
The convergence is also in total-variation distance, not only weakly.

Intuition

The Poisson distribution is the law of rare events: many independent chances, each one tiny, with a finite expected count. The binomial PMF $\binom{n}{k} p^k (1-p)^{n-k}$ pulls toward $e^{-\lambda} \lambda^k / k!$ when $p$ is small and $n$ is large, because $(1-p)^n \to e^{-\lambda}$ and $\binom{n}{k} p^k \to \lambda^k / k!$. The Poisson PMF emerges from the product.

Why It Matters

The limit explains why the same Poisson distribution appears in radioactive decay, queue arrivals, mutation counts, and insurance claims. None of these systems involve an integer parameter $n$ of trials in any visible sense, yet they all sit at the Poisson endpoint of the binomial family. The Poisson is the universal limit law for sums of rare independent events.

Practically, the result lets you replace $\binom{n}{k} p^k (1-p)^{n-k}$ with $e^{-np}(np)^k / k!$, which has no $n$ dependence inside the combinatorial coefficient. For $n = 10^6$ and $p = 10^{-5}$, the binomial is computationally inconvenient; the Poisson approximation is one line.
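As a quick numerical check, here is a minimal sketch in Python (assuming `scipy` is available) that evaluates both PMFs in exactly this regime; by Le Cam's bound the total error across all $k$ is at most $np^2 = 10^{-4}$.

```python
from scipy import stats

n, p = 10**6, 1e-5      # rare-event regime: np = 10
lam = n * p

# Exact binomial PMF vs the one-line Poisson approximation, first few counts.
for k in range(5):
    b = stats.binom.pmf(k, n, p)
    q = stats.poisson.pmf(k, lam)
    print(f"k={k}  binomial={b:.10f}  poisson={q:.10f}  diff={abs(b - q):.1e}")
```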

Failure Mode

The approximation degrades when $p$ is not small. At $p = 0.1$ the Le Cam bound gives $np^2 = 0.01n$, which grows with $n$ rather than shrinking. For moderate $p$ the Normal approximation (De Moivre-Laplace) is the right tool instead. The Poisson limit also fails when the underlying trials are dependent: a sum of dependent Bernoullis with the same marginals can have a different limit law, and quantifying the dependence requires the Stein-Chen method or a coupling argument.

Optional Proof: Classical product-of-PMFs proof

Fix $k$ and let $p_n = \lambda/n$, so $n p_n = \lambda$ exactly. The binomial PMF is
$$\Pr[X_n = k] = \binom{n}{k} p_n^k (1 - p_n)^{n-k} = \frac{n!}{k!(n-k)!} \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n-k}.$$

Group the factors:
$$\Pr[X_n = k] = \frac{\lambda^k}{k!} \cdot \underbrace{\frac{n(n-1)\cdots(n-k+1)}{n^k}}_{\text{(A)}} \cdot \underbrace{\left(1 - \frac{\lambda}{n}\right)^{n}}_{\text{(B)}} \cdot \underbrace{\left(1 - \frac{\lambda}{n}\right)^{-k}}_{\text{(C)}}.$$

Factor (A): a product of $k$ terms, each of the form $(n - j)/n = 1 - j/n$, all converging to $1$. So (A) $\to 1$.

Factor (B): a defining limit, $(1 - \lambda/n)^n \to e^{-\lambda}$.

Factor (C): converges to $1$ since $k$ is fixed and $\lambda/n \to 0$.

Multiplying: $\Pr[X_n = k] \to (\lambda^k / k!) \cdot e^{-\lambda}$, which is the Poisson PMF. The same argument with $n p_n \to \lambda$ (not exactly $\lambda/n$) goes through with negligible adjustments, because $n p_n \to \lambda$ controls the rate of all three factors.
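To watch the three factors converge, here is a short numeric sketch (Python, standard library only); the choices $\lambda = 3$, $k = 2$, and the grid of $n$ values are arbitrary illustrations, not part of the proof.

```python
import math

lam, k = 3.0, 2

for n in [10, 100, 1_000, 100_000]:
    A = math.prod((n - j) / n for j in range(k))   # n(n-1).../n^k  -> 1
    B = (1 - lam / n) ** n                          # -> e^{-lam}
    C = (1 - lam / n) ** (-k)                       # -> 1
    pmf = lam**k / math.factorial(k) * A * B * C    # exact Bin(n, lam/n) pmf at k
    print(f"n={n:>7}  A={A:.4f}  B={B:.4f}  C={C:.4f}  pmf={pmf:.6f}")

print("Poisson limit:", math.exp(-lam) * lam**k / math.factorial(k))
```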

Le Cam's Total-Variation Bound

The convergence statement above is qualitative. Le Cam (1960) proved a quantitative form that bounds the approximation error at finite nn.

Theorem

Le Cam Total-Variation Bound

Statement

Let $X_1, \ldots, X_n$ be independent with $X_i \sim \mathrm{Bern}(p_i)$, and let $S = \sum_{i=1}^n X_i$. Then for $Y \sim \mathrm{Pois}\!\left(\sum_{i=1}^n p_i\right)$,
$$\lVert \mathrm{Law}(S) - \mathrm{Law}(Y) \rVert_{\mathrm{TV}} \;\leq\; \sum_{i=1}^n p_i^2.$$
In the homogeneous case $p_i = p$ for all $i$, this specializes to $\lVert \mathrm{Bin}(n, p) - \mathrm{Pois}(np) \rVert_{\mathrm{TV}} \leq np^2$.

Intuition

The total-variation distance between two distributions $P$ and $Q$ is $\sup_A \lvert P(A) - Q(A) \rvert$, the worst-case difference in probabilities across all events. Le Cam's bound says the binomial and Poisson assign nearly the same probability to every event when the success probabilities $p_i$ are all small, with explicit error proportional to the sum of squared probabilities. Squaring is the right scaling because the first-moment match $\mathbb{E}[S] = \mathbb{E}[Y]$ is exact; the error is driven by second-moment mismatch.

Why It Matters

The bound is non-asymptotic: it holds at finite $n$ and gives an explicit error. This matters for insurance and reliability applications where $n$ is concrete and $p$ varies across risk classes. It also matters in theoretical computer science (sums of rare events in randomized algorithms) and in epidemiology (counting cases across heterogeneous populations). The classical limit theorem says "convergence happens"; Le Cam's bound says "and here is how close you already are at $n = 50$".
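To see how tight the certificate is, here is a sketch (Python, `numpy` and `scipy` assumed) that computes the exact total-variation distance $\tfrac{1}{2}\sum_k \lvert P(k) - Q(k) \rvert$ and compares it to $np^2$ at a few concrete $(n, p)$ pairs:

```python
import numpy as np
from scipy import stats

def tv_binom_poisson(n, p):
    """Exact TV distance between Bin(n, p) and Pois(np)."""
    ks = np.arange(n + 1)
    binom = stats.binom.pmf(ks, n, p)
    pois = stats.poisson.pmf(ks, n * p)
    tail = stats.poisson.sf(n, n * p)   # Poisson mass above n, where Bin(n, p) has none
    return 0.5 * (np.abs(binom - pois).sum() + tail)

for n, p in [(50, 0.02), (100, 0.01), (500, 0.002)]:
    print(f"n={n:>4}, p={p}:  TV = {tv_binom_poisson(n, p):.5f}   "
          f"Le Cam bound = {n * p * p:.5f}")
```

The exact distance typically comes in well under the bound, which is what makes $np^2$ a usable certificate rather than just an order-of-magnitude statement.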

Failure Mode

The bound is tight in the rare-event regime where $\sum p_i^2$ is small. When the $p_i$ are not small the bound is useless: at $p = 0.5$, $n = 10$, the bound gives $2.5$, far above the maximum possible total variation of $1$. The useful regime is roughly $\sum p_i^2 \leq 0.1$, where the bound certifies non-trivial approximation. The bound also requires independence; with dependent Bernoullis the right tool is the Stein-Chen method, which extends the bound by adding a coupling-error term.

Optional Proof: Coupling proof of Le Cam's bound

The slickest proof constructs $X_i$ and $Y_i$ on the same probability space so they agree as often as possible.

For each $i$, build a maximal coupling $(X_i, Y_i)$ with $X_i \sim \mathrm{Bern}(p_i)$ and $Y_i \sim \mathrm{Pois}(p_i)$ such that $\Pr[X_i \neq Y_i] = p_i(1 - e^{-p_i})$. The elementary inequality $1 - e^{-p} \leq p$ then gives $\Pr[X_i \neq Y_i] \leq p_i^2$ as the per-trial bound (see Lindvall §1 for the explicit construction).

Let $S = \sum X_i$ and $T = \sum Y_i$. The Poisson family is closed under convolution of independent variables, so $T \sim \mathrm{Pois}(\sum_i p_i)$. By the coupling inequality:
$$\lVert \mathrm{Law}(S) - \mathrm{Law}(T) \rVert_{\mathrm{TV}} \;\leq\; \Pr[S \neq T] \;\leq\; \sum_i \Pr[X_i \neq Y_i] \;\leq\; \sum_i p_i^2.$$

The first inequality is the standard coupling inequality (the TV distance is the infimum over couplings of $\Pr[S \neq T]$). The second is a union bound. The third is the per-trial coupling estimate. This is one of the classical applications of the Stein-Chen method in its coupling form; see Barbour, Holst, and Janson (1992) for the full development.
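The per-trial coupling is concrete enough to simulate. Below is a sketch in Python (the function name is my own, and the construction follows the standard maximal-coupling recipe): the two PMFs share mass $1 - p$ at $0$ and $p e^{-p}$ at $1$, and the pair is forced to disagree only on the residual mass $p(1 - e^{-p}) \leq p^2$.

```python
import math
import numpy as np

def coupled_pair(p, rng):
    """One draw from a maximal coupling of Bern(p) and Pois(p)."""
    u = rng.uniform()
    if u < 1 - p:                          # shared mass at 0
        return 0, 0
    if u < (1 - p) + p * math.exp(-p):     # shared mass at 1
        return 1, 1
    # Mismatch branch, probability p(1 - e^{-p}): X keeps its leftover
    # value 1; Y is drawn from the residual Poisson mass, which sits at
    # 0 (amount e^{-p} - (1 - p)) and at k >= 2 (the full pmf there).
    r = rng.uniform() * p * (1 - math.exp(-p))
    if r < math.exp(-p) - (1 - p):
        return 1, 0
    r -= math.exp(-p) - (1 - p)
    k = 2
    while True:
        mass = math.exp(-p) * p**k / math.factorial(k)
        if r < mass:
            return 1, k
        r -= mass
        k += 1

rng = np.random.default_rng(0)
p, trials = 0.1, 200_000
mismatches = sum(x != y for x, y in (coupled_pair(p, rng) for _ in range(trials)))
print(f"empirical P(X != Y): {mismatches / trials:.5f}")
print(f"exact p(1 - e^-p):   {p * (1 - math.exp(-p)):.5f}  <=  p^2 = {p * p:.5f}")
```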

Le Cam Theorem Disambiguation

Lucien Le Cam (1924-2000) published several theorems that bear his name, and conflating them is a real source of confusion in the literature. The result on this page is the Poisson approximation theorem (Le Cam, 1960). The other major Le Cam results worth knowing about:

  • Le Cam's first lemma and third lemma on contiguity of probability measures (Le Cam, 1960). Used in asymptotic statistics for local asymptotic normality (LAN). Unrelated to Poisson approximation.
  • Le Cam's bound on minimax risk (Le Cam, 1973). A general technique for lower-bounding statistical estimation risk via two-point or multi-point reductions. Unrelated to either of the above.
  • Le Cam's theorem on quadratic mean differentiability in asymptotic statistics. Used in establishing LAN and efficient-estimator theory.

When a textbook says "Le Cam's theorem" without context, check whether the statement involves total-variation distance (this page), local likelihood ratios (contiguity), or estimation lower bounds (minimax). Three different beasts.

When to Use Poisson vs Normal Approximation

The general rule: use Poisson when $p$ is small and $np$ is moderate (rare events). Use Normal (De Moivre-Laplace) when $np$ is large and $p$ is not extreme. Both approximations agree in the intermediate regime $np \gtrsim 30$, $p \in [0.1, 0.9]$.

| Regime | Poisson error (Le Cam) | Normal error (Berry-Esseen) | Recommended |
| --- | --- | --- | --- |
| $n = 100$, $p = 0.01$ | $np^2 = 0.01$ | $\sim 0.4/\sqrt{1} = 0.4$ | Poisson |
| $n = 1000$, $p = 0.02$ | $0.4$ | $\sim 0.4/\sqrt{20} \approx 0.09$ | Normal (with continuity correction) |
| $n = 100$, $p = 0.5$ | $25$ (useless) | $\sim 0.05$ | Normal |
| $n = 10$, $p = 0.5$ | $2.5$ (useless) | $\sim 0.15$ | Exact binomial |

The Le Cam bound gives a finite-$n$ certificate. The Berry-Esseen bound likewise gives a finite-$n$ certificate for the Normal approximation. Comparing the two tells you which approximation is better at your particular $n$ and $p$.
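A sketch of that comparison in Python (standard library only). The Berry-Esseen constant $C \approx 0.4748$ used here is an assumption (the best published value I am aware of), and note the caveat that Le Cam bounds total-variation distance while Berry-Esseen bounds Kolmogorov distance, so the comparison is indicative rather than exact.

```python
import math

def le_cam_bound(n, p):
    """TV-distance certificate for the Poisson approximation."""
    return n * p * p

def berry_esseen_bound(n, p, C=0.4748):
    """Kolmogorov-distance certificate for the Normal approximation:
    C * rho / (sigma^3 * sqrt(n)), with Bernoulli moments
    sigma^2 = p(1-p) and rho = E|X - p|^3 = p(1-p)(p^2 + (1-p)^2)."""
    sigma3 = (p * (1 - p)) ** 1.5
    rho = p * (1 - p) * (p**2 + (1 - p) ** 2)
    return C * rho / (sigma3 * math.sqrt(n))

for n, p in [(100, 0.01), (1000, 0.02), (100, 0.5), (10, 0.5)]:
    lc, be = le_cam_bound(n, p), berry_esseen_bound(n, p)
    pick = "Poisson" if lc < be else "Normal"
    # When both certificates are large (e.g. n=10, p=0.5), use the exact binomial.
    print(f"n={n:>5} p={p:<5} Le Cam={lc:.3f}  Berry-Esseen={be:.3f}  -> {pick}")
```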

Common Confusions

Watch Out

Poisson is not Normal with small mean

$\mathrm{Pois}(\lambda)$ is not the same as $\mathcal{N}(\lambda, \lambda)$ even though they share the same mean and variance. Poisson is supported on non-negative integers and is right-skewed for small $\lambda$. At $\lambda = 2$, the Poisson skewness is $1/\sqrt{2} \approx 0.71$, far from Gaussian. For $\lambda \geq 30$ the Poisson is close enough to Gaussian that you can use a Normal approximation on a Poisson, but for small $\lambda$ you cannot.
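A quick look at the mismatch at $\lambda = 2$, as a sketch assuming `scipy`: lower-tail probabilities under $\mathrm{Pois}(2)$ and a continuity-corrected $\mathcal{N}(2, 2)$ differ visibly.

```python
from scipy import stats

lam = 2.0
for k in [0, 1, 2, 4, 6]:
    pois = stats.poisson.cdf(k, lam)
    # Continuity-corrected Normal with matching mean and variance.
    norm = stats.norm.cdf(k + 0.5, loc=lam, scale=lam**0.5)
    print(f"P(X <= {k}):  Poisson={pois:.4f}  Normal={norm:.4f}")
```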

Watch Out

The Poisson process is not the Poisson distribution

Poisson distribution: the count of rare events in a fixed-size window. Poisson process: a random collection of points in time (or space) with the property that counts on disjoint windows are independent Poissons. The distribution is one marginal of the process. Confusing the two leads to errors when independence across windows matters (e.g., reasoning about inter-arrival times).

Watch Out

Le Cam's bound requires independence

The Le Cam $\sum p_i^2$ bound is for independent Bernoulli summands. For dependent Bernoullis the bound fails: $n$ perfectly correlated Bernoullis sum to either $0$ or $n$, never anywhere in between, and no Poisson approximation makes sense. The Stein-Chen method handles weakly dependent variables with an extra term that measures the dependence.
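To make the failure concrete, here is a minimal simulation (Python, `numpy` and `scipy` assumed; the parameter choices are illustrative) of $n$ perfectly correlated Bernoullis against the mean-matched Poisson:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p, trials = 50, 0.05, 100_000
lam = n * p                        # mean-matched Poisson parameter: 2.5

flip = rng.random(trials) < p      # one shared coin per trial
s = n * flip.astype(int)           # perfectly correlated sum: only 0 or n

middle = stats.poisson.cdf(n - 1, lam) - stats.poisson.cdf(0, lam)
print(f"P(S = 0):         empirical {(s == 0).mean():.4f}  Poisson {stats.poisson.pmf(0, lam):.4f}")
print(f"P(1 <= S <= n-1): empirical {((s > 0) & (s < n)).mean():.4f}  Poisson {middle:.4f}")
```

The correlated sum puts no mass at all where the Poisson puts over 90% of its mass, so no choice of rate rescues the approximation.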

Exercises

ExerciseCore

Problem

A web server receives $n = 10^6$ requests per day, each of which independently triggers a rare bug with probability $p = 5 \times 10^{-6}$. Approximate the probability that the bug is triggered at least 3 times, and bound the error of the approximation using Le Cam's inequality.

ExerciseAdvanced

Problem

For a sum of $n = 100$ independent Bernoullis with probabilities $p_i = i/5000$ for $i = 1, \ldots, 100$, compute the Le Cam total-variation upper bound on the distance to the matching Poisson. Compare to the homogeneous case $p_i = \bar{p}$ with $\bar{p}$ the average.

References

Canonical:

  • Feller, An Introduction to Probability Theory and Its Applications, Vol I (3rd ed., 1968), Chapter VI (Poisson limit and the law of small numbers).
  • Barbour, Holst, and Janson, Poisson Approximation (1992), Chapter 1 (the Stein-Chen method, modern derivation of Le Cam's bound).
  • Le Cam, "An approximation theorem for the Poisson binomial distribution", Pacific Journal of Mathematics 10 (1960), pp. 1181-1197.

Current:

  • Blitzstein and Hwang, Introduction to Probability (2nd ed., 2019), Chapter 4 (Poisson distribution and Poisson approximation, applied perspective).
  • Lindvall, Lectures on the Coupling Method (1992; Dover ed. 2002), Section 1 (coupling proof of the Le Cam bound).


Last reviewed: May 12, 2026
