

De Moivre-Laplace Theorem

The first central limit theorem, historically. Bin(n,p) approximates N(np, np(1-p)) for large n, with explicit continuity correction. Stirling-based proof, Berry-Esseen rate, and where the approximation breaks down (small p, small n, skewed binomials).


Why This Matters

De Moivre proved this result in 1733 for $p = 1/2$, and Laplace extended it to general $p$ in 1812. It is the central limit theorem in its first form, predating the modern statement by more than a century. For two hundred years it was the working tool that engineers, actuaries, and gamblers used to attach probabilities to deviations from expected counts.

The reason it still earns its own page on a site that already covers the general CLT: the binomial case is where students learn the continuity correction, the rule of thumb for "large enough $n$", and the visual intuition that ties Pascal's triangle to the Gaussian bell curve. The general CLT abstracts all of this away. De Moivre-Laplace keeps it concrete.

It is also the place where Berry-Esseen rate constants are easiest to state and verify: the third absolute moment of a centered Bernoulli is explicit, so the worst-case rate $C\rho/(\sigma^3\sqrt{n})$ becomes a closed form in $p$.

Quick Version

| Object | Approximation |
| --- | --- |
| $X \sim \mathrm{Bin}(n, p)$ | $X \approx \mathcal{N}(np,\, np(1-p))$ |
| Standardized | $\frac{X - np}{\sqrt{np(1-p)}} \xrightarrow{d} \mathcal{N}(0, 1)$ |
| $\Pr[X \leq k]$ | $\Phi\!\left(\frac{k + 0.5 - np}{\sqrt{np(1-p)}}\right)$ with continuity correction |
| Rule of thumb | $np \geq 10$ and $n(1-p) \geq 10$ |
| Best rate | $\sup_x \lvert F_n(x) - \Phi(x)\rvert \leq C\,\frac{1 - 2p(1-p)}{\sqrt{n\,p(1-p)}}$ |

The rule of thumb sets the regime where the approximation is good. The Berry-Esseen bound makes that quantitative.

Statement

Theorem

De Moivre-Laplace Theorem

Statement

Let $X_n \sim \mathrm{Bin}(n, p)$ with $p \in (0, 1)$ fixed. Then the standardized count converges in distribution to a standard normal:

$$\frac{X_n - np}{\sqrt{np(1-p)}} \xrightarrow{d} \mathcal{N}(0, 1) \quad \text{as } n \to \infty.$$

Equivalently, for every $a < b$:

$$\Pr\!\left[a \leq \frac{X_n - np}{\sqrt{np(1-p)}} \leq b\right] \to \Phi(b) - \Phi(a).$$

Intuition

A binomial count is the sum of $n$ independent Bernoulli($p$) variables. Each contributes mean $p$ and variance $p(1-p)$. Standardizing the sum removes location and scale, and what is left has to converge to something universal. The Gaussian is the only stable law with finite variance, so the limit is Gaussian.
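This is easy to see numerically. A minimal simulation sketch (numpy and scipy assumed; the choices of $n$, $p$, and sample size are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 400, 0.3

# Draw binomial counts and standardize: (X - np) / sqrt(np(1-p)).
x = rng.binomial(n, p, size=100_000)
z = (x - n * p) / np.sqrt(n * p * (1 - p))

# Empirical CDF of the standardized count vs the standard normal CDF.
for t in (-1.0, 0.0, 1.0, 2.0):
    print(f"t={t:+.1f}  empirical={np.mean(z <= t):.4f}  Phi={stats.norm.cdf(t):.4f}")
```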

Why It Matters

Before this result there was no machinery for attaching probabilities to "how far is 537 heads in 1000 fair flips from the expected 500". De Moivre gave the first quantitative answer, which is what every two-sample proportion test, every binomial confidence interval, and every poll margin-of-error ultimately uses. The general CLT comes later and is broader, but the binomial case is the one with the cleanest constants and the clearest visual story.

Failure Mode

The approximation degrades when the binomial is skewed: small $np$ or small $n(1-p)$. In those regimes the Poisson limit ($\mathrm{Bin}(n, \lambda/n) \to \mathrm{Pois}(\lambda)$; see Poisson limit theorem) gives a better approximation than the Normal. The third central moment of a centered Bernoulli is $p(1-p)(1-2p)$, which vanishes at $p = 1/2$, while the standardized skewness $(1-2p)/\sqrt{p(1-p)}$ blows up as $p \to 0$ or $p \to 1$. The Berry-Esseen constant grows as $p$ moves away from $1/2$, which is the formal statement that "skewed binomials need more samples".

Continuity Correction

The binomial is discrete. The normal is continuous. The naive approximation $\Pr[X \leq k] \approx \Phi\!\left(\frac{k - np}{\sqrt{np(1-p)}}\right)$ under-estimates the binomial CDF systematically because it cuts the discrete mass at $k$ in half. The continuity-corrected version moves the cut to $k + 0.5$:

$$\Pr[X \leq k] \approx \Phi\!\left(\frac{k + 0.5 - np}{\sqrt{np(1-p)}}\right).$$

For a two-sided probability $\Pr[a \leq X \leq b]$, the correction widens the interval by $0.5$ on each side:

$$\Pr[a \leq X \leq b] \approx \Phi\!\left(\frac{b + 0.5 - np}{\sqrt{np(1-p)}}\right) - \Phi\!\left(\frac{a - 0.5 - np}{\sqrt{np(1-p)}}\right).$$

The correction matters most at small $n$ or near the boundary. At $n = 30$, $p = 0.5$, the correction reduces the worst-case absolute CDF error by roughly an order of magnitude. At $n = 10^4$ the two are visually identical.
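A sketch that measures this directly (scipy assumed; error here is the maximum absolute CDF discrepancy over all $k$, one reasonable metric among several):

```python
import numpy as np
from scipy import stats

def max_cdf_error(n, p):
    """Max |approx - exact| over k, for naive and corrected normal CDFs."""
    k = np.arange(n + 1)
    mu, sd = n * p, np.sqrt(n * p * (1 - p))
    exact = stats.binom.cdf(k, n, p)
    naive = stats.norm.cdf((k - mu) / sd)
    corrected = stats.norm.cdf((k + 0.5 - mu) / sd)
    return np.abs(naive - exact).max(), np.abs(corrected - exact).max()

for n in (30, 10_000):
    print(n, max_cdf_error(n, 0.5))
```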

Example

Continuity correction in practice

A fair coin is flipped $n = 100$ times. Estimate $\Pr[X \leq 55]$.

Mean $np = 50$, variance $np(1-p) = 25$, SD $= 5$.

Naive normal approximation: $\Pr[X \leq 55] \approx \Phi((55 - 50)/5) = \Phi(1) \approx 0.8413$.

With continuity correction: $\Pr[X \leq 55] \approx \Phi((55.5 - 50)/5) = \Phi(1.1) \approx 0.8643$.

Exact binomial CDF: $0.8644$.

The correction recovers two decimal places of accuracy that the naive form throws away.
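The three numbers above can be reproduced in a few lines (a sketch assuming scipy.stats):

```python
from scipy import stats

n, p, k = 100, 0.5, 55
mu = n * p
sd = (n * p * (1 - p)) ** 0.5

print(stats.norm.cdf((k - mu) / sd))        # naive:     ~0.8413
print(stats.norm.cdf((k + 0.5 - mu) / sd))  # corrected: ~0.8643
print(stats.binom.cdf(k, n, p))             # exact:     ~0.8644
```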

Proof Sketch (Standard Form)

The classical De Moivre-Laplace proof uses Stirling's approximation to the binomial coefficient and is the most direct path. The modern proof uses characteristic functions and is a one-page exercise in computing the limit of $(1 + ipt/\sqrt{np(1-p)} + \ldots)^n$. Both are folded into the optional proof blocks below.

Optional Proof: Stirling-based proof of De Moivre-Laplace

Write $k = np + x\sqrt{np(1-p)}$. The binomial PMF is

$$\Pr[X_n = k] = \binom{n}{k} p^k (1-p)^{n-k}.$$

Apply Stirling: $\log m! = m \log m - m + \frac{1}{2}\log(2\pi m) + O(1/m)$. After substituting and expanding around $k = np$:

$$\log \binom{n}{k} = n H(k/n) - \frac{1}{2}\log\!\left(2\pi n \cdot \frac{k}{n}\left(1 - \frac{k}{n}\right)\right) + O(1/n)$$

where $H(q) = -q \log q - (1-q)\log(1-q)$ is the binary entropy. Combine with $\log[p^k (1-p)^{n-k}] = k \log p + (n - k)\log(1-p)$. The leading $n$-term is $n[H(k/n) + (k/n) \log p + (1 - k/n)\log(1-p)]$. This is the negative KL divergence $-n\,D(k/n \,\|\, p)$.

Expand $D(q \,\|\, p)$ to second order around $q = p$: $D(q\,\|\,p) = \frac{(q-p)^2}{2p(1-p)} + O((q-p)^3)$.

Substituting $q = k/n = p + x\sqrt{p(1-p)/n}$ and simplifying:

$$\Pr[X_n = k] \approx \frac{1}{\sqrt{2\pi np(1-p)}}\, \exp\!\left(-\frac{x^2}{2}\right).$$

Summing the local approximation over a window in $k$ corresponding to $x \in [a, b]$ gives the integrated form $\Pr[a \leq (X_n - np)/\sqrt{np(1-p)} \leq b] \to \Phi(b) - \Phi(a)$.
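The local (pointwise) approximation in the display above can be checked numerically before the summation step (a sketch; $n = 200$, $p = 0.3$ are arbitrary choices):

```python
import numpy as np
from scipy import stats

n, p = 200, 0.3
sigma = np.sqrt(n * p * (1 - p))
k = np.arange(int(n * p - 2 * sigma), int(n * p + 2 * sigma) + 1)
x = (k - n * p) / sigma  # standardized coordinate from k = np + x*sigma

pmf = stats.binom.pmf(k, n, p)
local = np.exp(-x ** 2 / 2) / (np.sqrt(2 * np.pi) * sigma)

# Relative error of the local approximation within +-2 sd of the mean;
# it grows toward the window edges, where the O((q-p)^3) term in D(q||p)
# that the quadratic expansion discards starts to matter.
print(np.abs((local - pmf) / pmf).max())
```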

Optional Proof: Characteristic-function proof

Let $Z_n = (X_n - np)/\sqrt{np(1-p)}$. Write $X_n = \sum_{i=1}^n Y_i$ for i.i.d. Bernoulli($p$) variables, and let $\tilde Y_i = (Y_i - p)/\sqrt{p(1-p)}$, so $\tilde Y_i$ has mean $0$ and variance $1$. The characteristic function of $\tilde Y_i$ is

$$\varphi_{\tilde Y}(t) = p \, e^{i t (1-p)/\sqrt{p(1-p)}} + (1-p)\, e^{-i t p/\sqrt{p(1-p)}}.$$

Expanding to second order: $\varphi_{\tilde Y}(t) = 1 - t^2/2 + O(t^3)$.

Then $\varphi_{Z_n}(t) = \varphi_{\tilde Y}(t/\sqrt{n})^n = \left(1 - \frac{t^2}{2n} + O(n^{-3/2})\right)^n \to e^{-t^2/2}$.

This is the characteristic function of $\mathcal{N}(0, 1)$. By Lévy's continuity theorem, $Z_n$ converges in distribution to a standard normal.
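The convergence of characteristic functions can also be verified without simulation, since $\varphi_{Z_n}$ has a closed form (a sketch; $n$ and the grid of $t$ values are illustrative):

```python
import numpy as np

n, p = 500, 0.3
s = np.sqrt(n * p * (1 - p))

def phi_Zn(t):
    # Exact characteristic function of Z_n = (X_n - np)/s, X_n ~ Bin(n, p):
    # E[exp(it Z_n)] = exp(-it np/s) * (1 - p + p exp(it/s))^n.
    return np.exp(-1j * t * n * p / s) * (1 - p + p * np.exp(1j * t / s)) ** n

for t in (0.5, 1.0, 2.0):
    print(t, abs(phi_Zn(t) - np.exp(-t ** 2 / 2)))  # -> 0 as n grows
```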

Quantitative Bound: Berry-Esseen rate for the binomial

The third absolute central moment of a Bernoulli($p$) variable is $\rho = \mathbb{E}\lvert Y - p \rvert^3 = p(1-p)^3 + (1-p)p^3 = p(1-p)\bigl(1 - 2p(1-p)\bigr)$. With $\sigma^2 = p(1-p)$, the ratio $\rho/\sigma^3$ is minimized at $p = 1/2$, and the Berry-Esseen theorem gives

$$\sup_x \lvert F_n(x) - \Phi(x)\rvert \;\leq\; \frac{C\,\rho}{\sigma^3\sqrt{n}} \;=\; \frac{C\,(1 - 2p(1-p))}{\sqrt{n\,p(1-p)}}$$

where $C \leq 0.4748$ (Shevtsova 2011). The bound diverges as $p \to 0$ or $p \to 1$, formalizing the rule of thumb that the binomial is hardest to approximate when one tail is rare.

For $p = 1/2$, $n = 100$: the bound is $\leq 0.4748 \cdot 0.5 / \sqrt{100 \cdot 0.25} \approx 0.047$. The true sup is about $0.040$, driven by the jump of the discrete CDF at the mean, so the universal bound is close to sharp in this symmetric case despite assuming nothing beyond a finite third moment.
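A sketch that computes both sides of this comparison (scipy assumed; the sup over $x$ is attained at the jump points of the discrete CDF, so it suffices to check each jump from both sides):

```python
import numpy as np
from scipy import stats

def be_bound(n, p, C=0.4748):
    # C * rho / (sigma^3 sqrt(n)) = C * (1 - 2p(1-p)) / sqrt(n p(1-p))
    return C * (1 - 2 * p * (1 - p)) / np.sqrt(n * p * (1 - p))

def true_sup(n, p):
    k = np.arange(n + 1)
    z = (k - n * p) / np.sqrt(n * p * (1 - p))
    F = stats.binom.cdf(k, n, p)
    Phi = stats.norm.cdf(z)
    F_left = np.concatenate(([0.0], F[:-1]))  # CDF just below each jump
    return max(np.abs(F - Phi).max(), np.abs(F_left - Phi).max())

n, p = 100, 0.5
print(be_bound(n, p), true_sup(n, p))  # ~0.047 vs ~0.040
```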

When to Use the Normal Approximation

| Regime | Better approximation |
| --- | --- |
| $np \geq 10$, $n(1-p) \geq 10$ | Normal with continuity correction |
| $np$ small, $n(1-p)$ large (rare events) | Poisson with $\lambda = np$ |
| $np$ moderate, very small $p$, very large $n$ | Both Normal and Poisson are reasonable; Poisson is simpler |
| Both $np$ and $n(1-p)$ small | Use the exact binomial PMF |

The Poisson alternative is treated on its own page: Poisson limit theorem.
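A sketch comparing the two approximations in the rare-event regime (scipy assumed; $n = 1000$, $p = 0.005$ puts $np = 5$, below the rule of thumb):

```python
import numpy as np
from scipy import stats

n, p = 1000, 0.005          # np = 5: skewed, rare-event regime
k = np.arange(0, 16)

exact = stats.binom.cdf(k, n, p)
normal = stats.norm.cdf((k + 0.5 - n * p) / np.sqrt(n * p * (1 - p)))
poisson = stats.poisson.cdf(k, n * p)

# Max CDF error of each approximation; Poisson should win comfortably here.
print(np.abs(normal - exact).max(), np.abs(poisson - exact).max())
```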

Common Confusions

Watch Out

The approximation is not a tail bound

De Moivre-Laplace gives an approximation to $\Pr[X \leq k]$ for typical $k$. It is not a tail bound. For very large deviations ($k$ many standard deviations from $np$), the Gaussian and binomial tails decay at different exponential rates, so the relative error grows without bound: the Normal under-estimates the heavy tail of a skewed binomial and can over-estimate the tail of a symmetric one. Use Chernoff or Hoeffding for rigorous tail bounds, not the Normal approximation.
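A sketch of the deep-tail behavior for a skewed binomial (scipy assumed; $n = 1000$, $p = 0.1$, so the upper tail is the heavy one):

```python
import numpy as np
from scipy import stats

n, p = 1000, 0.1
mu, sd = n * p, np.sqrt(n * p * (1 - p))

for k in (130, 150, 170):                # roughly 3, 5, 7 sds above the mean
    exact = stats.binom.sf(k - 1, n, p)  # Pr[X >= k], exact
    normal = stats.norm.sf((k - 0.5 - mu) / sd)
    hoeffding = np.exp(-2 * (k - mu) ** 2 / n)  # valid upper bound, but loose
    print(k, exact, normal, hoeffding)
```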

Watch Out

Continuity correction is not optional decoration

Skipping the $+0.5$ at $n = 100$, $p = 0.5$ costs roughly two decimal places of accuracy. The corrected form is the actual second-order CLT approximation; the uncorrected form is a first-order approximation that throws away information you already have. The correction is mechanical and adds zero cost.

Watch Out

Bin(n,p) is not approximately Normal when p is near 0 or 1

The rule of thumb $np \geq 10$, $n(1-p) \geq 10$ is not arbitrary. When $p$ is on the order of $1/n$, the binomial is skewed and the Poisson approximation dominates. The Normal is symmetric; the binomial only becomes symmetric for $p \approx 1/2$. Forcing a Normal approximation on a skewed binomial systematically miscalibrates one-sided tail probabilities.

Exercises

ExerciseCore

Problem

A factory produces components with defect rate $p = 0.02$. In a batch of $n = 1000$, what is the probability of at most $25$ defects? Compute using (a) the Normal approximation without continuity correction, (b) the Normal with continuity correction, (c) the Poisson approximation with $\lambda = 20$. Compare to the exact value $\Pr[\mathrm{Bin}(1000, 0.02) \leq 25] \approx 0.8881$.

ExerciseAdvanced

Problem

Show that for Bernoulli($p$) variables, the Berry-Esseen ratio $\rho/\sigma^3$ equals $(1 - 2p(1-p))/\sqrt{p(1-p)}$. Hence determine the value of $p$ that minimizes this ratio.

References

Canonical:

  • Feller, An Introduction to Probability Theory and Its Applications, Vol I (3rd ed., 1968), Chapter VII (De Moivre-Laplace via Stirling) and Chapter VIII (Berry-Esseen rate).
  • Blitzstein and Hwang, Introduction to Probability (2nd ed., 2019), Chapter 10 (CLT and normal approximation, with continuity correction worked).
  • Billingsley, Probability and Measure (3rd ed., 1995), Section 27 (modern proof via characteristic functions).

Current:

  • Tijms, Understanding Probability (3rd ed., 2012), Chapter 5 (continuity correction with applied examples).
  • Shevtsova, "On the absolute constants in the Berry-Esseen inequality for i.i.d. summands" (2011), arXiv:1111.6554, sharpens the universal constant to C0.4748C \leq 0.4748.


Last reviewed: May 12, 2026
