Statistical Estimation
De Moivre-Laplace Theorem
The first central limit theorem, historically. Bin(n,p) approximates N(np, np(1-p)) for large n, with explicit continuity correction. Stirling-based proof, Berry-Esseen rate, and where the approximation breaks down (small p, small n, skewed binomials).
Why This Matters
De Moivre proved this result in 1733 for $p = 1/2$, and Laplace extended it to general $p$ in 1812. It is the central limit theorem in its first form, predating the modern statement by more than a century. For two hundred years it was the working tool that engineers, actuaries, and gamblers used to attach probabilities to deviations from expected counts.
The reason it still earns its own page on a site that already covers the general CLT: the binomial case is where students learn the continuity correction, the rule of thumb for "large enough $n$", and the visual intuition that ties Pascal's triangle to the Gaussian bell curve. The general CLT abstracts all of this away. De Moivre-Laplace keeps it concrete.
It is also the place where Berry-Esseen rate constants are easiest to state and verify: the third moment of a centered Bernoulli is explicit, so the worst-case rate becomes a closed form in $n$ and $p$.
Quick Version
| Object | Approximation |
|---|---|
| Standardized count | $(X - np)/\sqrt{np(1-p)} \approx N(0,1)$ |
| CDF with continuity correction | $P(X \le k) \approx \Phi\big((k + 0.5 - np)/\sqrt{np(1-p)}\big)$ |
| Rule of thumb | $np \ge 10$ and $n(1-p) \ge 10$ |
| Best rate | $O(1/\sqrt{n})$ (Berry-Esseen) |
The rule of thumb sets the regime where the approximation is good. The Berry-Esseen bound makes that quantitative.
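As a quick numerical sanity check of the standardized approximation in the table, here is a minimal Monte Carlo sketch using only the standard library (the parameter values are illustrative, not from the text):

```python
import math
import random

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

random.seed(0)
n, p, trials = 100, 0.3, 5000
sd = math.sqrt(n * p * (1 - p))

# Fraction of standardized counts (X - np)/sd landing at or below 1
hits = 0
for _ in range(trials):
    x = sum(random.random() < p for _ in range(n))
    if (x - n * p) / sd <= 1:
        hits += 1

print(hits / trials, phi(1.0))  # both near 0.84
```

The empirical fraction tracks $\Phi(1) \approx 0.8413$ up to discreteness and Monte Carlo noise, which is exactly the convergence the theorem asserts.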
Statement
De Moivre-Laplace Theorem
Statement
Let $X_n \sim \mathrm{Bin}(n, p)$ with $p \in (0,1)$ fixed. Then the standardized count converges in distribution to a standard normal:

$$\frac{X_n - np}{\sqrt{np(1-p)}} \xrightarrow{d} N(0,1).$$

Equivalently, for every $a < b$:

$$P\left(a \le \frac{X_n - np}{\sqrt{np(1-p)}} \le b\right) \to \Phi(b) - \Phi(a).$$
Intuition
A binomial count is the sum of $n$ independent Bernoulli($p$) variables. Each contributes mean $p$ and variance $p(1-p)$. Standardizing the sum removes location and scale, and what is left has to converge to something universal. The Gaussian is the only stable law with the right symmetry and finite variance, so the limit is Gaussian.
Why It Matters
Before this result there was no machinery for attaching probabilities to "how far is 537 heads in 1000 fair flips from the expected 500". De Moivre gave the first quantitative answer, which is what every two-sample proportion test, every binomial confidence interval, and every poll margin-of-error ultimately uses. The general CLT comes later and is broader, but the binomial case is the one with the cleanest constants and the clearest visual story.
Failure Mode
The approximation degrades when the binomial is skewed: small $np$ or small $n(1-p)$. In those regimes the Poisson limit (Bin($n$, $\lambda/n$) $\to$ Pois($\lambda$); see Poisson limit theorem) gives a better approximation than the Normal. The third central moment of a centered Bernoulli is $p(1-p)(1-2p)$, which vanishes at $p = 1/2$; the normalized ratio that controls the Berry-Esseen bound grows as $p$ moves away from $1/2$, which is the formal statement that "skewed binomials need more samples".
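A sketch of this skewed regime with illustrative values $n = 500$, $p = 0.01$ (so $\lambda = 5$), showing the Poisson approximation beating the continuity-corrected Normal:

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p, k = 500, 0.01, 2          # skewed binomial: np = 5 is small
lam = n * p
mu, sd = n * p, math.sqrt(n * p * (1 - p))

exact = sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))
normal = phi((k + 0.5 - mu) / sd)   # Normal, with continuity correction
poisson = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

# For this skewed binomial the Poisson error is several times smaller
print(abs(normal - exact), abs(poisson - exact))
```

Even with the continuity correction in the Normal's favor, the Poisson approximation lands much closer to the exact CDF here.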
Continuity Correction
The binomial is discrete. The normal is continuous. The naive approximation $P(X \le k) \approx \Phi\big((k - np)/\sqrt{np(1-p)}\big)$ under-estimates the binomial CDF systematically because it cuts the discrete mass at $k$ in half. The continuity-corrected version moves the cut to $k + 0.5$:

$$P(X \le k) \approx \Phi\left(\frac{k + 0.5 - np}{\sqrt{np(1-p)}}\right).$$
For a two-sided probability $P(a \le X \le b)$, the correction widens the interval by $0.5$ on each side:

$$P(a \le X \le b) \approx \Phi\left(\frac{b + 0.5 - np}{\sqrt{np(1-p)}}\right) - \Phi\left(\frac{a - 0.5 - np}{\sqrt{np(1-p)}}\right).$$
The correction matters most at small $n$ or when the threshold sits near the edge of the distribution's bulk. At moderate $n$ the corrected approximation has a fraction of the absolute error of the naive one; at large $n$ the two are visually identical.
Continuity correction in practice
A fair coin is flipped $100$ times. Estimate $P(X \le 55)$, where $X$ is the number of heads.
Mean $np = 50$, variance $np(1-p) = 25$, SD $= 5$.
Naive normal approximation: $P(X \le 55) \approx \Phi\big((55 - 50)/5\big) = \Phi(1) = 0.8413$.
With continuity correction: $P(X \le 55) \approx \Phi\big((55.5 - 50)/5\big) = \Phi(1.1) = 0.8643$.
Exact binomial CDF: $P(X \le 55) = 0.8644$.
The correction recovers two decimal places of accuracy that the naive form throws away.
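The coin-flip arithmetic can be reproduced with the standard library alone; a sketch with $n = 100$, $p = 0.5$ and threshold $k = 55$ (values consistent with the stated SD of 5):

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p, k = 100, 0.5, 55
mu, sd = n * p, math.sqrt(n * p * (1 - p))

naive = phi((k - mu) / sd)             # no continuity correction
corrected = phi((k + 0.5 - mu) / sd)   # with continuity correction
exact = sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

print(round(naive, 4), round(corrected, 4), round(exact, 4))
# → 0.8413 0.8643 0.8644
```

The half-unit shift moves the estimate from two wrong decimal places to agreement with the exact CDF to within $10^{-4}$.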
Proof Sketch (Standard Form)
The classical De Moivre-Laplace proof uses Stirling's approximation to
the binomial coefficient and is the most direct path. The modern proof
uses characteristic functions and is
a one-page exercise in computing the limit of the characteristic function of the standardized sum.
Both are folded into the optional proof blocks below.
Optional Proof: Stirling-based proof of De Moivre-Laplace
Write $k = np + x\sqrt{np(1-p)}$ with $x$ fixed. The binomial PMF is

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}.$$

Apply Stirling: $m! \sim \sqrt{2\pi m}\,(m/e)^m$. After substituting and expanding around $k = np$:

$$\binom{n}{k} \sim \frac{1}{\sqrt{2\pi n\,\hat{p}(1-\hat{p})}}\; e^{\,n H(\hat{p})}, \qquad \hat{p} = k/n,$$

where $H(\hat{p}) = -\hat{p}\log\hat{p} - (1-\hat{p})\log(1-\hat{p})$ is the binary entropy. Combine with $p^k (1-p)^{n-k} = e^{\,n\left(\hat{p}\log p + (1-\hat{p})\log(1-p)\right)}$. The leading exponential term is $e^{-n D(\hat{p}\,\|\,p)}$, where the exponent is the negative KL divergence

$$D(\hat{p}\,\|\,p) = \hat{p}\log\frac{\hat{p}}{p} + (1-\hat{p})\log\frac{1-\hat{p}}{1-p}.$$

Expand to second order around $\hat{p} = p$: $D(\hat{p}\,\|\,p) = \frac{(\hat{p}-p)^2}{2p(1-p)} + O\big((\hat{p}-p)^3\big)$.

Substituting $\hat{p} - p = x\sqrt{p(1-p)/n}$ and simplifying:

$$P(X = k) \approx \frac{1}{\sqrt{2\pi np(1-p)}}\, \exp\!\left(-\frac{(k-np)^2}{2np(1-p)}\right).$$

Summing the local approximation over the window in $k$ corresponding to $a \le x \le b$ gives the integrated form $P\big(a \le \frac{X-np}{\sqrt{np(1-p)}} \le b\big) \to \Phi(b) - \Phi(a)$.
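The local (PMF-level) approximation at the heart of this proof is easy to check numerically; a sketch with illustrative parameters:

```python
import math

n, p = 200, 0.4
mu, sd = n * p, math.sqrt(n * p * (1 - p))

# Local approximation from the proof: P(X = k) is close to the
# N(mu, sd^2) density evaluated at k, for k near the mean
ratios = []
for k in (70, 80, 90):
    pmf = math.comb(n, k) * p**k * (1 - p)**(n - k)
    dens = math.exp(-((k - mu) ** 2) / (2 * sd**2)) / (sd * math.sqrt(2 * math.pi))
    ratios.append(pmf / dens)

print([round(r, 3) for r in ratios])  # all close to 1 near the mean
```

The ratios stay within a few percent of 1 across the central window, which is the local limit statement the integrated CLT form is summed from.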
Optional Proof: Characteristic-function proof
Let $X_n \sim \mathrm{Bin}(n,p)$. Write $X_n = \sum_{i=1}^{n} B_i$ for i.i.d. Bernoulli($p$) variables, and let $Y_i = (B_i - p)/\sqrt{np(1-p)}$, so $Y_i$ has mean $0$ and variance $1/n$. The characteristic function of $Z_n = \sum_{i=1}^{n} Y_i$ is

$$\varphi_{Z_n}(t) = \left(\mathbb{E}\,e^{itY_1}\right)^n.$$

Expanding to second order: $\mathbb{E}\,e^{itY_1} = 1 - \frac{t^2}{2n} + o(1/n)$.

Then $\varphi_{Z_n}(t) = \left(1 - \frac{t^2}{2n} + o(1/n)\right)^n \to e^{-t^2/2}$.

This is the characteristic function of $N(0,1)$. By Lévy's continuity theorem, $Z_n$ converges in distribution to a standard normal.
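The convergence of the characteristic function can be seen numerically; a sketch with illustrative parameters, computing the exact Bernoulli characteristic function raised to the $n$-th power and comparing it to the Gaussian limit $e^{-t^2/2}$:

```python
import cmath
import math

n, p = 10_000, 0.3
sd = math.sqrt(n * p * (1 - p))

gaps = []
for t in (0.5, 1.0, 2.0):
    # E[exp(i t Y_1)] for Y_1 = (B_1 - p)/sd, computed exactly
    cf1 = (1 - p) * cmath.exp(-1j * t * p / sd) + p * cmath.exp(1j * t * (1 - p) / sd)
    gaps.append(abs(cf1**n - math.exp(-t * t / 2)))

print(gaps)  # all small; they shrink further as n grows
```

The residual gaps come from the $o(1/n)$ term in the expansion (driven by the third moment) and decay like $1/\sqrt{n}$, which previews the Berry-Esseen rate below.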
Quantitative Bound: Berry-Esseen rate for the binomial
The third absolute central moment of a Bernoulli($p$) variable is $\rho = \mathbb{E}|B - p|^3 = p(1-p)\big(p^2 + (1-p)^2\big)$, and $\sigma^3 = \big(p(1-p)\big)^{3/2}$, so the ratio $\rho/\sigma^3$ is an explicit function of $p$, minimized at $p = 1/2$. After standardization, the Berry-Esseen theorem gives

$$\sup_x \left| P(Z_n \le x) - \Phi(x) \right| \le \frac{C\,\big(p^2 + (1-p)^2\big)}{\sqrt{n\,p(1-p)}},$$

where $Z_n$ is the standardized count and $C \le 0.4748$ (Shevtsova 2011). The bound diverges as $p \to 0$ or $p \to 1$, formalizing the rule of thumb that the binomial is hardest to approximate when one tail is rare.
For $n = 100$, $p = 0.5$: the bound is $0.4748 \times 0.5/\sqrt{25} \approx 0.047$. The true sup is smaller, around $0.04$, but the Berry-Esseen bound is universal and free of additional moment assumptions.
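The bound can be checked against the exact sup-distance; a sketch for $n = 100$, $p = 0.5$ (illustrative values):

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p = 100, 0.5
q = 1 - p
mu, sd = n * p, math.sqrt(n * p * q)

# Berry-Esseen bound with Shevtsova's constant C = 0.4748
bound = 0.4748 * (p**2 + q**2) / math.sqrt(n * p * q)

# sup over x of |P(Z_n <= x) - Phi(x)|: the CDF difference is extremal
# at the atoms, so check both sides of every jump at k
cdf = 0.0
sup = 0.0
for k in range(n + 1):
    z = (k - mu) / sd
    sup = max(sup, abs(cdf - phi(z)))        # just below the jump at k
    cdf += math.comb(n, k) * p**k * q**(n - k)
    sup = max(sup, abs(cdf - phi(z)))        # just above the jump at k

print(round(sup, 4), round(bound, 4))
```

The exact sup sits below but within the same order of magnitude as the bound at this symmetric $p$; the bound is loosest for skewed $p$.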
When to Use the Normal Approximation
| Regime | Better approximation |
|---|---|
| $np \ge 10$ and $n(1-p) \ge 10$ | Normal with continuity correction |
| $p$ small, $n$ large (rare events) | Poisson with $\lambda = np$ |
| $np$ moderate, $p$ very small, $n$ very large | Both Normal and Poisson are reasonable; Poisson is simpler |
| Both $np$ and $n(1-p)$ small | Use the exact binomial PMF |
The Poisson alternative is treated on its own page: Poisson limit theorem.
Common Confusions
The approximation is not a tail bound
De Moivre-Laplace gives an approximation to $P(X \le k)$ for typical $k$, within a few standard deviations of $np$. It is not a tail bound. For very large deviations ($k$ many standard deviations from $np$), the Gaussian tails can decay faster than the binomial tails, especially for skewed binomials, so the approximation can badly under-estimate probabilities in the deep tail. Use Chernoff or Hoeffding for rigorous tail bounds, not the Normal approximation.
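A sketch of this deep-tail failure with illustrative rare-event parameters ($n = 1000$, $p = 0.01$, threshold $25$, roughly 4.7 SDs above the mean): the Normal approximation is off by an order of magnitude, while Hoeffding remains a valid upper bound.

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p, k = 1000, 0.01, 25
mu, sd = n * p, math.sqrt(n * p * (1 - p))

# Exact upper tail P(X >= k) via the complement (only k terms to sum)
exact = 1 - sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))
normal = 1 - phi((k - 0.5 - mu) / sd)             # even with continuity correction
hoeffding = math.exp(-2 * n * (k / n - p) ** 2)   # P(X >= k) <= exp(-2 n t^2), t = k/n - p

print(normal, exact, hoeffding)  # normal badly under-estimates; Hoeffding upper-bounds
```

The Normal estimate is far below the exact tail probability, while Hoeffding is crude but provably on the right side, which is the point of the paragraph above.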
Continuity correction is not optional decoration
Skipping the $0.5$ shift at $n = 100$, $p = 0.5$ costs roughly two decimal places of accuracy and reverses the direction of the bias. The corrected form is the actual second-order CLT approximation; the uncorrected form is a first-order approximation that throws away information you already have. The correction is mechanical and adds zero cost.
Bin(n,p) is not approximately Normal when p is near 0 or 1
The rule of thumb $np \ge 10$, $n(1-p) \ge 10$ is not arbitrary. When $np$ or $n(1-p)$ is small, the binomial is skewed and the Poisson approximation dominates. The Normal is symmetric; the binomial is only exactly symmetric for $p = 1/2$. Forcing a Normal approximation on a skewed binomial systematically miscalibrates one-sided tail probabilities.
Exercises
Problem
A factory produces components with defect rate $p$. In a batch of $n$, what is the probability of at most $k$ defects? Compute using (a) the Normal approximation without continuity correction, (b) the Normal with continuity correction, (c) the Poisson approximation with $\lambda = np$. Compare to the exact value $\sum_{i=0}^{k} \binom{n}{i} p^i (1-p)^{n-i}$.
Problem
Show that for Bernoulli($p$) variables, the Berry-Esseen ratio $\rho/\sigma^3$ equals $\big(p^2 + (1-p)^2\big)/\sqrt{p(1-p)}$. Hence determine the value of $p$ that minimizes this ratio.
References
Canonical:
- Feller, An Introduction to Probability Theory and Its Applications, Vol I (3rd ed., 1968), Chapter VII (De Moivre-Laplace via Stirling) and Chapter VIII (Berry-Esseen rate).
- Blitzstein and Hwang, Introduction to Probability (2nd ed., 2019), Chapter 10 (CLT and normal approximation, with continuity correction worked).
- Billingsley, Probability and Measure (3rd ed., 1995), Section 27 (modern proof via characteristic functions).
Current:
- Tijms, Understanding Probability (3rd ed., 2012), Chapter 5 (continuity correction with applied examples).
- Shevtsova, "On the absolute constants in the Berry-Esseen inequality for i.i.d. summands" (2011), arXiv:1111.6554, sharpens the universal constant to $C \le 0.4748$.
Next Topics
- Central Limit Theorem — the general result this is a special case of.
- Poisson Limit Theorem — the alternative approximation for small $p$ and large $n$.
- Characteristic Functions — the standard tool for proving CLT-type results in the modern formulation.
Last reviewed: May 12, 2026
Canonical graph
Required before and derived from this topic
These links come from prerequisite edges in the curriculum graph. Editorial suggestions are shown here only when the target page also cites this page as a prerequisite.
Required prerequisites
- Common Probability Distributions — layer 0A · tier 1
- Central Limit Theorem — layer 0B · tier 1
- Moment Generating Functions — layer 0A · tier 2
Derived topics
No published topic currently declares this as a prerequisite.