Statistical Estimation
De Moivre-Laplace Theorem
The first central limit theorem, historically. Bin(n,p) approximates N(np, np(1-p)) for large n, with explicit continuity correction. Stirling-based proof, Berry-Esseen rate, and where the approximation breaks down (small p, small n, skewed binomials).
Why This Matters
De Moivre proved this result in 1733 for $p = 1/2$, and Laplace extended it to general $p$ in 1812. It is the central limit theorem in its first form, predating the modern statement by more than a century. For two hundred years it was the working tool that engineers, actuaries, and gamblers used to attach probabilities to deviations from expected counts.
The reason it still earns its own page on a site that already covers the general CLT: the binomial case is where students learn the continuity correction, the rule of thumb for "large enough $n$", and the visual intuition that ties Pascal's triangle to the Gaussian bell curve. The general CLT abstracts all of this away. De Moivre-Laplace keeps it concrete.
It is also the place where Berry-Esseen rate constants are easiest to state and verify: the third moment of a centered Bernoulli is explicit, so the worst-case rate becomes a closed form in $n$ and $p$.
Quick Version
| Object | Approximation |
|---|---|
| Standardized count | $(X - np)/\sqrt{np(1-p)} \approx N(0,1)$ |
| CDF with continuity correction | $P(X \le k) \approx \Phi\big((k + 0.5 - np)/\sqrt{np(1-p)}\big)$ |
| Rule of thumb | $np \ge 10$ and $n(1-p) \ge 10$ |
| Best rate | $O(1/\sqrt{n})$ (Berry-Esseen) |
The rule of thumb sets the regime where the approximation is good. The Berry-Esseen bound makes that quantitative.
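As a quick numerical sanity check of the standardized approximation in the table, here is a minimal Monte Carlo sketch using only the standard library (the parameter values are illustrative, not from the text):

```python
import math
import random

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

random.seed(0)
n, p, trials = 100, 0.3, 5000
sd = math.sqrt(n * p * (1 - p))

# Fraction of standardized counts (X - np)/sd landing at or below 1
hits = 0
for _ in range(trials):
    x = sum(random.random() < p for _ in range(n))
    if (x - n * p) / sd <= 1:
        hits += 1

print(hits / trials, phi(1.0))  # both near 0.84
```

The empirical fraction tracks $\Phi(1) \approx 0.8413$ up to discreteness and Monte Carlo noise, which is exactly the convergence the theorem asserts.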
Statement
De Moivre-Laplace Theorem
Statement
Let $X_n \sim \mathrm{Bin}(n, p)$ with $p \in (0,1)$ fixed. Then the standardized count converges in distribution to a standard normal:

$$\frac{X_n - np}{\sqrt{np(1-p)}} \xrightarrow{d} N(0,1).$$

Equivalently, for every $a < b$:

$$P\left(a \le \frac{X_n - np}{\sqrt{np(1-p)}} \le b\right) \to \Phi(b) - \Phi(a).$$
Intuition
A binomial count is the sum of $n$ independent Bernoulli($p$) variables. Each contributes mean $p$ and variance $p(1-p)$. Standardizing the sum removes location and scale, and what is left has to converge to something universal. The Gaussian is the only stable law with the right symmetry and finite variance, so the limit is Gaussian.
Why It Matters
Before this result there was no machinery for attaching probabilities to "how far is 537 heads in 1000 fair flips from the expected 500". De Moivre gave the first quantitative answer, which is what every two-sample proportion test, every binomial confidence interval, and every poll margin-of-error ultimately uses. The general CLT comes later and is broader, but the binomial case is the one with the cleanest constants and the clearest visual story.
Failure Mode
The approximation degrades when the binomial is skewed: small $np$ or small $n(1-p)$. In those regimes the Poisson limit (Bin($n$, $\lambda/n$) $\to$ Pois($\lambda$); see Poisson limit theorem) gives a better approximation than the Normal. The third central moment of a centered Bernoulli is $p(1-p)(1-2p)$, which vanishes at $p = 1/2$; the normalized ratio that controls the Berry-Esseen bound grows as $p$ moves away from $1/2$, which is the formal statement that "skewed binomials need more samples".
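A sketch of this skewed regime with illustrative values $n = 500$, $p = 0.01$ (so $\lambda = 5$), showing the Poisson approximation beating the continuity-corrected Normal:

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p, k = 500, 0.01, 2          # skewed binomial: np = 5 is small
lam = n * p
mu, sd = n * p, math.sqrt(n * p * (1 - p))

exact = sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))
normal = phi((k + 0.5 - mu) / sd)   # Normal, with continuity correction
poisson = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

# For this skewed binomial the Poisson error is several times smaller
print(abs(normal - exact), abs(poisson - exact))
```

Even with the continuity correction in the Normal's favor, the Poisson approximation lands much closer to the exact CDF here.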
Continuity Correction
The binomial is discrete. The normal is continuous. The naive approximation $P(X \le k) \approx \Phi\big((k - np)/\sqrt{np(1-p)}\big)$ under-estimates the binomial CDF systematically because it cuts the discrete mass at $k$ in half. The continuity-corrected version moves the cut to $k + 0.5$:

$$P(X \le k) \approx \Phi\left(\frac{k + 0.5 - np}{\sqrt{np(1-p)}}\right).$$
For a two-sided probability $P(a \le X \le b)$, the correction widens the interval by $0.5$ on each side:

$$P(a \le X \le b) \approx \Phi\left(\frac{b + 0.5 - np}{\sqrt{np(1-p)}}\right) - \Phi\left(\frac{a - 0.5 - np}{\sqrt{np(1-p)}}\right).$$
The correction matters most at small $n$ or when the threshold sits near the edge of the distribution's bulk. At moderate $n$ the corrected approximation has a fraction of the absolute error of the naive one; at large $n$ the two are visually identical.
Continuity correction in practice
A fair coin is flipped $100$ times. Estimate $P(X \le 55)$, where $X$ is the number of heads.
Mean $np = 50$, variance $np(1-p) = 25$, SD $= 5$.
Naive normal approximation: $P(X \le 55) \approx \Phi\big((55 - 50)/5\big) = \Phi(1) = 0.8413$.
With continuity correction: $P(X \le 55) \approx \Phi\big((55.5 - 50)/5\big) = \Phi(1.1) = 0.8643$.
Exact binomial CDF: $P(X \le 55) = 0.8644$.
The correction recovers two decimal places of accuracy that the naive form throws away.
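The coin-flip arithmetic can be reproduced with the standard library alone; a sketch with $n = 100$, $p = 0.5$ and threshold $k = 55$ (values consistent with the stated SD of 5):

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p, k = 100, 0.5, 55
mu, sd = n * p, math.sqrt(n * p * (1 - p))

naive = phi((k - mu) / sd)             # no continuity correction
corrected = phi((k + 0.5 - mu) / sd)   # with continuity correction
exact = sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

print(round(naive, 4), round(corrected, 4), round(exact, 4))
# → 0.8413 0.8643 0.8644
```

The half-unit shift moves the estimate from two wrong decimal places to agreement with the exact CDF to within $10^{-4}$.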
Proof Sketch (Standard Form)
The classical De Moivre-Laplace proof uses Stirling's approximation to
the binomial coefficient and is the most direct path. The modern proof
uses characteristic functions and is
a one-page exercise in computing the limit of the characteristic function of the standardized sum.
Both are folded into the optional proof blocks below.
Optional Proof: Stirling-based proof of De Moivre-Laplace
Write $k = np + x\sqrt{np(1-p)}$ with $x$ fixed. The binomial PMF is

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}.$$

Apply Stirling: $m! \sim \sqrt{2\pi m}\,(m/e)^m$. After substituting and expanding around $k = np$:

$$\binom{n}{k} \sim \frac{1}{\sqrt{2\pi n\,\hat{p}(1-\hat{p})}}\; e^{\,n H(\hat{p})}, \qquad \hat{p} = k/n,$$

where $H(\hat{p}) = -\hat{p}\log\hat{p} - (1-\hat{p})\log(1-\hat{p})$ is the binary entropy. Combine with $p^k (1-p)^{n-k} = e^{\,n\left(\hat{p}\log p + (1-\hat{p})\log(1-p)\right)}$. The leading exponential term is $e^{-n D(\hat{p}\,\|\,p)}$, where the exponent is the negative KL divergence

$$D(\hat{p}\,\|\,p) = \hat{p}\log\frac{\hat{p}}{p} + (1-\hat{p})\log\frac{1-\hat{p}}{1-p}.$$

Expand to second order around $\hat{p} = p$: $D(\hat{p}\,\|\,p) = \frac{(\hat{p}-p)^2}{2p(1-p)} + O\big((\hat{p}-p)^3\big)$.

Substituting $\hat{p} - p = x\sqrt{p(1-p)/n}$ and simplifying:

$$P(X = k) \approx \frac{1}{\sqrt{2\pi np(1-p)}}\, \exp\!\left(-\frac{(k-np)^2}{2np(1-p)}\right).$$

Summing the local approximation over the window in $k$ corresponding to $a \le x \le b$ gives the integrated form $P\big(a \le \frac{X-np}{\sqrt{np(1-p)}} \le b\big) \to \Phi(b) - \Phi(a)$.
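The local (PMF-level) approximation at the heart of this proof is easy to check numerically; a sketch with illustrative parameters:

```python
import math

n, p = 200, 0.4
mu, sd = n * p, math.sqrt(n * p * (1 - p))

# Local approximation from the proof: P(X = k) is close to the
# N(mu, sd^2) density evaluated at k, for k near the mean
ratios = []
for k in (70, 80, 90):
    pmf = math.comb(n, k) * p**k * (1 - p)**(n - k)
    dens = math.exp(-((k - mu) ** 2) / (2 * sd**2)) / (sd * math.sqrt(2 * math.pi))
    ratios.append(pmf / dens)

print([round(r, 3) for r in ratios])  # all close to 1 near the mean
```

The ratios stay within a few percent of 1 across the central window, which is the local limit statement the integrated CLT form is summed from.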
Optional Proof: Characteristic-function proof
Let $X_n \sim \mathrm{Bin}(n,p)$. Write $X_n = \sum_{i=1}^{n} B_i$ for i.i.d. Bernoulli($p$) variables, and let $Y_i = (B_i - p)/\sqrt{np(1-p)}$, so $Y_i$ has mean $0$ and variance $1/n$. The characteristic function of $Z_n = \sum_{i=1}^{n} Y_i$ is

$$\varphi_{Z_n}(t) = \left(\mathbb{E}\,e^{itY_1}\right)^n.$$

Expanding to second order: $\mathbb{E}\,e^{itY_1} = 1 - \frac{t^2}{2n} + o(1/n)$.

Then $\varphi_{Z_n}(t) = \left(1 - \frac{t^2}{2n} + o(1/n)\right)^n \to e^{-t^2/2}$.

This is the characteristic function of $N(0,1)$. By Lévy's continuity theorem, $Z_n$ converges in distribution to a standard normal.
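The convergence of the characteristic function can be seen numerically; a sketch with illustrative parameters, computing the exact Bernoulli characteristic function raised to the $n$-th power and comparing it to the Gaussian limit $e^{-t^2/2}$:

```python
import cmath
import math

n, p = 10_000, 0.3
sd = math.sqrt(n * p * (1 - p))

gaps = []
for t in (0.5, 1.0, 2.0):
    # E[exp(i t Y_1)] for Y_1 = (B_1 - p)/sd, computed exactly
    cf1 = (1 - p) * cmath.exp(-1j * t * p / sd) + p * cmath.exp(1j * t * (1 - p) / sd)
    gaps.append(abs(cf1**n - math.exp(-t * t / 2)))

print(gaps)  # all small; they shrink further as n grows
```

The residual gaps come from the $o(1/n)$ term in the expansion (driven by the third moment) and decay like $1/\sqrt{n}$, which previews the Berry-Esseen rate below.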
Quantitative Bound: Berry-Esseen rate for the binomial
The third absolute central moment of a Bernoulli($p$) variable is $\rho = \mathbb{E}|B - p|^3 = p(1-p)\big(p^2 + (1-p)^2\big)$, and $\sigma^3 = \big(p(1-p)\big)^{3/2}$, so the ratio $\rho/\sigma^3$ is an explicit function of $p$, minimized at $p = 1/2$. After standardization, the Berry-Esseen theorem gives

$$\sup_x \left| P(Z_n \le x) - \Phi(x) \right| \le \frac{C\,\big(p^2 + (1-p)^2\big)}{\sqrt{n\,p(1-p)}},$$

where $Z_n$ is the standardized count and $C \le 0.4748$ (Shevtsova 2011). The bound diverges as $p \to 0$ or $p \to 1$, formalizing the rule of thumb that the binomial is hardest to approximate when one tail is rare.
For $n = 100$, $p = 0.5$: the bound is $0.4748 \times 0.5/\sqrt{25} \approx 0.047$. The true sup is smaller, around $0.04$, but the Berry-Esseen bound is universal and free of additional moment assumptions.
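The bound can be checked against the exact sup-distance; a sketch for $n = 100$, $p = 0.5$ (illustrative values):

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p = 100, 0.5
q = 1 - p
mu, sd = n * p, math.sqrt(n * p * q)

# Berry-Esseen bound with Shevtsova's constant C = 0.4748
bound = 0.4748 * (p**2 + q**2) / math.sqrt(n * p * q)

# sup over x of |P(Z_n <= x) - Phi(x)|: the CDF difference is extremal
# at the atoms, so check both sides of every jump at k
cdf = 0.0
sup = 0.0
for k in range(n + 1):
    z = (k - mu) / sd
    sup = max(sup, abs(cdf - phi(z)))        # just below the jump at k
    cdf += math.comb(n, k) * p**k * q**(n - k)
    sup = max(sup, abs(cdf - phi(z)))        # just above the jump at k

print(round(sup, 4), round(bound, 4))
```

The exact sup sits below but within the same order of magnitude as the bound at this symmetric $p$; the bound is loosest for skewed $p$.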
When to Use the Normal Approximation
| Regime | Better approximation |
|---|---|
| $np \ge 10$ and $n(1-p) \ge 10$ | Normal with continuity correction |
| $p$ small, $n$ large (rare events) | Poisson with $\lambda = np$ |
| $np$ moderate, $p$ very small, $n$ very large | Both Normal and Poisson are reasonable; Poisson is simpler |
| Both $np$ and $n(1-p)$ small | Use the exact binomial PMF |
The Poisson alternative is treated on its own page: Poisson limit theorem.
Common Confusions
The approximation is not a tail bound
De Moivre-Laplace gives an approximation to $P(X \le k)$ for typical $k$, within a few standard deviations of $np$. It is not a tail bound. For very large deviations ($k$ many standard deviations from $np$), the Gaussian tails can decay faster than the binomial tails, especially for skewed binomials, so the approximation can badly under-estimate probabilities in the deep tail. Use Chernoff or Hoeffding for rigorous tail bounds, not the Normal approximation.
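A sketch of this deep-tail failure with illustrative rare-event parameters ($n = 1000$, $p = 0.01$, threshold $25$, roughly 4.7 SDs above the mean): the Normal approximation is off by an order of magnitude, while Hoeffding remains a valid upper bound.

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p, k = 1000, 0.01, 25
mu, sd = n * p, math.sqrt(n * p * (1 - p))

# Exact upper tail P(X >= k) via the complement (only k terms to sum)
exact = 1 - sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))
normal = 1 - phi((k - 0.5 - mu) / sd)             # even with continuity correction
hoeffding = math.exp(-2 * n * (k / n - p) ** 2)   # P(X >= k) <= exp(-2 n t^2), t = k/n - p

print(normal, exact, hoeffding)  # normal badly under-estimates; Hoeffding upper-bounds
```

The Normal estimate is far below the exact tail probability, while Hoeffding is crude but provably on the right side, which is the point of the paragraph above.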
Continuity correction is not optional decoration
Skipping the $0.5$ shift at $n = 100$, $p = 0.5$ costs roughly two decimal places of accuracy and reverses the direction of the bias. The corrected form is the actual second-order CLT approximation; the uncorrected form is a first-order approximation that throws away information you already have. The correction is mechanical and adds zero cost.
Bin(n,p) is not approximately Normal when p is near 0 or 1
The rule of thumb $np \ge 10$, $n(1-p) \ge 10$ is not arbitrary. When $np$ or $n(1-p)$ is small, the binomial is skewed and the Poisson approximation dominates. The Normal is symmetric; the binomial is only exactly symmetric for $p = 1/2$. Forcing a Normal approximation on a skewed binomial systematically miscalibrates one-sided tail probabilities.
Exercises
Problem
A factory produces components with defect rate $p$. In a batch of $n$, what is the probability of at most $k$ defects? Compute using (a) the Normal approximation without continuity correction, (b) the Normal with continuity correction, (c) the Poisson approximation with $\lambda = np$. Compare to the exact value $\sum_{i=0}^{k} \binom{n}{i} p^i (1-p)^{n-i}$.
Problem
Show that for Bernoulli($p$) variables, the Berry-Esseen ratio $\rho/\sigma^3$ equals $\big(p^2 + (1-p)^2\big)/\sqrt{p(1-p)}$. Hence determine the value of $p$ that minimizes this ratio.
References
Canonical:
- Feller, An Introduction to Probability Theory and Its Applications, Vol I (3rd ed., 1968), Chapter VII (De Moivre-Laplace via Stirling) and Chapter VIII (Berry-Esseen rate).
- Blitzstein and Hwang, Introduction to Probability (2nd ed., 2019), Chapter 10 (CLT and normal approximation, with continuity correction worked).
- Billingsley, Probability and Measure (3rd ed., 1995), Section 27 (modern proof via characteristic functions).
Current:
- Tijms, Understanding Probability (3rd ed., 2012), Chapter 5 (continuity correction with applied examples).
- Shevtsova, "On the absolute constants in the Berry-Esseen inequality for i.i.d. summands" (2011), arXiv:1111.6554, sharpens the universal constant to $C \le 0.4748$.
Next Topics
- Central Limit Theorem — the general result this is a special case of.
- Poisson Limit Theorem — the alternative approximation for small $p$ and large $n$.
- Characteristic Functions — the standard tool for proving CLT-type results in the modern formulation.
Last reviewed: May 12, 2026
Canonical graph
Required before and derived from this topic
These links come from prerequisite edges in the curriculum graph. Editorial suggestions are shown here only when the target page also cites this page as a prerequisite.
Required prerequisites
- Common Probability Distributions — layer 0A · tier 1
- Central Limit Theorem — layer 0B · tier 1
- Moment Generating Functions — layer 0A · tier 2
Derived topics
No published topic currently declares this as a prerequisite.