
Foundations

Poisson Distribution

The Poisson distribution as the rare-event limit of the Binomial and as the count law of a Poisson process: PMF, MGF, mean equals variance, additivity, thinning, superposition, MLE, and the connection to the Exponential and Gamma.


Why This Matters

The Poisson distribution is the law of "rare independent counts": the number of arrivals in a fixed window when each potential arrival is improbable, the trials are independent, and the rate of arrivals is roughly constant. Three independent threads converge on it:

  1. Limit of the Binomial. If $n\to\infty$ and $p\to 0$ with $np\to\lambda$, the $\operatorname{Bin}(n,p)$ distribution converges to $\operatorname{Pois}(\lambda)$. This is the rare-event derivation: count successes in a large number of nearly impossible trials.
  2. Count law of a Poisson process. For a Poisson process with rate $\lambda$ on $[0,T]$, the number of events in any interval of length $t$ is $\operatorname{Pois}(\lambda t)$, independent across disjoint intervals. The Exponential distribution gives the inter-arrival times; the Poisson gives the counts.
  3. Maximum entropy among Bernoulli sums with fixed mean. Among distributions of sums of independent Bernoulli variables with mean $\lambda$, the Poisson is the maximum-entropy limit (the unconstrained entropy maximizer on $\{0,1,2,\dots\}$ with fixed mean is the Geometric). This is the information-theoretic anchor and one reason the Poisson appears as a default count model.

The mean equals the variance: $\mathbb{E}[X] = \operatorname{Var}(X) = \lambda$. Real-world count data often have variance larger than the mean (overdispersion); when they do, the right model is a Negative Binomial or a Poisson-Gamma mixture, not a Poisson.

Definition

Definition

Poisson Distribution

A random variable $X$ has a Poisson distribution with rate $\lambda > 0$ if its PMF is

$$\mathbb{P}(X = k) = \frac{\lambda^k e^{-\lambda}}{k!},\qquad k = 0, 1, 2, \dots$$

The support is the set of nonnegative integers. The mean and variance are both $\lambda$.

The parameter $\lambda$ is interpreted as the expected number of events. The PMF has a single mode at $\lfloor\lambda\rfloor$ when $\lambda$ is not an integer, and two equal modes at $\lambda - 1$ and $\lambda$ when $\lambda$ is an integer.

Binomial-to-Poisson Limit

Theorem

Rare-Event Limit (Poisson Limit Theorem)

Statement

Let $X_n\sim\operatorname{Bin}(n,p_n)$ with $np_n\to\lambda > 0$ as $n\to\infty$. Then for every fixed $k\in\{0,1,2,\dots\}$,

$$\mathbb{P}(X_n = k)\to\frac{\lambda^k e^{-\lambda}}{k!}.$$

Intuition

The Binomial PMF $\binom{n}{k}p_n^k(1-p_n)^{n-k}$ has three factors. The binomial coefficient grows like $n^k/k!$. The factor $p_n^k \approx (\lambda/n)^k$ contributes $\lambda^k/n^k$, so the $n^k$ terms cancel and leave $\lambda^k/k!$. The factor $(1-p_n)^{n-k}\to e^{-\lambda}$ by the calculus identity $(1-x/n)^n\to e^{-x}$.

Proof Sketch

Write $p_n = \lambda_n/n$ where $\lambda_n = np_n\to\lambda$. Then

$$\mathbb{P}(X_n = k) = \frac{n!}{k!(n-k)!}\left(\frac{\lambda_n}{n}\right)^k\left(1-\frac{\lambda_n}{n}\right)^{n-k}.$$

The ratio $n!/((n-k)!\,n^k) = (1)(1-1/n)\cdots(1-(k-1)/n)\to 1$ as $n\to\infty$. The factor $(1-\lambda_n/n)^{n-k}\to e^{-\lambda}$ uses $(1-\lambda_n/n)^n\to e^{-\lambda}$ and $(1-\lambda_n/n)^{-k}\to 1$. Multiplying gives $\lambda^k/k!\cdot e^{-\lambda}$.

Why It Matters

This is the classical justification for using a Poisson model when you have a large number of nearly impossible independent trials: defects on a manufactured chip, mutations along a long DNA sequence, hits on a server in a one-second window. The convergence is pointwise in $k$, but it can be strengthened to total-variation convergence; the rate is $O(\lambda^2/n)$ in TV distance, which is the basis of Le Cam's Poisson-approximation theorem.

Failure Mode

The limit requires independence and constant per-trial probability $p_n$. Real-world counts of rare events often violate one or both: hospital admissions cluster across patients with the same flu; defects cluster within a single manufacturing batch. When events are positively correlated, the data are overdispersed relative to the Poisson (the empirical variance exceeds the mean), and a Negative Binomial or compound Poisson is the right model.
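The limit is easy to check numerically. A minimal sketch using scipy (the library and the constants are our choices, not prescribed by the text): hold $np = \lambda$ fixed and watch the total-variation distance between $\operatorname{Bin}(n,\lambda/n)$ and $\operatorname{Pois}(\lambda)$ shrink as $n$ grows.

```python
# Numerical check of the Poisson limit theorem: TV distance between
# Bin(n, lam/n) and Pois(lam) shrinks as n grows (rate O(lam^2/n)).
from scipy.stats import binom, poisson

lam = 3.0
for n in (10, 100, 10_000):
    p = lam / n
    # Truncate the sum at k = 50; for lam = 3 the tail mass beyond is negligible.
    tv = 0.5 * sum(abs(binom.pmf(k, n, p) - poisson.pmf(k, lam)) for k in range(51))
    print(f"n={n:6d}  TV distance = {tv:.5f}")
```

The printed distances decrease roughly like $1/n$, consistent with Le Cam's bound.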

MGF and Mean Equals Variance

Theorem

Poisson MGF

Statement

For $X\sim\operatorname{Pois}(\lambda)$ and every $s\in\mathbb{R}$,

$$M_X(s) = \mathbb{E}[e^{sX}] = \exp\!\left(\lambda(e^s - 1)\right).$$

Intuition

The exponential generating function of the PMF is the same as the MGF after the substitution $z = e^s$. Identifying $\sum_k \lambda^k z^k/k! = e^{\lambda z}$ gives the result up to the $e^{-\lambda}$ normalization.

Proof Sketch

$$M_X(s) = \sum_{k=0}^\infty e^{sk}\frac{\lambda^k e^{-\lambda}}{k!} = e^{-\lambda}\sum_{k=0}^\infty \frac{(\lambda e^s)^k}{k!} = e^{-\lambda}e^{\lambda e^s} = e^{\lambda(e^s-1)}.$$

Why It Matters

Differentiating once at $s = 0$ gives $\mathbb{E}[X] = \lambda$; differentiating twice gives $\mathbb{E}[X^2] = \lambda + \lambda^2$, so $\operatorname{Var}(X) = \lambda$. The mean equals the variance, and this is a sharp diagnostic: if a count data set has empirical variance significantly larger than its mean, the Poisson model is misspecified.

Failure Mode

The Poisson MGF is finite for every $s$, but the log-MGF $\lambda(e^s-1)$ grows exponentially in $s$ (so the MGF itself grows doubly exponentially), which makes the Poisson sub-exponential rather than sub-Gaussian. Tail bounds for the Poisson are tighter than the generic sub-exponential bound; see Bennett's and Bernstein's inequalities.
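Both the MGF formula and mean-equals-variance are easy to check by simulation. A minimal sketch (the sample size, seed, and the value of $s$ are arbitrary choices):

```python
# Monte Carlo check of the Poisson MGF and of mean = variance.
import numpy as np

rng = np.random.default_rng(0)
lam, n = 4.0, 1_000_000
x = rng.poisson(lam, size=n)

# Mean and variance should both be close to lam = 4.
print(x.mean(), x.var())

# Empirical MGF at s = 0.5 versus the closed form exp(lam * (e^s - 1)).
s = 0.5
mgf_empirical = np.mean(np.exp(s * x))
mgf_theory = np.exp(lam * (np.exp(s) - 1))
print(mgf_empirical, mgf_theory)
```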

Additivity, Thinning, and Superposition

Theorem

Additivity, Thinning, and Superposition

Statement

  1. Additivity. If $X_i\sim\operatorname{Pois}(\lambda_i)$ are independent, then $\sum X_i \sim \operatorname{Pois}(\sum\lambda_i)$.
  2. Conditional binomiality. Conditional on $X_1+\cdots+X_k = N$, the joint distribution of $(X_1,\dots,X_k)$ is multinomial with parameters $N$ and $(\lambda_1/\Lambda,\dots,\lambda_k/\Lambda)$, where $\Lambda = \sum\lambda_i$.
  3. Thinning. If $X\sim\operatorname{Pois}(\lambda)$ and each event is independently classified as type $A$ with probability $p$ and type $B$ with probability $1-p$, then the type-$A$ count is $\operatorname{Pois}(\lambda p)$, the type-$B$ count is $\operatorname{Pois}(\lambda(1-p))$, and the two are independent.

Intuition

Independent Poisson processes merge ("superposition") into a Poisson process whose rate is the sum. Splitting events of one Poisson process into types based on independent coin flips ("thinning") gives independent Poisson processes whose rates partition the original. The conditional-binomial statement is the discrete-time consequence of the same construction.

Proof Sketch

Additivity is the MGF argument: $M_{\sum X_i}(s) = \prod_i \exp(\lambda_i(e^s-1)) = \exp\left(\left(\sum_i\lambda_i\right)(e^s-1)\right)$, the MGF of $\operatorname{Pois}(\sum\lambda_i)$. Thinning follows from the same MGF argument applied to the marked process. The conditional-binomial statement is Bayes' rule on PMFs:

$$\mathbb{P}\left(X_1=k_1,\dots,X_k=k_k \,\middle|\, \sum_i X_i = N\right) = \frac{\prod_i \lambda_i^{k_i}e^{-\lambda_i}/k_i!}{\Lambda^N e^{-\Lambda}/N!} = \binom{N}{k_1,\dots,k_k}\prod_i\left(\frac{\lambda_i}{\Lambda}\right)^{k_i},$$

which is the multinomial PMF.

Why It Matters

Superposition justifies pooling counts from independent sources with potentially different rates. Thinning justifies splitting a single count stream into independent sub-streams. The conditional-binomial result is what makes Pearson's Chi-squared test for cell counts valid: under the null hypothesis of independence, observed cell counts are conditionally multinomial with cell probabilities equal to row times column marginals. See chi-squared distribution and tests.

Failure Mode

All three results require independence. Two count streams that interact (the second is triggered by the first) are not the superposition of independent Poissons; their merged process is not Poisson. Thinning with state-dependent rates produces a non-Poisson type-$A$ count.
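Additivity and thinning can both be verified by simulation. A minimal sketch (the rates, thinning probability, sample size, and seed are all arbitrary):

```python
# Simulation check of additivity and thinning for the Poisson.
import numpy as np

rng = np.random.default_rng(1)
lam1, lam2, n = 2.0, 5.0, 500_000

# Additivity: X1 + X2 should behave like Pois(lam1 + lam2) = Pois(7).
x1 = rng.poisson(lam1, n)
x2 = rng.poisson(lam2, n)
total = x1 + x2
print(total.mean(), total.var())   # both close to 7

# Thinning: split X ~ Pois(6) into type A (p = 0.3) and type B.
lam, p = 6.0, 0.3
x = rng.poisson(lam, n)
a = rng.binomial(x, p)             # type-A count given X
b = x - a
print(a.mean(), a.var())           # both close to lam * p = 1.8
print(np.cov(a, b)[0, 1])          # close to 0: A and B are independent
```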

Maximum Likelihood Estimation

Theorem

MLE for the Poisson Rate

Statement

For an i.i.d. sample $X_1,\dots,X_n$ from $\operatorname{Pois}(\lambda)$, the MLE is

$$\hat\lambda = \bar X_n = \frac{1}{n}\sum_{i=1}^n X_i.$$

The MLE is unbiased and achieves the Cramér-Rao lower bound exactly: $\operatorname{Var}(\hat\lambda) = \lambda/n$.

Intuition

The Poisson is a one-parameter exponential family with sufficient statistic $\sum_i X_i$. The MLE of the mean parameter is the empirical mean of the sufficient statistic.

Proof Sketch

The log-likelihood is

$$\ell(\lambda) = \sum_{i=1}^n (X_i \log\lambda - \lambda - \log X_i!) = \log\lambda\sum_i X_i - n\lambda - \text{const}.$$

Differentiating: $\ell'(\lambda) = (\sum_i X_i)/\lambda - n = 0$, so $\hat\lambda = \bar X_n$. The Fisher information per observation is $I(\lambda) = 1/\lambda$, so the asymptotic variance is $\lambda/n$. Direct computation: $\operatorname{Var}(\bar X_n) = \operatorname{Var}(X_i)/n = \lambda/n$, matching the bound at every $n$, not just asymptotically.

Why It Matters

The Poisson MLE is one of the few MLEs that achieves the Cramér-Rao bound exactly in finite samples. The asymptotic theory of MLEs is unnecessary here; the result holds at $n = 1$. The estimator is unbiased, consistent, and efficient, which together is most of what point-estimation theory asks for. See maximum likelihood estimation for the general framework.

Failure Mode

The MLE assumes Poisson data. With overdispersed counts (variance exceeding mean), $\bar X_n$ is still consistent for the mean but the model is misspecified; standard errors based on $\hat\lambda/n$ underestimate the true sampling variance. The fix is a Negative Binomial regression or a quasi-Poisson approach. See maximum likelihood estimation for the QMLE / sandwich-variance treatment of misspecification.
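That the Cramér-Rao bound is attained at finite $n$, not just asymptotically, is visible in simulation. A minimal sketch (all constants are arbitrary choices):

```python
# Monte Carlo check that Var(lambda_hat) = lambda / n exactly.
import numpy as np

rng = np.random.default_rng(2)
lam, n, reps = 3.0, 10, 200_000

samples = rng.poisson(lam, size=(reps, n))
lam_hat = samples.mean(axis=1)     # MLE = sample mean, one per replication
print(lam_hat.mean())              # close to lam = 3 (unbiased)
print(lam_hat.var(), lam / n)      # both close to 0.3 (Cramér-Rao attained at n = 10)
```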

The Bayesian counterpart is the Gamma-Poisson conjugacy, which gives a closed-form posterior; see the gamma distribution.
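The conjugate update itself is two additions: a shape-rate $\operatorname{Gamma}(\alpha,\beta)$ prior on $\lambda$ becomes $\operatorname{Gamma}(\alpha + \sum x_i,\ \beta + n)$ after observing counts $x_1,\dots,x_n$. A minimal sketch (the prior constants and data below are hypothetical):

```python
# Gamma-Poisson conjugate update (shape-rate parameterization).
import numpy as np

def posterior(alpha, beta, x):
    """Gamma(alpha, beta) prior on lambda -> posterior after Poisson counts x."""
    x = np.asarray(x)
    return alpha + x.sum(), beta + len(x)

# Hypothetical prior Gamma(2, 1) and four observed counts.
a, b = posterior(2.0, 1.0, [3, 5, 4, 2])
print(a, b)      # shape 16.0, rate 5.0
print(a / b)     # posterior mean 3.2
```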

Bridge to Exponential and Gamma

A rate-$\lambda$ Poisson process on $[0,\infty)$ has three equivalent characterizations:

  1. The number of events in any interval of length $t$ is $\operatorname{Pois}(\lambda t)$, with independence across disjoint intervals.
  2. The inter-arrival times are i.i.d. $\operatorname{Exp}(\lambda)$.
  3. The waiting time for the $k$-th event is $\operatorname{Gamma}(k,\lambda)$.

Given (1), the second follows by computing the survival function of the first inter-arrival: $\mathbb{P}(T_1 > t) = \mathbb{P}(N(t) = 0) = e^{-\lambda t}$. Given (2), the third follows by Gamma additivity. The three characterizations are equivalent for "ordinary" point processes on the real line and are the standard way the Poisson process is introduced.

A consequence: the Poisson CDF at $k$ has a Gamma representation. For $N\sim\operatorname{Pois}(\lambda)$,

$$\mathbb{P}(N \le k) = \mathbb{P}(T_{k+1} > 1) = \mathbb{P}(\operatorname{Gamma}(k+1,\lambda) > 1).$$

This is what numerical libraries use to compute Poisson tail probabilities: the regularized incomplete Gamma function evaluates the Poisson CDF.
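The identity is directly checkable in scipy, where `scipy.special.gammaincc` is the regularized upper incomplete Gamma function $Q(a, x)$ (a sketch; the values of $\lambda$ and $k$ are arbitrary):

```python
# Poisson CDF via the regularized upper incomplete Gamma function:
# P(N <= k) = Q(k + 1, lam) for N ~ Pois(lam).
from scipy.stats import poisson
from scipy.special import gammaincc

lam, k = 7.5, 4
lhs = poisson.cdf(k, lam)
rhs = gammaincc(k + 1, lam)
print(lhs, rhs)   # agree to floating-point precision
```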

Overdispersion: When the Poisson Fails

| Diagnostic | Poisson behavior | Real-data deviation | Better model |
|---|---|---|---|
| Sample variance versus sample mean | Equal in expectation | Variance much larger than mean | Negative Binomial |
| Empirical zero-rate | $e^{-\hat\lambda}$ | More zeros than $e^{-\hat\lambda}$ | Zero-Inflated Poisson |
| Per-group rates | Constant across groups | Rates vary by group | Mixed-effects Poisson |
| Clustered counts | Independent across units | Counts cluster | Compound Poisson |

The diagnostic for overdispersion is the ratio of sample variance to sample mean. Under a true Poisson, the ratio is approximately one for large $n$. Values significantly above one signal heterogeneity (the rate varies across observations) or clustering (counts come in bursts). The classical fix is to model the rate as Gamma-distributed across observations, giving the Negative Binomial.
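The diagnostic is one line of arithmetic. A simulated comparison of a true Poisson against a Gamma-mixed Poisson, i.e. the Negative Binomial construction above (all constants are arbitrary):

```python
# Variance-to-mean ratio: Poisson vs Gamma-mixed Poisson (Negative Binomial).
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

pois = rng.poisson(5.0, n)
# Heterogeneous rates: lambda_i ~ Gamma(shape=2, scale=2.5), so E[lambda] = 5
# and Var(count) = 5 + Var(lambda) = 17.5.
mixed = rng.poisson(rng.gamma(2.0, 2.5, n))

print(pois.var() / pois.mean())    # close to 1: consistent with Poisson
print(mixed.var() / mixed.mean())  # close to 3.5: overdispersed
```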

Common Confusions

Watch Out

Poisson processes and Poisson distributions are not the same object

The Poisson distribution is a probability law on the integers. The Poisson process is a stochastic process on the real line (or higher-dimensional spaces) whose counts in any region are Poisson-distributed and independent across disjoint regions. Every Poisson process has Poisson-distributed counts, but the converse is not true: a count process whose counts are Poisson-distributed within each interval is not automatically a Poisson process if independence across intervals fails.

Watch Out

Mean equals variance is a property, not a fact about all count data

Real-world count data are rarely Poisson in the strict sense. Overdispersion is the norm. The Poisson model is a starting point and a useful approximation for low-rate independent events; it is not a universal count model. Always check the empirical variance-to-mean ratio before trusting Poisson standard errors.

Watch Out

The rate parameter is not the same in different parameterizations

A Poisson process with rate $\lambda$ events per second has counts in a one-minute window distributed as $\operatorname{Pois}(60\lambda)$, not $\operatorname{Pois}(\lambda)$. The unit of time is folded into the rate. Software libraries typically take a single $\lambda$ that is the expected count in the window of interest, so the unit of time is implicit. Always confirm which $\lambda$ the function expects.
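For example, `scipy.stats.poisson` takes a single `mu` equal to the expected count in the window, so the caller must fold in the time unit (a sketch; the rate and window below are made up):

```python
# scipy.stats.poisson expects mu = expected count in the window of interest.
from scipy.stats import poisson

rate_per_second = 0.05                   # hypothetical arrival rate
window_seconds = 60
mu = rate_per_second * window_seconds    # expected count in one minute: 3.0

# P(exactly 2 events in the one-minute window), i.e. Pois(60 * rate), not Pois(rate)
print(poisson.pmf(2, mu))
```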

Exercises

ExerciseCore

Problem

A website receives an average of 12 visitors per minute. Assuming Poisson arrivals, find the probability of receiving exactly 10 visitors in a randomly chosen minute and the probability of receiving more than 20 visitors.

ExerciseCore

Problem

Type-A and type-B emails arrive independently at a server with rates $\lambda_A = 4$ per hour and $\lambda_B = 6$ per hour. Find the distribution of the total count in a one-hour window and the probability that, given the total is 15, exactly 6 are type A.

ExerciseAdvanced

Problem

Show that the sample variance $\hat\sigma^2_n = (1/n)\sum_i(X_i - \bar X_n)^2$ from an i.i.d. Poisson sample is a consistent but inefficient estimator of $\lambda$, and identify a more efficient estimator that combines the sample mean and sample variance.

ExerciseAdvanced

Problem

Construct a 95% Wald confidence interval for $\lambda$ based on $\bar X_n$. Then construct a 95% exact interval using the Gamma-Poisson relationship. Compare them at $n = 10$ and $\bar X_n = 0.5$.

References

Canonical:

  • Casella and Berger, Statistical Inference (2002), Chapter 3 (Section 3.2 on the Poisson family), Chapter 7 (Poisson MLE), Chapter 10 (asymptotics).
  • Lehmann and Casella, Theory of Point Estimation (1998), Chapter 1 (exponential-family treatment of the Poisson).
  • Bickel and Doksum, Mathematical Statistics, Volume I (2015), Chapter 1 (Section 1.5).

Stochastic processes:

  • Ross, Introduction to Probability Models (2019), Chapter 5 (Poisson processes, thinning, superposition).
  • Kingman, Poisson Processes (1993), Chapters 1 and 2.
  • Grimmett and Stirzaker, Probability and Random Processes (2020), Chapter 6.

Overdispersion and count models:

  • McCullagh and Nelder, Generalized Linear Models (1989), Chapter 6 (Poisson regression and quasi-likelihood).
  • Cameron and Trivedi, Regression Analysis of Count Data (2013), Chapters 3 and 4.

Last reviewed: May 11, 2026
