
Foundations

Poisson Distribution

The Poisson distribution as the rare-event limit of the Binomial and as the count law of a Poisson process: PMF, MGF, mean equals variance, additivity, thinning, superposition, MLE, and the connection to the Exponential and Gamma.


Why This Matters

The Poisson distribution is the law of "rare independent counts": the number of arrivals in a fixed window when each potential arrival is improbable, the trials are independent, and the rate of arrivals is roughly constant. Three independent threads converge on it:

  1. Limit of the Binomial. If $n\to\infty$ and $p\to 0$ with $np\to\lambda$, the $\operatorname{Bin}(n,p)$ distribution converges to $\operatorname{Pois}(\lambda)$. This is the rare-event derivation: count successes in a large number of nearly impossible trials.
  2. Count law of a Poisson process. For a Poisson process with rate $\lambda$ on $[0,T]$, the number of events in any interval of length $t$ is $\operatorname{Pois}(\lambda t)$, independent across disjoint intervals. The Exponential distribution gives the inter-arrival times; the Poisson gives the counts.
  3. Maximum entropy among Bernoulli sums with fixed mean. Among distributions of sums of independent Bernoulli variables with mean $\lambda$, the Poisson is the maximum-entropy limit (the unconstrained entropy maximizer on $\{0,1,2,\dots\}$ with fixed mean is the Geometric). This is the information-theoretic anchor and one reason the Poisson appears as a default count model.

The mean equals the variance: $\mathbb{E}[X] = \operatorname{Var}(X) = \lambda$. Real-world count data often have variance larger than the mean (overdispersion); when they do, the right model is a Negative Binomial or a Poisson-Gamma mixture, not a Poisson.

Definition

Definition

Poisson Distribution

A random variable $X$ has a Poisson distribution with rate $\lambda > 0$ if its PMF is

$$\mathbb{P}(X = k) = \frac{\lambda^k e^{-\lambda}}{k!},\qquad k = 0, 1, 2, \dots$$

The support is the set of nonnegative integers. The mean and variance are both $\lambda$.

The parameter $\lambda$ is interpreted as the expected number of events. The PMF has a single mode at $\lfloor\lambda\rfloor$ when $\lambda$ is not an integer, and two equal modes at $\lambda - 1$ and $\lambda$ when $\lambda$ is an integer.

Binomial-to-Poisson Limit

Theorem

Rare-Event Limit (Poisson Limit Theorem)

Statement

Let $X_n\sim\operatorname{Bin}(n,p_n)$ with $np_n\to\lambda > 0$ as $n\to\infty$. Then for every fixed $k\in\{0,1,2,\dots\}$,

$$\mathbb{P}(X_n = k)\to\frac{\lambda^k e^{-\lambda}}{k!}.$$

Intuition

The Binomial PMF $\binom{n}{k}p_n^k(1-p_n)^{n-k}$ has three factors. The binomial coefficient grows like $n^k/k!$. The factor $p_n^k \approx (\lambda/n)^k$ contributes $\lambda^k/n^k$, so the $n^k$ terms cancel and leave $\lambda^k/k!$. The factor $(1-p_n)^{n-k}\to e^{-\lambda}$ by the calculus identity $(1-x/n)^n\to e^{-x}$.

Proof Sketch

Write $p_n = \lambda_n/n$ where $\lambda_n = np_n\to\lambda$. Then

$$\mathbb{P}(X_n = k) = \frac{n!}{k!(n-k)!}\left(\frac{\lambda_n}{n}\right)^k\left(1-\frac{\lambda_n}{n}\right)^{n-k}.$$

The ratio $n!/((n-k)!\,n^k) = (1)(1-1/n)\cdots(1-(k-1)/n)\to 1$ as $n\to\infty$. The factor $(1-\lambda_n/n)^{n-k}\to e^{-\lambda}$ uses $(1-\lambda_n/n)^n\to e^{-\lambda}$ and $(1-\lambda_n/n)^{-k}\to 1$. Multiplying gives $\lambda^k/k!\cdot e^{-\lambda}$.

Why It Matters

This is the classical justification for using a Poisson model when you have a large number of nearly impossible independent trials: defects on a manufactured chip, mutations along a long DNA sequence, hits on a server in a one-second window. The convergence is pointwise in $k$, but it can be strengthened to total-variation convergence; the rate is $O(\lambda^2/n)$ in TV distance, which is the basis of Le Cam's Poisson-approximation theorem.

Failure Mode

The limit requires independence and constant per-trial probability $p_n$. Real-world counts of rare events often violate one or both: hospital admissions cluster across patients with the same flu; defects cluster within a single manufacturing batch. When events are positively correlated, the data are overdispersed relative to the Poisson (the empirical variance exceeds the mean), and a Negative Binomial or compound Poisson is the right model.
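The limit is easy to check numerically. A minimal sketch using scipy (the library and the constants are our choices, not prescribed by the text): hold $np = \lambda$ fixed and watch the total-variation distance between $\operatorname{Bin}(n,\lambda/n)$ and $\operatorname{Pois}(\lambda)$ shrink as $n$ grows.

```python
# Numerical check of the Poisson limit theorem: TV distance between
# Bin(n, lam/n) and Pois(lam) shrinks as n grows (rate O(lam^2/n)).
from scipy.stats import binom, poisson

lam = 3.0
for n in (10, 100, 10_000):
    p = lam / n
    # Truncate the sum at k = 50; for lam = 3 the tail mass beyond is negligible.
    tv = 0.5 * sum(abs(binom.pmf(k, n, p) - poisson.pmf(k, lam)) for k in range(51))
    print(f"n={n:6d}  TV distance = {tv:.5f}")
```

The printed distances decrease roughly like $1/n$, consistent with Le Cam's bound.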

MGF and Mean Equals Variance

Theorem

Poisson MGF

Statement

For $X\sim\operatorname{Pois}(\lambda)$ and every $s\in\mathbb{R}$,

$$M_X(s) = \mathbb{E}[e^{sX}] = \exp\!\left(\lambda(e^s - 1)\right).$$

Intuition

The exponential generating function of the PMF is the same as the MGF after the substitution $z = e^s$. Identifying $\sum_k \lambda^k z^k/k! = e^{\lambda z}$ gives the result up to the $e^{-\lambda}$ normalization.

Proof Sketch

$$M_X(s) = \sum_{k=0}^\infty e^{sk}\frac{\lambda^k e^{-\lambda}}{k!} = e^{-\lambda}\sum_{k=0}^\infty \frac{(\lambda e^s)^k}{k!} = e^{-\lambda}e^{\lambda e^s} = e^{\lambda(e^s-1)}.$$

Why It Matters

Differentiating once at $s = 0$ gives $\mathbb{E}[X] = \lambda$; differentiating twice gives $\mathbb{E}[X^2] = \lambda + \lambda^2$, so $\operatorname{Var}(X) = \lambda$. The mean equals the variance, and this is a sharp diagnostic: if a count data set has empirical variance significantly larger than its mean, the Poisson model is misspecified.

Failure Mode

The Poisson MGF is finite for every $s$, but the log-MGF $\lambda(e^s-1)$ grows exponentially in $s$ (so the MGF itself grows doubly exponentially), which makes the Poisson sub-exponential rather than sub-Gaussian. Tail bounds for the Poisson are tighter than the generic sub-exponential bound; see Bennett's and Bernstein's inequalities.
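Both the MGF formula and mean-equals-variance are easy to check by simulation. A minimal sketch (the sample size, seed, and the value of $s$ are arbitrary choices):

```python
# Monte Carlo check of the Poisson MGF and of mean = variance.
import numpy as np

rng = np.random.default_rng(0)
lam, n = 4.0, 1_000_000
x = rng.poisson(lam, size=n)

# Mean and variance should both be close to lam = 4.
print(x.mean(), x.var())

# Empirical MGF at s = 0.5 versus the closed form exp(lam * (e^s - 1)).
s = 0.5
mgf_empirical = np.mean(np.exp(s * x))
mgf_theory = np.exp(lam * (np.exp(s) - 1))
print(mgf_empirical, mgf_theory)
```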

Additivity, Thinning, and Superposition

Theorem

Additivity, Thinning, and Superposition

Statement

  1. Additivity. If $X_i\sim\operatorname{Pois}(\lambda_i)$ are independent, then $\sum X_i \sim \operatorname{Pois}(\sum\lambda_i)$.
  2. Conditional binomiality. Conditional on $X_1+\cdots+X_k = N$, the joint distribution of $(X_1,\dots,X_k)$ is multinomial with parameters $N$ and $(\lambda_1/\Lambda,\dots,\lambda_k/\Lambda)$, where $\Lambda = \sum\lambda_i$.
  3. Thinning. If $X\sim\operatorname{Pois}(\lambda)$ and each event is independently classified as type $A$ with probability $p$ and type $B$ with probability $1-p$, then the type-$A$ count is $\operatorname{Pois}(\lambda p)$, the type-$B$ count is $\operatorname{Pois}(\lambda(1-p))$, and the two are independent.

Intuition

Independent Poisson processes merge ("superposition") into a Poisson process whose rate is the sum. Splitting events of one Poisson process into types based on independent coin flips ("thinning") gives independent Poisson processes whose rates partition the original. The conditional-binomial statement is the discrete-time consequence of the same construction.

Proof Sketch

Additivity is the MGF argument: $M_{\sum X_i}(s) = \prod_i \exp(\lambda_i(e^s-1)) = \exp\left(\left(\sum_i\lambda_i\right)(e^s-1)\right)$, the MGF of $\operatorname{Pois}(\sum\lambda_i)$. Thinning follows from the same MGF argument applied to the marked process. The conditional-binomial statement is Bayes' rule on PMFs:

$$\mathbb{P}\left(X_1=k_1,\dots,X_k=k_k \,\middle|\, \sum_i X_i = N\right) = \frac{\prod_i \lambda_i^{k_i}e^{-\lambda_i}/k_i!}{\Lambda^N e^{-\Lambda}/N!} = \binom{N}{k_1,\dots,k_k}\prod_i\left(\frac{\lambda_i}{\Lambda}\right)^{k_i},$$

which is the multinomial PMF.

Why It Matters

Superposition justifies pooling counts from independent sources with potentially different rates. Thinning justifies splitting a single count stream into independent sub-streams. The conditional-binomial result is what makes Pearson's Chi-squared test for cell counts valid: under the null hypothesis of independence, observed cell counts are conditionally multinomial with cell probabilities equal to row times column marginals. See chi-squared distribution and tests.

Failure Mode

All three results require independence. Two count streams that interact (the second is triggered by the first) are not the superposition of independent Poissons; their merged process is not Poisson. Thinning with state-dependent rates produces a non-Poisson type-$A$ count.
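Additivity and thinning can both be verified by simulation. A minimal sketch (the rates, thinning probability, sample size, and seed are all arbitrary):

```python
# Simulation check of additivity and thinning for the Poisson.
import numpy as np

rng = np.random.default_rng(1)
lam1, lam2, n = 2.0, 5.0, 500_000

# Additivity: X1 + X2 should behave like Pois(lam1 + lam2) = Pois(7).
x1 = rng.poisson(lam1, n)
x2 = rng.poisson(lam2, n)
total = x1 + x2
print(total.mean(), total.var())   # both close to 7

# Thinning: split X ~ Pois(6) into type A (p = 0.3) and type B.
lam, p = 6.0, 0.3
x = rng.poisson(lam, n)
a = rng.binomial(x, p)             # type-A count given X
b = x - a
print(a.mean(), a.var())           # both close to lam * p = 1.8
print(np.cov(a, b)[0, 1])          # close to 0: A and B are independent
```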

Maximum Likelihood Estimation

Theorem

MLE for the Poisson Rate

Statement

For an i.i.d. sample $X_1,\dots,X_n$ from $\operatorname{Pois}(\lambda)$, the MLE is

$$\hat\lambda = \bar X_n = \frac{1}{n}\sum_{i=1}^n X_i.$$

The MLE is unbiased and achieves the Cramér-Rao lower bound exactly: $\operatorname{Var}(\hat\lambda) = \lambda/n$.

Intuition

The Poisson is a one-parameter exponential family with sufficient statistic $\sum_i X_i$. The MLE of the mean parameter is the empirical mean of the sufficient statistic.

Proof Sketch

The log-likelihood is

$$\ell(\lambda) = \sum_{i=1}^n (X_i \log\lambda - \lambda - \log X_i!) = \log\lambda\sum_i X_i - n\lambda - \text{const}.$$

Differentiating: $\ell'(\lambda) = (\sum_i X_i)/\lambda - n = 0$, so $\hat\lambda = \bar X_n$. The Fisher information per observation is $I(\lambda) = 1/\lambda$, so the asymptotic variance is $\lambda/n$. Direct computation: $\operatorname{Var}(\bar X_n) = \operatorname{Var}(X_i)/n = \lambda/n$, matching the bound at every $n$, not just asymptotically.

Why It Matters

The Poisson MLE is one of the few MLEs that achieves the Cramér-Rao bound exactly in finite samples. The asymptotic theory of MLEs is unnecessary here; the result holds at $n = 1$. The estimator is unbiased, consistent, and efficient, which together is most of what point-estimation theory asks for. See maximum likelihood estimation for the general framework.

Failure Mode

The MLE assumes Poisson data. With overdispersed counts (variance exceeding mean), $\bar X_n$ is still consistent for the mean but the model is misspecified; standard errors based on $\hat\lambda/n$ underestimate the true sampling variance. The fix is a Negative Binomial regression or a quasi-Poisson approach. See maximum likelihood estimation for the QMLE / sandwich-variance treatment of misspecification.
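That the Cramér-Rao bound is attained at finite $n$, not just asymptotically, is visible in simulation. A minimal sketch (all constants are arbitrary choices):

```python
# Monte Carlo check that Var(lambda_hat) = lambda / n exactly.
import numpy as np

rng = np.random.default_rng(2)
lam, n, reps = 3.0, 10, 200_000

samples = rng.poisson(lam, size=(reps, n))
lam_hat = samples.mean(axis=1)     # MLE = sample mean, one per replication
print(lam_hat.mean())              # close to lam = 3 (unbiased)
print(lam_hat.var(), lam / n)      # both close to 0.3 (Cramér-Rao attained at n = 10)
```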

The Bayesian counterpart is the Gamma-Poisson conjugacy, which gives a closed-form posterior; see the gamma distribution.
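The conjugate update itself is two additions: a shape-rate $\operatorname{Gamma}(\alpha,\beta)$ prior on $\lambda$ becomes $\operatorname{Gamma}(\alpha + \sum x_i,\ \beta + n)$ after observing counts $x_1,\dots,x_n$. A minimal sketch (the prior constants and data below are hypothetical):

```python
# Gamma-Poisson conjugate update (shape-rate parameterization).
import numpy as np

def posterior(alpha, beta, x):
    """Gamma(alpha, beta) prior on lambda -> posterior after Poisson counts x."""
    x = np.asarray(x)
    return alpha + x.sum(), beta + len(x)

# Hypothetical prior Gamma(2, 1) and four observed counts.
a, b = posterior(2.0, 1.0, [3, 5, 4, 2])
print(a, b)      # shape 16.0, rate 5.0
print(a / b)     # posterior mean 3.2
```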

Bridge to Exponential and Gamma

A rate-$\lambda$ Poisson process on $[0,\infty)$ has three equivalent characterizations:

  1. The number of events in any interval of length $t$ is $\operatorname{Pois}(\lambda t)$, with independence across disjoint intervals.
  2. The inter-arrival times are i.i.d. $\operatorname{Exp}(\lambda)$.
  3. The waiting time for the $k$-th event is $\operatorname{Gamma}(k,\lambda)$.

Given (1), the second follows by computing the survival function of the first inter-arrival: $\mathbb{P}(T_1 > t) = \mathbb{P}(N(t) = 0) = e^{-\lambda t}$. Given (2), the third follows by Gamma additivity. The three characterizations are equivalent for "ordinary" point processes on the real line and are the standard way the Poisson process is introduced.

A consequence: the Poisson CDF at $k$ has a Gamma representation. For $N\sim\operatorname{Pois}(\lambda)$,

$$\mathbb{P}(N \le k) = \mathbb{P}(T_{k+1} > 1) = \mathbb{P}(\operatorname{Gamma}(k+1,\lambda) > 1).$$

This is what numerical libraries use to compute Poisson tail probabilities: the regularized incomplete Gamma function evaluates the Poisson CDF.
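The identity is directly checkable in scipy, where `scipy.special.gammaincc` is the regularized upper incomplete Gamma function $Q(a, x)$ (a sketch; the values of $\lambda$ and $k$ are arbitrary):

```python
# Poisson CDF via the regularized upper incomplete Gamma function:
# P(N <= k) = Q(k + 1, lam) for N ~ Pois(lam).
from scipy.stats import poisson
from scipy.special import gammaincc

lam, k = 7.5, 4
lhs = poisson.cdf(k, lam)
rhs = gammaincc(k + 1, lam)
print(lhs, rhs)   # agree to floating-point precision
```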

Overdispersion: When the Poisson Fails

| Diagnostic | Poisson behavior | Real-data deviation | Better model |
|---|---|---|---|
| Sample variance versus sample mean | Equal in expectation | Variance much larger than mean | Negative Binomial |
| Empirical zero-rate | $e^{-\hat\lambda}$ | More zeros than $e^{-\hat\lambda}$ | Zero-Inflated Poisson |
| Per-group rates | Constant across groups | Rates vary by group | Mixed-effects Poisson |
| Clustered counts | Independent across units | Counts cluster | Compound Poisson |

The diagnostic for overdispersion is the ratio of sample variance to sample mean. Under a true Poisson, the ratio is approximately one for large $n$. Values significantly above one signal heterogeneity (the rate varies across observations) or clustering (counts come in bursts). The classical fix is to model the rate as Gamma-distributed across observations, giving the Negative Binomial.
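The diagnostic is one line of arithmetic. A simulated comparison of a true Poisson against a Gamma-mixed Poisson, i.e. the Negative Binomial construction above (all constants are arbitrary):

```python
# Variance-to-mean ratio: Poisson vs Gamma-mixed Poisson (Negative Binomial).
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

pois = rng.poisson(5.0, n)
# Heterogeneous rates: lambda_i ~ Gamma(shape=2, scale=2.5), so E[lambda] = 5
# and Var(count) = 5 + Var(lambda) = 17.5.
mixed = rng.poisson(rng.gamma(2.0, 2.5, n))

print(pois.var() / pois.mean())    # close to 1: consistent with Poisson
print(mixed.var() / mixed.mean())  # close to 3.5: overdispersed
```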

Common Confusions

Watch Out

Poisson processes and Poisson distributions are not the same object

The Poisson distribution is a probability law on the integers. The Poisson process is a stochastic process on the real line (or higher-dimensional spaces) whose counts in any region are Poisson-distributed and independent across disjoint regions. Every Poisson process has Poisson-distributed counts, but the converse is not true: a count process whose counts are Poisson-distributed within each interval is not automatically a Poisson process if independence across intervals fails.

Watch Out

Mean equals variance is a property, not a fact about all count data

Real-world count data are rarely Poisson in the strict sense. Overdispersion is the norm. The Poisson model is a starting point and a useful approximation for low-rate independent events; it is not a universal count model. Always check the empirical variance-to-mean ratio before trusting Poisson standard errors.

Watch Out

The rate parameter is not the same in different parameterizations

A Poisson process with rate $\lambda$ events per second has counts in a one-minute window distributed as $\operatorname{Pois}(60\lambda)$, not $\operatorname{Pois}(\lambda)$. The unit of time is folded into the rate. Software libraries typically take a single $\lambda$ that is the expected count in the window of interest, so the unit of time is implicit. Always confirm which $\lambda$ the function expects.
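For example, `scipy.stats.poisson` takes a single `mu` equal to the expected count in the window, so the caller must fold in the time unit (a sketch; the rate and window below are made up):

```python
# scipy.stats.poisson expects mu = expected count in the window of interest.
from scipy.stats import poisson

rate_per_second = 0.05                   # hypothetical arrival rate
window_seconds = 60
mu = rate_per_second * window_seconds    # expected count in one minute: 3.0

# P(exactly 2 events in the one-minute window), i.e. Pois(60 * rate), not Pois(rate)
print(poisson.pmf(2, mu))
```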

Exercises

ExerciseCore

Problem

A website receives an average of 12 visitors per minute. Assuming Poisson arrivals, find the probability of receiving exactly 10 visitors in a randomly chosen minute and the probability of receiving more than 20 visitors.

ExerciseCore

Problem

Type-A and type-B emails arrive independently at a server with rates $\lambda_A = 4$ per hour and $\lambda_B = 6$ per hour. Find the distribution of the total count in a one-hour window and the probability that, given the total is 15, exactly 6 are type A.

ExerciseAdvanced

Problem

Show that the sample variance $\hat\sigma^2_n = (1/n)\sum_i(X_i - \bar X_n)^2$ from an i.i.d. Poisson sample is a consistent but inefficient estimator of $\lambda$, and identify a more efficient estimator that combines the sample mean and sample variance.

ExerciseAdvanced

Problem

Construct a 95% Wald confidence interval for $\lambda$ based on $\bar X_n$. Then construct a 95% exact interval using the Gamma-Poisson relationship. Compare them at $n = 10$ and $\bar X_n = 0.5$.

References

Canonical:

  • Casella and Berger, Statistical Inference (2002), Chapter 3 (Section 3.2 on the Poisson family), Chapter 7 (Poisson MLE), Chapter 10 (asymptotics).
  • Lehmann and Casella, Theory of Point Estimation (1998), Chapter 1 (exponential-family treatment of the Poisson).
  • Bickel and Doksum, Mathematical Statistics, Volume I (2015), Chapter 1 (Section 1.5).

Stochastic processes:

  • Ross, Introduction to Probability Models (2019), Chapter 5 (Poisson processes, thinning, superposition).
  • Kingman, Poisson Processes (1993), Chapters 1 and 2.
  • Grimmett and Stirzaker, Probability and Random Processes (2020), Chapter 6.

Overdispersion and count models:

  • McCullagh and Nelder, Generalized Linear Models (1989), Chapter 6 (Poisson regression and quasi-likelihood).
  • Cameron and Trivedi, Regression Analysis of Count Data (2013), Chapters 3 and 4.

Last reviewed: May 11, 2026
