Foundations
Poisson Distribution
The Poisson distribution as the rare-event limit of the Binomial and as the count law of a Poisson process: PMF, MGF, mean equals variance, additivity, thinning, superposition, MLE, and the connection to the Exponential and Gamma.
Why This Matters
The Poisson distribution is the law of "rare independent counts": the number of arrivals in a fixed window when each potential arrival is improbable, the trials are independent, and the rate of arrivals is roughly constant. Three independent threads converge on it:
- Limit of the Binomial. If $n \to \infty$ and $p_n \to 0$ with $np_n \to \lambda$, the $\mathrm{Binomial}(n, p_n)$ distribution converges to $\mathrm{Poisson}(\lambda)$. This is the rare-event derivation: count successes in a large number of nearly impossible trials.
- Count law of a Poisson process. For a Poisson process with rate $\lambda$ on $[0, \infty)$, the number of events in any interval of length $t$ is $\mathrm{Poisson}(\lambda t)$, independent across disjoint intervals. The Exponential distribution gives the inter-arrival times; the Poisson gives the counts.
- Maximum entropy among Bernoulli sums with fixed mean. Among all sums of independent Bernoulli random variables with mean $\lambda$ (more generally, among ultra-log-concave distributions on $\{0, 1, 2, \dots\}$ with mean $\lambda$), the Poisson is the one of maximum entropy. (Among all distributions on the nonnegative integers with fixed mean, the unrestricted maximizer is the Geometric.) This is the information-theoretic anchor and one reason the Poisson appears as a default count model.
The mean equals the variance: $\mathbb{E}[X] = \mathrm{Var}(X) = \lambda$. Real-world count data often has variance larger than the mean (overdispersion); when it does, the right model is a Negative Binomial or a Poisson-Gamma mixture, not a Poisson.
Definition
Poisson Distribution
A random variable $X$ has a Poisson distribution with rate $\lambda > 0$, written $X \sim \mathrm{Poisson}(\lambda)$, if its PMF is

$$P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \qquad k = 0, 1, 2, \dots$$

The support is the set of nonnegative integers. The mean and variance are both $\lambda$.
The parameter $\lambda$ is interpreted as the expected number of events. The probability mass is unimodal at $\lfloor \lambda \rfloor$ when $\lambda$ is not an integer, and bimodal at $\lambda - 1$ and $\lambda$ when $\lambda$ is an integer.
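As a quick numerical check of the definition, the sketch below (plain Python; the rate 4.2 is an arbitrary choice) evaluates the PMF directly, confirms that the mean and variance computed from it agree with $\lambda$, and locates the mode at $\lfloor \lambda \rfloor$:

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) = e^{-lam} * lam^k / k! for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 4.2          # arbitrary non-integer rate
ks = range(150)    # the mass beyond k = 150 is negligible at this rate

mean = sum(k * poisson_pmf(k, lam) for k in ks)
var = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)
print(mean, var)   # both near 4.2: mean equals variance

mode = max(ks, key=lambda k: poisson_pmf(k, lam))
print(mode)        # floor(4.2) = 4
```

Truncating the support at 150 is safe here because the neglected tail mass is astronomically small for a rate this low.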
Binomial-to-Poisson Limit
Rare-Event Limit (Poisson Limit Theorem)
Statement
Let $X_n \sim \mathrm{Binomial}(n, p_n)$ with $np_n \to \lambda \in (0, \infty)$ as $n \to \infty$. Then for every fixed $k \in \{0, 1, 2, \dots\}$,

$$P(X_n = k) \to \frac{e^{-\lambda} \lambda^k}{k!}.$$
Intuition
The Binomial PMF has three factors. The binomial coefficient $\binom{n}{k}$ grows like $n^k / k!$. The factor $p_n^k$ is $(np_n)^k$ times $n^{-k}$, so the two cancel to give $\lambda^k / k!$ in the limit. The factor $(1 - p_n)^{n-k} \to e^{-\lambda}$ by the calculus identity $(1 - \lambda/n)^n \to e^{-\lambda}$.
Proof Sketch
Write $p_n = \lambda_n / n$ where $\lambda_n = np_n \to \lambda$. Then

$$P(X_n = k) = \binom{n}{k} \left(\frac{\lambda_n}{n}\right)^{k} \left(1 - \frac{\lambda_n}{n}\right)^{n-k} = \frac{\lambda_n^k}{k!} \cdot \frac{n(n-1)\cdots(n-k+1)}{n^k} \cdot \left(1 - \frac{\lambda_n}{n}\right)^{n-k}.$$

The ratio $n(n-1)\cdots(n-k+1)/n^k \to 1$ as $n \to \infty$. The factor $(1 - \lambda_n/n)^{n-k} \to e^{-\lambda}$ uses $(1 - \lambda_n/n)^n \to e^{-\lambda}$ and $(1 - \lambda_n/n)^{-k} \to 1$. Multiplying gives $e^{-\lambda} \lambda^k / k!$.
Why It Matters
This is the classical justification for using a Poisson model when you have a large number of nearly impossible independent trials: defects on a manufactured chip, mutations along a long DNA sequence, hits on a server in a one-second window. The convergence is pointwise in $k$, but it can be strengthened to total-variation convergence; the rate is $O(np_n^2)$ in TV distance, which is the basis of Le Cam's Poisson-approximation theorem.
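A small numerical experiment makes the total-variation convergence concrete. This is a sketch in plain Python ($\lambda = 3$ and the values of $n$ are arbitrary choices); Le Cam's bound $d_{TV} \le np_n^2 = \lambda^2/n$ is checked at the largest $n$:

```python
import math

def binom_pmf(k, n, p):
    # math.comb(n, k) returns 0 when k > n, so summing past n is harmless.
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 3.0
tvs = []
for n in (30, 300, 3000):
    p = lam / n
    # TV distance = half the l1 distance between the PMFs; the mass
    # beyond k = 60 is negligible for both laws at this rate.
    tv = 0.5 * sum(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(61))
    tvs.append(tv)
    print(n, tv)   # shrinks roughly like lam^2 / n
```

Each tenfold increase in $n$ cuts the TV distance by roughly a factor of ten, matching the $O(\lambda^2/n)$ rate.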
Failure Mode
The limit requires independence and constant per-trial probability $p_n$. Real-world counts of rare events often violate one or both: hospital admissions cluster across patients with the same flu; defects cluster within a single manufacturing batch. When events are positively correlated, the data are overdispersed relative to the Poisson (the empirical variance exceeds the mean), and a Negative Binomial or compound Poisson is the right model.
MGF and Mean Equals Variance
Poisson MGF
Statement
For $X \sim \mathrm{Poisson}(\lambda)$ and every $t \in \mathbb{R}$,

$$M_X(t) = \mathbb{E}\left[e^{tX}\right] = \exp\!\left(\lambda\,(e^t - 1)\right).$$
Intuition
The probability generating function of the Poisson is $\mathbb{E}[z^X] = e^{\lambda(z - 1)}$; the MGF is the same object after the substitution $z = e^t$. Recognizing the exponential series $\sum_k (\lambda e^t)^k / k! = e^{\lambda e^t}$ gives the result up to the normalization $e^{-\lambda}$.
Proof Sketch
$$\mathbb{E}\left[e^{tX}\right] = \sum_{k=0}^{\infty} e^{tk}\,\frac{e^{-\lambda} \lambda^k}{k!} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda e^t)^k}{k!} = e^{-\lambda} e^{\lambda e^t} = \exp\!\left(\lambda\,(e^t - 1)\right).$$
Why It Matters
Differentiating once at $t = 0$ gives $\mathbb{E}[X] = \lambda$; differentiating twice gives $\mathbb{E}[X^2] = \lambda + \lambda^2$, so $\mathrm{Var}(X) = \lambda$. The mean equals the variance, and this is a sharp diagnostic: if a count data set has empirical variance significantly larger than its mean, the Poisson model is misspecified.
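The MGF identity and the moment computations can be sanity-checked numerically; the sketch below (arbitrary choices $\lambda = 2.5$, $t = 0.7$) compares the truncated series for $\mathbb{E}[e^{tX}]$ with the closed form, then recovers the mean and variance by finite differences at $t = 0$:

```python
import math

lam, t = 2.5, 0.7   # arbitrary rate and evaluation point

# Truncated series for E[e^{tX}]; the terms vanish factorially fast.
mgf_series = sum(math.exp(t * k) * math.exp(-lam) * lam**k / math.factorial(k)
                 for k in range(150))
mgf_closed = math.exp(lam * (math.exp(t) - 1))
print(mgf_series, mgf_closed)    # agree to floating-point accuracy

# Finite differences at t = 0 recover the first two moments.
M = lambda u: math.exp(lam * (math.exp(u) - 1))
h = 1e-5
mean = (M(h) - M(-h)) / (2 * h)              # near lam
second = (M(h) - 2 * M(0) + M(-h)) / h**2    # near lam + lam^2
print(mean, second - mean**2)                # variance near lam
```

The central finite difference loses a few digits to cancellation, but the recovered variance still matches $\lambda$ to a few decimal places.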
Failure Mode
The Poisson MGF is finite for every $t$, but only narrowly so: the MGF grows doubly exponentially in $t$ (the log-MGF $\lambda(e^t - 1)$ grows exponentially), which makes the Poisson sub-exponential rather than sub-Gaussian. Tail bounds for the Poisson are tighter than the generic sub-exponential bound; see Bennett's and Bernstein's inequalities.
Additivity, Thinning, and Superposition
Additivity, Thinning, and Superposition
Statement
- Additivity. If $X_1 \sim \mathrm{Poisson}(\lambda_1)$ and $X_2 \sim \mathrm{Poisson}(\lambda_2)$ are independent, then $X_1 + X_2 \sim \mathrm{Poisson}(\lambda_1 + \lambda_2)$.
- Conditional binomiality. Conditional on $X_1 + \cdots + X_m = n$, the joint distribution of $(X_1, \dots, X_m)$ is multinomial with parameters $n$ and $(p_1, \dots, p_m)$, where $p_i = \lambda_i / (\lambda_1 + \cdots + \lambda_m)$.
- Thinning. If $X \sim \mathrm{Poisson}(\lambda)$ and each event is independently classified as type $A$ with probability $p$ and type $B$ with probability $1 - p$, then the type-$A$ count is $\mathrm{Poisson}(\lambda p)$, the type-$B$ count is $\mathrm{Poisson}(\lambda (1 - p))$, and the two are independent.
Intuition
Independent Poisson processes merge ("superposition") into a Poisson process whose rate is the sum. Splitting events of one Poisson process into types based on independent coin flips ("thinning") gives independent Poisson processes whose rates partition the original. The conditional-binomial statement is the discrete-time consequence of the same construction.
Proof Sketch
Additivity is the MGF argument: $M_{X_1 + X_2}(t) = e^{\lambda_1 (e^t - 1)}\, e^{\lambda_2 (e^t - 1)} = e^{(\lambda_1 + \lambda_2)(e^t - 1)}$, the MGF of $\mathrm{Poisson}(\lambda_1 + \lambda_2)$. Thinning follows from the same MGF argument applied to the marked process. The conditional-binomial statement is Bayes' rule on PMFs:

$$P\!\left(X_1 = k_1, \dots, X_m = k_m \,\Big|\, \textstyle\sum_i X_i = n\right) = \frac{\prod_i e^{-\lambda_i} \lambda_i^{k_i} / k_i!}{e^{-\sum_i \lambda_i} \left(\sum_i \lambda_i\right)^n / n!} = \binom{n}{k_1, \dots, k_m} \prod_i p_i^{k_i},$$

which is the multinomial PMF.
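A simulation makes thinning tangible. This is a sketch in plain Python (the rate 10, thinning probability 0.3, and sample size are arbitrary choices), using Knuth's product-of-uniforms sampler for the Poisson draws; it checks that both thinned counts have Poisson-like mean-equals-variance behavior and are nearly uncorrelated:

```python
import math, random

random.seed(42)

def sample_poisson(lam):
    # Knuth's product-of-uniforms sampler: the number of uniform factors
    # needed to push the product below exp(-lam) is Poisson(lam) + 1.
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        p *= random.random()
        k += 1
    return k - 1

lam, p_a, trials = 10.0, 0.3, 20000
a_counts, b_counts = [], []
for _ in range(trials):
    x = sample_poisson(lam)
    a = sum(1 for _ in range(x) if random.random() < p_a)  # coin flip per event
    a_counts.append(a)
    b_counts.append(x - a)

mean_a = sum(a_counts) / trials   # should be near lam * p_a = 3
mean_b = sum(b_counts) / trials   # should be near lam * (1 - p_a) = 7
var_a = sum((a - mean_a) ** 2 for a in a_counts) / trials   # near 3, Poisson-like
cov = sum((a - mean_a) * (b - mean_b)
          for a, b in zip(a_counts, b_counts)) / trials     # near 0: independence
print(mean_a, mean_b, var_a, cov)
```

The near-zero empirical covariance is the surprising part of the theorem: splitting one count into two with coin flips would induce dependence for any non-Poisson parent distribution.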
Why It Matters
Superposition justifies pooling counts from independent sources with potentially different rates. Thinning justifies splitting a single count stream into independent sub-streams. The conditional-binomial result is what makes Pearson's Chi-squared test for cell counts valid: under the null hypothesis of independence, observed cell counts are conditionally multinomial with cell probabilities equal to row times column marginals. See chi-squared distribution and tests.
Failure Mode
All three results require independence. Two count streams that interact (the second is triggered by the first) are not the superposition of independent Poissons; their merged process is not Poisson. Thinning with state-dependent rates produces a non-Poisson type- count.
Maximum Likelihood Estimation
MLE for the Poisson Rate
Statement
For an i.i.d. sample $X_1, \dots, X_n$ from $\mathrm{Poisson}(\lambda)$, the MLE is

$$\hat{\lambda} = \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i.$$

The MLE is unbiased and achieves the Cramer-Rao lower bound exactly: $\mathrm{Var}(\hat{\lambda}) = \lambda / n = 1 / (n I(\lambda))$.
Intuition
The Poisson is a one-parameter exponential family with sufficient statistic $\sum_i X_i$. The MLE of the mean parameter is the empirical mean of the sufficient statistic.
Proof Sketch
The log-likelihood is

$$\ell(\lambda) = -n\lambda + \left(\sum_{i} X_i\right) \log \lambda - \sum_{i} \log(X_i!).$$

Differentiating: $\ell'(\lambda) = -n + \sum_i X_i / \lambda = 0$, so $\hat{\lambda} = \bar{X}$. The Fisher information per observation is $I(\lambda) = 1/\lambda$, so the asymptotic variance is $\lambda / n$. Direct computation: $\mathrm{Var}(\bar{X}) = \mathrm{Var}(X_1)/n = \lambda/n$, matching the bound at every $n$, not just asymptotically.
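The finite-sample claim, that $\hat{\lambda}$ is unbiased with variance exactly $\lambda/n$, can be checked by simulation. This is a sketch in plain Python (true rate 4, $n = 25$, and the replication count are arbitrary choices):

```python
import math, random

random.seed(1)

def sample_poisson(lam):
    # Knuth's product-of-uniforms Poisson sampler.
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        p *= random.random()
        k += 1
    return k - 1

lam_true, n, reps = 4.0, 25, 4000
mles = []
for _ in range(reps):
    sample = [sample_poisson(lam_true) for _ in range(n)]
    mles.append(sum(sample) / n)          # the MLE is the sample mean

mean_mle = sum(mles) / reps               # near lam_true: unbiased
var_mle = sum((m - mean_mle) ** 2 for m in mles) / reps
crb = lam_true / n                        # Cramer-Rao bound: lambda / n
print(mean_mle, var_mle, crb)
```

The empirical variance of the 4000 replicated MLEs lands on the Cramer-Rao bound $\lambda/n = 0.16$ up to Monte Carlo noise, with no asymptotics involved.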
Why It Matters
The Poisson MLE is one of the few MLEs that achieves the Cramer-Rao bound exactly in finite samples. The asymptotic theory of MLEs is unnecessary here; the result holds at every finite $n$. The estimator is unbiased, consistent, and efficient, which together is most of what point-estimation theory asks for. See maximum likelihood estimation for the general framework.
Failure Mode
The MLE assumes Poisson data. With overdispersed counts (variance exceeding mean), $\hat{\lambda} = \bar{X}$ is still consistent for the mean but the model is misspecified; standard errors based on $\widehat{\mathrm{Var}}(\hat{\lambda}) = \hat{\lambda}/n$ underestimate the true sampling variance. The fix is a Negative Binomial regression or a Quasi-Poisson approach. See maximum likelihood estimation for the QMLE / sandwich-variance treatment of misspecification.
The Bayesian counterpart is the Gamma-Poisson conjugacy, which gives a closed-form Gamma posterior; see gamma distribution.
Bridge to Exponential and Gamma
A rate-$\lambda$ Poisson process on $[0, \infty)$ has three equivalent characterizations:
1. The number of events in any interval of length $t$ is $\mathrm{Poisson}(\lambda t)$, with independence across disjoint intervals.
2. The inter-arrival times are i.i.d. $\mathrm{Exponential}(\lambda)$.
3. The waiting time for the $k$-th event is $\mathrm{Gamma}(k, \lambda)$.
Given (1), the second follows by computing the survival function of the first inter-arrival: $P(T_1 > t) = P(N(t) = 0) = e^{-\lambda t}$. Given (2), the third follows by Gamma additivity. The three characterizations are equivalent for "ordinary" point processes on the real line and are the standard way the Poisson process is introduced.
A consequence: the Poisson CDF at $k$ has a Gamma representation. For $X \sim \mathrm{Poisson}(\lambda)$ and integer $k \ge 0$,

$$P(X \le k) = \int_{\lambda}^{\infty} \frac{t^k e^{-t}}{k!}\, dt = Q(k + 1, \lambda),$$

where $Q$ is the regularized upper incomplete Gamma function. This is what numerical libraries use to compute Poisson tail probabilities: the regularized incomplete Gamma function evaluates the Poisson CDF.
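The Gamma representation can be verified numerically without any special-function library. The sketch below (arbitrary choices $k = 7$, $\lambda = 4.5$) compares the direct PMF sum with a trapezoid-rule evaluation of the Gamma integral:

```python
import math

def poisson_cdf_direct(k, lam):
    """P(X <= k) by summing the PMF."""
    return sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(k + 1))

def poisson_cdf_gamma(k, lam, steps=200000, upper=60.0):
    # Trapezoid rule for int_lam^upper t^k e^{-t} / k! dt; the cutoff
    # `upper` is chosen so the neglected tail is negligible here.
    h = (upper - lam) / steps
    f = lambda t: t**k * math.exp(-t) / math.factorial(k)
    s = 0.5 * (f(lam) + f(upper)) + sum(f(lam + i * h) for i in range(1, steps))
    return s * h

k, lam = 7, 4.5
cdf_direct = poisson_cdf_direct(k, lam)
cdf_gamma = poisson_cdf_gamma(k, lam)
print(cdf_direct, cdf_gamma)   # agree to many decimal places
```

In production one would call a library's regularized incomplete Gamma routine instead of quadrature; the point here is only that the two sides of the identity match.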
Overdispersion: When the Poisson Fails
| Diagnostic | Poisson behavior | Real-data deviation | Better model |
|---|---|---|---|
| Sample variance versus sample mean | Equal in expectation | Variance much larger than mean | Negative Binomial |
| Empirical zero-rate | Fraction of zeros near $e^{-\bar{x}}$ | More zeros than $e^{-\bar{x}}$ | Zero-Inflated Poisson |
| Per-group rates | Constant across groups | Rates vary by group | Mixed-effects Poisson |
| Clustered counts | Independent across units | Counts cluster | Compound Poisson |
The diagnostic for overdispersion is the ratio of sample variance to sample mean. Under a true Poisson, the ratio is approximately one for large $n$. Values significantly above one signal heterogeneity (the rate varies across observations) or clustering (counts come in bursts). The classical fix is to model the rate as Gamma-distributed across observations, giving the Negative Binomial.
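The variance-to-mean diagnostic is easy to demonstrate. The sketch below (plain Python; the rates, the Gamma mixing parameters, and the sample size are arbitrary choices) contrasts a homogeneous Poisson sample with a Gamma-mixed one, whose marginal is Negative Binomial:

```python
import math, random

random.seed(7)

def sample_poisson(lam):
    # Knuth's product-of-uniforms Poisson sampler.
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        p *= random.random()
        k += 1
    return k - 1

def dispersion(xs):
    """Sample variance divided by sample mean."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return v / m

n = 20000
# Homogeneous Poisson(5): the dispersion ratio should hover near 1.
homog = [sample_poisson(5.0) for _ in range(n)]
# Rate drawn per observation from Gamma(shape=2, scale=2.5): mean still 5,
# but the marginal count has variance 5 + 2 * 2.5^2 = 17.5.
hetero = [sample_poisson(random.gammavariate(2.0, 2.5)) for _ in range(n)]

print(dispersion(homog))    # near 1
print(dispersion(hetero))   # near 17.5 / 5 = 3.5
```

Both samples have the same mean, so the mean alone cannot distinguish them; the dispersion ratio separates them immediately.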
Common Confusions
Poisson processes and Poisson distributions are not the same object
The Poisson distribution is a probability law on the integers. The Poisson process is a stochastic process on the real line (or higher-dimensional spaces) whose counts in any region are Poisson-distributed and independent across disjoint regions. Every Poisson process has Poisson-distributed counts, but the converse is not true: a count process whose counts are Poisson-distributed within each interval is not automatically a Poisson process if independence across intervals fails.
Mean equals variance is a property, not a fact about all count data
Real-world count data are rarely Poisson in the strict sense. Overdispersion is the norm. The Poisson model is a starting point and a useful approximation for low-rate independent events; it is not a universal count model. Always check the empirical variance-to-mean ratio before trusting Poisson standard errors.
The rate parameter is not the same in different parameterizations
A Poisson process with rate $\lambda$ events per second has counts in a one-minute window distributed as $\mathrm{Poisson}(60\lambda)$, not $\mathrm{Poisson}(\lambda)$. The unit of time is folded into the rate. Software libraries typically take a single $\lambda$ that is the expected count in the window of interest, so the unit of time is implicit. Always confirm which $\lambda$ the function expects.
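A minimal sketch of the unit-of-time point, assuming a hypothetical rate of 0.5 events per second observed over a one-minute window:

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

rate_per_second = 0.5     # hypothetical process rate: 0.5 events/second
window_seconds = 60       # one-minute observation window
lam_window = rate_per_second * window_seconds   # expected count in window: 30.0

# The count in the window is Poisson(rate * t), not Poisson(rate):
p_exactly_25 = poisson_pmf(25, lam_window)
wrong = poisson_pmf(25, rate_per_second)   # passing the per-second rate is a bug
print(lam_window, p_exactly_25, wrong)
```

The "wrong" value is astronomically small because a $\mathrm{Poisson}(0.5)$ law puts essentially no mass at 25, which is a useful smoke test when the units of a rate parameter are in doubt.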
Exercises
Problem
A website receives an average of 12 visitors per minute. Assuming Poisson arrivals, find the probability of receiving exactly 10 visitors in a randomly chosen minute and the probability of receiving more than 20 visitors.
Problem
Two independent streams of type-A and type-B emails arrive at a server with rates $\lambda_A$ per hour and $\lambda_B$ per hour. Find the distribution of the total count in a one-hour window and the probability that, given the total is 15, exactly 6 are type A.
Problem
Show that the sample variance $S^2$ from an i.i.d. Poisson sample is a consistent but inefficient estimator of $\lambda$, and identify a more efficient estimator that combines the sample mean and sample variance.
Problem
Construct a 95% Wald confidence interval for $\lambda$ based on $\bar{X}$. Then construct a 95% exact interval using the Gamma-Poisson relationship. Compare the two at a small observed count and at a large one.
References
Canonical:
- Casella and Berger, Statistical Inference (2002), Chapter 3 (Section 3.2 on the Poisson family), Chapter 7 (Poisson MLE), Chapter 10 (asymptotics).
- Lehmann and Casella, Theory of Point Estimation (1998), Chapter 1 (exponential-family treatment of the Poisson).
- Bickel and Doksum, Mathematical Statistics, Volume I (2015), Chapter 1 (Section 1.5).
Stochastic processes:
- Ross, Introduction to Probability Models (2019), Chapter 5 (Poisson processes, thinning, superposition).
- Kingman, Poisson Processes (1993), Chapters 1 and 2.
- Grimmett and Stirzaker, Probability and Random Processes (2020), Chapter 6.
Overdispersion and count models:
- McCullagh and Nelder, Generalized Linear Models (1989), Chapter 6 (Poisson regression and quasi-likelihood).
- Cameron and Trivedi, Regression Analysis of Count Data (2013), Chapters 3 and 4.
Last reviewed: May 11, 2026