
Foundations

Discrete and Continuous Distribution Pairs

A reference page mapping the canonical discrete-and-continuous companion pairs. Discrete uniform and continuous uniform; Bernoulli/Binomial counting process and the Poisson process; geometric and exponential waiting times (both memoryless); negative binomial and gamma (time to r-th event); hypergeometric (sampling without replacement, no clean continuous analog). Each pair is shown side by side with definition, PMF or PDF, support, mean, variance, MGF, memorylessness check, and conceptual bridge.


Why This Matters

Several discrete distributions have natural continuous-time analogs. Knowing the pairs cleans up two sources of confusion: which distribution belongs in which problem (count-the-trials vs. wait-for-the-time), and why the same algebraic identities (memorylessness, sum-to-form-a-bigger-member) keep showing up. The pairing is also a useful sanity check: if a discrete-time analysis and a continuous-time analysis disagree about the answer to the same question, one of them is making a modeling error.

This page is a reference. The individual distributions are covered in detail on common probability distributions; the value here is the side-by-side view.

The Pairing at a Glance

| Discrete | Continuous | Bridge | Both memoryless? |
| --- | --- | --- | --- |
| Discrete uniform on $\{1, \ldots, n\}$ | Uniform on $[a, b]$ | "No information" maximum-entropy distribution on a finite or bounded set | No |
| Bernoulli / Binomial counting | Poisson process on $[0, T]$ | Rare-event limit: many trials with small per-trial success rate | n/a (counting processes) |
| Geometric on $\{1, 2, \ldots\}$ | Exponential on $(0, \infty)$ | Waiting time to the first success / event | Yes (the only memoryless families on their supports) |
| Negative binomial (sum of $r$ Geometrics) | Gamma with integer shape $r$ (sum of $r$ Exponentials) | Waiting time to the $r$-th success / event | No |
| Hypergeometric (sample without replacement) | No clean continuous analog | Sampling without replacement has no rate-based continuous version | n/a |

Uniform: Discrete and Continuous

| Property | Discrete Uniform on $\{1, \ldots, n\}$ | Continuous Uniform on $[a, b]$ |
| --- | --- | --- |
| PMF / PDF | $P(X = k) = 1/n$ for $k = 1, \ldots, n$ | $f_X(x) = 1/(b - a)$ for $x \in [a, b]$ |
| CDF | $F(k) = k/n$ for $k = 1, \ldots, n$ | $F(x) = (x - a)/(b - a)$ for $x \in [a, b]$ |
| Mean | $(n + 1)/2$ | $(a + b)/2$ |
| Variance | $(n^2 - 1)/12$ | $(b - a)^2 / 12$ |
| MGF | $\dfrac{e^t - e^{(n+1)t}}{n(1 - e^t)}$ | $\dfrac{e^{tb} - e^{ta}}{t(b - a)}$ |
| Memorylessness | No (bounded support) | No (bounded support) |

The factor of $1/12$ in the variance is the same constant in both expressions; this is not a coincidence. The continuous uniform is the limit of the discrete uniform on $\{a, a + (b - a)/n, \ldots, b\}$ as $n \to \infty$, and the variance of the limit matches the limit of the variances.
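This limit can be checked numerically. Below is a minimal sketch in plain Python (stdlib only; the endpoint values are illustrative): the variance of the discrete uniform on the grid $\{a + i(b-a)/n : i = 0, \ldots, n\}$ approaches $(b - a)^2/12$ as $n$ grows.

```python
def grid_variance(a, b, n):
    """Variance of the discrete uniform on n+1 equally spaced points in [a, b]."""
    pts = [a + i * (b - a) / n for i in range(n + 1)]
    mean = sum(pts) / len(pts)
    return sum((x - mean) ** 2 for x in pts) / len(pts)

a, b = 2.0, 5.0
for n in (10, 100, 10000):
    # Both columns converge to (b - a)^2 / 12 = 0.75 as n grows.
    print(n, grid_variance(a, b, n), (b - a) ** 2 / 12)
```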

Bernoulli/Binomial Counting versus the Poisson Process

| Property | Binomial counting on $n$ trials | Poisson process on $[0, T]$ |
| --- | --- | --- |
| What it counts | Number of successes in $n$ iid Bernoulli($p$) trials | Number of arrivals in a continuous-time interval with rate $\lambda$ |
| Distribution of the count | $X \sim \text{Bin}(n, p)$ | $N(T) \sim \text{Poisson}(\lambda T)$ |
| Mean | $np$ | $\lambda T$ |
| Variance | $np(1 - p)$ | $\lambda T$ |
| MGF | $(1 - p + p e^t)^n$ | $\exp(\lambda T (e^t - 1))$ |
| Increment independence | Successes in disjoint trial sets are independent | Arrivals in disjoint time intervals are independent |
| Limit relationship | $n \to \infty$, $p \to 0$, $np \to \lambda T$ gives Poisson | The Poisson process is the rare-event continuous-time limit |

The rare-event limit is the bridge: as the number of opportunities for an event grows and the per-opportunity probability shrinks at the right rate ($np \to \lambda T$), the binomial PMF $\binom{n}{k} p^k (1 - p)^{n - k}$ converges to the Poisson PMF $e^{-\lambda T} (\lambda T)^k / k!$. The variance of the binomial satisfies $np(1 - p) \to \lambda T (1 - 0) = \lambda T$, matching the Poisson variance.

Theorem

Poisson Limit of the Binomial

Statement

If $X_n \sim \text{Bin}(n, p_n)$ with $n p_n \to \lambda > 0$ as $n \to \infty$, then for every fixed $k \in \{0, 1, 2, \ldots\}$,
$$P(X_n = k) \to e^{-\lambda} \frac{\lambda^k}{k!} = P(\text{Poisson}(\lambda) = k).$$

Proof Sketch

Write
$$P(X_n = k) = \binom{n}{k} p_n^k (1 - p_n)^{n - k} = \frac{n(n - 1)\cdots(n - k + 1)}{k!} \cdot p_n^k \cdot (1 - p_n)^{n - k}.$$
The product $n(n - 1)\cdots(n - k + 1) \cdot p_n^k = (n p_n)^k \cdot \prod_{j = 0}^{k - 1} (1 - j/n) \to \lambda^k \cdot 1$. And $(1 - p_n)^{n - k} = \exp((n - k) \log(1 - p_n)) \to \exp(-\lambda)$ since $\log(1 - p_n) \approx -p_n$. Combining, $P(X_n = k) \to e^{-\lambda} \lambda^k / k!$.
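The convergence is easy to see numerically. A minimal sketch in plain Python (stdlib only; $\lambda$ and $k$ are illustrative), comparing the exact Bin($n$, $\lambda/n$) PMF to the Poisson($\lambda$) PMF:

```python
import math

def binom_pmf(n, p, k):
    # Exact binomial PMF via math.comb (integer-exact binomial coefficient).
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(lam, k):
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam, k = 3.0, 2
for n in (10, 100, 10000):
    # The first column approaches the second as n grows with np = lam fixed.
    print(n, binom_pmf(n, lam / n, k), poisson_pmf(lam, k))
```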

Why It Matters

This justifies modeling rare-event counts (insurance claims per month, network packet drops per second, mutations per generation) as Poisson rather than binomial: when the underlying trial count is large and the per-trial rate is small, the Poisson approximation is both convenient (one parameter instead of two) and accurate.

Failure Mode

The approximation requires $p_n$ to be small and $n p_n$ to be a moderate constant. For fixed $p$ bounded away from zero, the binomial does not converge to a Poisson but to a Gaussian (after standardization, by the CLT). The two limits handle different regimes.

Geometric and Exponential: The Memoryless Pair

| Property | Geometric on $\{1, 2, \ldots\}$ | Exponential on $(0, \infty)$ |
| --- | --- | --- |
| What it measures | Number of trials until first success in iid Bernoulli($p$) | Time until first event in a rate-$\lambda$ Poisson process |
| PMF / PDF | $P(X = k) = (1 - p)^{k - 1} p$ for $k \geq 1$ | $f_X(x) = \lambda e^{-\lambda x}$ for $x \geq 0$ |
| CDF | $F(k) = 1 - (1 - p)^k$ for $k \geq 1$ | $F(x) = 1 - e^{-\lambda x}$ for $x \geq 0$ |
| Mean | $1/p$ | $1/\lambda$ |
| Variance | $(1 - p)/p^2$ | $1/\lambda^2$ |
| MGF | $\dfrac{p e^t}{1 - (1 - p) e^t}$ for $t < -\log(1 - p)$ | $\dfrac{\lambda}{\lambda - t}$ for $t < \lambda$ |
| Memorylessness | $P(X > k + j \mid X > k) = P(X > j)$ for integer $k, j \geq 0$ | $P(X > t + s \mid X > t) = P(X > s)$ for real $t, s \geq 0$ |

Theorem

Geometric and Exponential Are the Unique Memoryless Distributions

Statement

On the positive integers, the geometric distribution is the unique distribution satisfying $P(X > k + j \mid X > k) = P(X > j)$ for all integers $j, k \geq 0$. On the positive reals, the exponential distribution is the unique continuous distribution satisfying $P(X > t + s \mid X > t) = P(X > s)$ for all $t, s \geq 0$.

Proof Sketch

Exponential (continuous case). Let $\bar F(t) = P(X > t)$. Memorylessness gives $\bar F(t + s) = \bar F(t) \bar F(s)$. The only right-continuous solutions on $[0, \infty)$ with $\bar F(0) = 1$ are of the form $\bar F(t) = e^{-\lambda t}$ for some $\lambda > 0$ (a classical Cauchy functional-equation result). Differentiating gives the exponential density.

Geometric (discrete case). Let $q_k = P(X > k)$. Memorylessness gives $q_{k + j} = q_k q_j$. Setting $q = q_1$ gives $q_k = q^k$, so $P(X > k) = q^k$ and $P(X = k) = q^{k - 1}(1 - q)$. Identifying $1 - q$ as the success probability $p$ recovers the geometric.

Why It Matters

Memorylessness is the conceptual bridge between the two distributions. The continuous-time analog of "waiting one more trial after $k$ failed trials" is "waiting one more time unit after $t$ time units of no event"; both inherit the same lack-of-memory property because the underlying processes (Bernoulli trials, Poisson process) have independent increments.

Concretely: if calls arrive at a help line as a rate-$\lambda$ Poisson process, the time you have already waited gives you no information about how much longer until the next call. The same is true for the next Bernoulli success after a string of failures. Any modeling that requires the waiting-time distribution to "remember" the past must use something other than geometric or exponential.
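The exponential case is a one-line algebraic identity on the survival function, and checking it numerically is a sketch worth writing out (plain Python, stdlib only; the parameter values are illustrative):

```python
import math

# Survival function of Exp(lam): S(x) = P(X > x) = exp(-lam * x).
# Memorylessness says P(X > t + s | X > t) = S(t + s) / S(t) equals S(s).
lam, t, s = 0.7, 1.3, 2.1
surv = lambda x: math.exp(-lam * x)
cond = surv(t + s) / surv(t)   # conditional survival of the residual wait
print(cond, surv(s))           # identical up to floating-point rounding
```

The equality is exact, not approximate: $e^{-\lambda(t+s)}/e^{-\lambda t} = e^{-\lambda s}$ for every $t, s \geq 0$.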

Failure Mode

The memoryless property is exact, not approximate. Mixture models (geometric with random $p$, exponential with random $\lambda$) are not memoryless. Truncated exponential distributions are not memoryless. Any failure of independent increments in the underlying process breaks memorylessness.

Negative Binomial and Gamma: Waiting for the r-th Event

| Property | Negative Binomial$(r, p)$ | Gamma$(r, \lambda)$ with integer shape $r$ |
| --- | --- | --- |
| What it measures | Number of trials until $r$-th success in iid Bernoulli($p$) | Time until $r$-th event in a rate-$\lambda$ Poisson process |
| PMF / PDF | $P(X = k) = \binom{k - 1}{r - 1} p^r (1 - p)^{k - r}$ for $k \geq r$ | $f_X(x) = \dfrac{\lambda^r x^{r - 1} e^{-\lambda x}}{(r - 1)!}$ for $x \geq 0$ |
| Mean | $r/p$ | $r/\lambda$ |
| Variance | $r(1 - p)/p^2$ | $r/\lambda^2$ |
| Sum-of-independent decomposition | NB$(r, p)$ is the sum of $r$ iid Geometric$(p)$ | Gamma$(r, \lambda)$ is the sum of $r$ iid Exponential$(\lambda)$ |
| Memorylessness | No (the wait for the $r$-th event remembers prior failed trials when $r > 1$) | No |

The negative binomial (geometric counterpart) inherits its construction from summing independent geometrics. The gamma (exponential counterpart) inherits its construction from summing independent exponentials. Memorylessness is lost as soon as $r \geq 2$ because conditioning on having waited a long time without all $r$ events changes the distribution of the remaining wait.
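The sum construction can be checked by simulation. A minimal sketch in plain Python (stdlib only; $r$, $p$, and the sample size are illustrative): summing $r$ iid Geometric($p$) draws on $\{1, 2, \ldots\}$ should reproduce the negative binomial mean $r/p$ from the table above.

```python
import random

random.seed(0)

def geometric(p):
    """One Geometric(p) draw on {1, 2, ...}: count trials until first success."""
    k = 1
    while random.random() >= p:   # trial fails with probability 1 - p
        k += 1
    return k

r, p, trials = 3, 0.4, 200_000
# Each sample is a sum of r independent geometric waiting times.
mean = sum(sum(geometric(p) for _ in range(r)) for _ in range(trials)) / trials
print(mean, r / p)   # both near 7.5
```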

The gamma distribution also has a non-integer shape generalization (replacing $(r - 1)!$ with the gamma function $\Gamma(r)$); the negative binomial similarly extends to non-integer $r$ via the gamma function, where it is most useful as a Poisson distribution with a gamma-distributed rate (a mixture, not a waiting-time distribution).

Hypergeometric: No Clean Continuous Analog

The hypergeometric distribution models sampling without replacement: from a population of $N$ items containing $K$ successes, draw $n$ items and count the number of successes drawn. Its PMF is
$$P(X = k) = \frac{\binom{K}{k}\binom{N - K}{n - k}}{\binom{N}{n}}.$$
There is no clean continuous-time analog. The continuous version of "sampling without replacement" would be "drawing without re-sampling from a finite continuous population," which is not a standard probability construct. The hypergeometric appears in survey sampling, finite-population inference, and Fisher's exact test for $2 \times 2$ contingency tables; in each setting, the continuous-population analog uses different machinery (regression, asymptotic chi-squared tests, etc.).

Worth knowing: as $N \to \infty$ with $K/N \to p$, the hypergeometric converges to the binomial. So the "no continuous analog" gap is bridged, in the large-population limit, by the binomial-Poisson rare-event chain.
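This large-population limit can be verified exactly with integer arithmetic. A minimal sketch in plain Python (stdlib only; $p$, $n$, $k$ are illustrative) comparing the hypergeometric PMF to the Bin($n$, $K/N$) PMF as $N$ grows:

```python
import math

def hyper_pmf(N, K, n, k):
    """Exact hypergeometric PMF via integer binomial coefficients."""
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

def binom_pmf(n, p, k):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

p, n, k = 0.3, 10, 4
for N in (100, 1000, 100_000):
    K = int(p * N)
    # First column approaches the second as the population grows.
    print(N, hyper_pmf(N, K, n, k), binom_pmf(n, p, k))
```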

Common Confusions

Watch Out

Geometric supports vary by convention

The two standard conventions are $X \in \{1, 2, \ldots\}$ (counting the trial of the first success) and $X \in \{0, 1, 2, \ldots\}$ (counting the failures before the first success). The mean is $1/p$ in the first convention and $(1 - p)/p$ in the second. Always state which version is in use; the formulas differ by a constant shift.
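The shift between conventions is easy to see in simulation. A minimal sketch in plain Python (stdlib only; $p$ and the sample size are illustrative): if $X$ counts the trial of the first success, then $Y = X - 1$ counts the failures before it, and the two sample means differ by exactly 1.

```python
import random

random.seed(1)
p, trials = 0.25, 200_000

def first_success_trial(p):
    """Geometric draw in the {1, 2, ...} convention."""
    k = 1
    while random.random() >= p:
        k += 1
    return k

xs = [first_success_trial(p) for _ in range(trials)]
mean_x = sum(xs) / trials   # near 1/p       = 4.0
mean_y = mean_x - 1         # near (1-p)/p   = 3.0
print(mean_x, mean_y)
```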

Watch Out

The negative binomial has at least two equally common parameterizations

The shape parameter $r$ can count trials until the $r$-th success (PMF supported on $\{r, r + 1, \ldots\}$) or failures before the $r$-th success (PMF supported on $\{0, 1, 2, \ldots\}$). When $r$ is non-integer, the distribution is typically presented in the mean-dispersion parameterization $(\mu, k)$ used in GLM software. Match the convention to the source.

Watch Out

Memorylessness is about the residual lifetime, not the marginal distribution

The exponential is memoryless, but a sum of two iid exponentials (a gamma with shape $2$) is not. The reason: knowing you have waited some time without seeing both events changes the conditional probability that the first event has already occurred, and hence the conditional distribution of the remaining wait. Memorylessness is a fragile property that survives only in the single-event case.
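The failure is visible directly from the gamma survival function. For shape $2$ and rate $\lambda$, $S(t) = e^{-\lambda t}(1 + \lambda t)$, and the memoryless identity $S(t+s)/S(t) = S(s)$ fails. A minimal sketch (plain Python, stdlib only; parameter values are illustrative):

```python
import math

# Gamma(shape=2, rate=lam) survival function: S(t) = exp(-lam*t) * (1 + lam*t).
lam, t, s = 1.0, 2.0, 1.0
S = lambda x: math.exp(-lam * x) * (1 + lam * x)
# Memorylessness would require these two numbers to be equal; they are not.
print(S(t + s) / S(t), S(s))
```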

Watch Out

The Poisson process is more than the Poisson distribution

"Poisson distribution" is a one-parameter family on {0,1,2,}\{0, 1, 2, \ldots\} describing the count of events. "Poisson process" is a continuous-time stochastic process whose count over any interval is Poisson and whose increments over disjoint intervals are independent. The two are tightly related but conceptually distinct; many of the discrete-continuous pairings above are pairings of stochastic processes (Bernoulli trial sequence vs. Poisson process), with the marginal distributions falling out as derived quantities.

Exercises

ExerciseCore

Problem

Verify that an exponential random variable $X \sim \text{Exp}(\lambda)$ satisfies the memoryless property $P(X > t + s \mid X > t) = P(X > s)$ by direct computation.

ExerciseCore

Problem

Let $X_1, X_2$ be iid Geometric($p$) on $\{1, 2, \ldots\}$. Show that $Y = X_1 + X_2$ has the negative binomial PMF $P(Y = k) = (k - 1) p^2 (1 - p)^{k - 2}$ for $k \geq 2$.

ExerciseAdvanced

Problem

Let $T_1, T_2, \ldots$ be iid Exponential($\lambda$) inter-arrival times and let $N(T)$ be the number of arrivals in $[0, T]$. Show that $N(T) \sim \text{Poisson}(\lambda T)$ by computing $P(N(T) = k)$ via the gamma distribution of the partial sums.

References

Canonical:

  • Casella and Berger, Statistical Inference (2002), 2nd edition, Chapter 3 (common families of distributions, discrete and continuous)
  • Blitzstein and Hwang, Introduction to Probability (2019), 2nd edition, Chapters 4 (discrete), 5 (continuous), and 13 (Poisson processes)
  • Durrett, Probability: Theory and Examples (2019), 5th edition, Chapter 4 (Poisson processes)

Distribution catalogs:

  • Johnson, Kotz, and Kemp, Univariate Discrete Distributions (2005), 3rd edition
  • Johnson, Kotz, and Balakrishnan, Continuous Univariate Distributions, Volume 1 (1994), 2nd edition

Applied / process view:

  • Ross, Introduction to Probability Models (2014), 11th edition, Chapter 5 (Poisson processes)
  • Pinsky and Karlin, An Introduction to Stochastic Modeling (2011), 4th edition

Next Topics

  • Central limit theorem: the universal continuous limit of normalized sums of discrete or continuous distributions.
  • Law of large numbers: the matched-pair partner of the CLT; applies equally to discrete and continuous distributions.
  • Method of moments: the estimation strategy that uses moments directly from the tables on this page.

Last reviewed: May 12, 2026
