
Foundations

Discrete and Continuous Distribution Pairs

A reference page mapping the canonical discrete-and-continuous companion pairs. Discrete uniform and continuous uniform; Bernoulli/Binomial counting process and the Poisson process; geometric and exponential waiting times (both memoryless); negative binomial and gamma (time to r-th event); hypergeometric (sampling without replacement, no clean continuous analog). Each pair is shown side by side with definition, PMF or PDF, support, mean, variance, MGF, memorylessness check, and conceptual bridge.


Why This Matters

Several discrete distributions have natural continuous-time analogs. Knowing the pairs cleans up two sources of confusion: which distribution belongs in which problem (count-the-trials vs. wait-for-the-time), and why the same algebraic identities (memorylessness, sum-to-form-a-bigger-member) keep showing up. The pairing is also a useful sanity check: if a discrete-time analysis and a continuous-time analysis disagree about the answer to the same question, one of them is making a modeling error.

This page is a reference. The individual distributions are covered in detail on common probability distributions; the value here is the side-by-side view.

The Pairing at a Glance

| Discrete | Continuous | Bridge | Both memoryless? |
| --- | --- | --- | --- |
| Discrete uniform on $\{1, \ldots, n\}$ | Uniform on $[a, b]$ | "No information" maximum-entropy distribution on a finite or bounded set | No |
| Bernoulli / Binomial counting | Poisson process on $[0, T]$ | Rare-event limit: many trials with small per-trial success rate | n/a (counting processes) |
| Geometric on $\{1, 2, \ldots\}$ | Exponential on $(0, \infty)$ | Waiting time to the first success / event | Yes (the only memoryless families on their supports) |
| Negative binomial (sum of $r$ Geometrics) | Gamma with integer shape $r$ (sum of $r$ Exponentials) | Waiting time to the $r$-th success / event | No |
| Hypergeometric (sample without replacement) | No clean continuous analog | Sampling without replacement has no rate-based continuous version | n/a |

Uniform: Discrete and Continuous

| Property | Discrete Uniform on $\{1, \ldots, n\}$ | Continuous Uniform on $[a, b]$ |
| --- | --- | --- |
| PMF / PDF | $P(X = k) = 1/n$ for $k = 1, \ldots, n$ | $f_X(x) = 1/(b - a)$ for $x \in [a, b]$ |
| CDF | $F(k) = k/n$ for $k = 1, \ldots, n$ | $F(x) = (x - a)/(b - a)$ for $x \in [a, b]$ |
| Mean | $(n + 1)/2$ | $(a + b)/2$ |
| Variance | $(n^2 - 1)/12$ | $(b - a)^2 / 12$ |
| MGF | $\dfrac{e^t - e^{(n+1)t}}{n(1 - e^t)}$ | $\dfrac{e^{tb} - e^{ta}}{t(b - a)}$ |
| Memorylessness | No (bounded support) | No (bounded support) |

The factor of $1/12$ in the variance is the same constant in both expressions; this is not a coincidence. The continuous uniform is the limit of the discrete uniform on $\{a, a + (b - a)/n, \ldots, b\}$ as $n \to \infty$, and the variance of the limit matches the limit of the variances.
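This limit can be checked numerically. Below is a minimal sketch in plain Python (stdlib only; the endpoint values are illustrative): the variance of the discrete uniform on the grid $\{a + i(b-a)/n : i = 0, \ldots, n\}$ approaches $(b - a)^2/12$ as $n$ grows.

```python
def grid_variance(a, b, n):
    """Variance of the discrete uniform on n+1 equally spaced points in [a, b]."""
    pts = [a + i * (b - a) / n for i in range(n + 1)]
    mean = sum(pts) / len(pts)
    return sum((x - mean) ** 2 for x in pts) / len(pts)

a, b = 2.0, 5.0
for n in (10, 100, 10000):
    # Both columns converge to (b - a)^2 / 12 = 0.75 as n grows.
    print(n, grid_variance(a, b, n), (b - a) ** 2 / 12)
```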

Bernoulli/Binomial Counting versus the Poisson Process

| Property | Binomial counting on $n$ trials | Poisson process on $[0, T]$ |
| --- | --- | --- |
| What it counts | Number of successes in $n$ iid Bernoulli($p$) trials | Number of arrivals in a continuous-time interval with rate $\lambda$ |
| Distribution of the count | $X \sim \text{Bin}(n, p)$ | $N(T) \sim \text{Poisson}(\lambda T)$ |
| Mean | $np$ | $\lambda T$ |
| Variance | $np(1 - p)$ | $\lambda T$ |
| MGF | $(1 - p + p e^t)^n$ | $\exp(\lambda T (e^t - 1))$ |
| Increment independence | Successes in disjoint trial sets are independent | Arrivals in disjoint time intervals are independent |
| Limit relationship | $n \to \infty$, $p \to 0$, $np \to \lambda T$ gives Poisson | The Poisson process is the rare-event continuous-time limit |

The rare-event limit is the bridge: as the number of opportunities for an event grows and the per-opportunity probability shrinks at the right rate ($np \to \lambda T$), the binomial PMF $\binom{n}{k} p^k (1 - p)^{n - k}$ converges to the Poisson PMF $e^{-\lambda T} (\lambda T)^k / k!$. The variance of the binomial satisfies $np(1 - p) \to \lambda T (1 - 0) = \lambda T$, matching the Poisson variance.

Theorem

Poisson Limit of the Binomial

Statement

If $X_n \sim \text{Bin}(n, p_n)$ with $n p_n \to \lambda > 0$ as $n \to \infty$, then for every fixed $k \in \{0, 1, 2, \ldots\}$,
$$P(X_n = k) \to e^{-\lambda} \frac{\lambda^k}{k!} = P(\text{Poisson}(\lambda) = k).$$

Proof Sketch

Write
$$P(X_n = k) = \binom{n}{k} p_n^k (1 - p_n)^{n - k} = \frac{n(n - 1)\cdots(n - k + 1)}{k!} \cdot p_n^k \cdot (1 - p_n)^{n - k}.$$
The product $n(n - 1)\cdots(n - k + 1) \cdot p_n^k = (n p_n)^k \cdot \prod_{j = 0}^{k - 1} (1 - j/n) \to \lambda^k \cdot 1$. And $(1 - p_n)^{n - k} = \exp((n - k) \log(1 - p_n)) \to \exp(-\lambda)$ since $\log(1 - p_n) \approx -p_n$. Combining, $P(X_n = k) \to e^{-\lambda} \lambda^k / k!$.
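The convergence is easy to see numerically. A minimal sketch in plain Python (stdlib only; $\lambda$ and $k$ are illustrative), comparing the exact Bin($n$, $\lambda/n$) PMF to the Poisson($\lambda$) PMF:

```python
import math

def binom_pmf(n, p, k):
    # Exact binomial PMF via math.comb (integer-exact binomial coefficient).
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(lam, k):
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam, k = 3.0, 2
for n in (10, 100, 10000):
    # The first column approaches the second as n grows with np = lam fixed.
    print(n, binom_pmf(n, lam / n, k), poisson_pmf(lam, k))
```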

Why It Matters

This justifies modeling rare-event counts (insurance claims per month, network packet drops per second, mutations per generation) as Poisson rather than binomial: when the underlying trial count is large and the per-trial rate is small, the Poisson approximation is both convenient (one parameter instead of two) and accurate.

Failure Mode

The approximation requires $p_n$ to be small and $n p_n$ to be a moderate constant. For fixed $p$ bounded away from zero, the binomial does not converge to a Poisson but to a Gaussian (after standardization, by the CLT). The two limits handle different regimes.

Geometric and Exponential: The Memoryless Pair

| Property | Geometric on $\{1, 2, \ldots\}$ | Exponential on $(0, \infty)$ |
| --- | --- | --- |
| What it measures | Number of trials until first success in iid Bernoulli($p$) | Time until first event in a rate-$\lambda$ Poisson process |
| PMF / PDF | $P(X = k) = (1 - p)^{k - 1} p$ for $k \geq 1$ | $f_X(x) = \lambda e^{-\lambda x}$ for $x \geq 0$ |
| CDF | $F(k) = 1 - (1 - p)^k$ for $k \geq 1$ | $F(x) = 1 - e^{-\lambda x}$ for $x \geq 0$ |
| Mean | $1/p$ | $1/\lambda$ |
| Variance | $(1 - p)/p^2$ | $1/\lambda^2$ |
| MGF | $\dfrac{p e^t}{1 - (1 - p) e^t}$ for $t < -\log(1 - p)$ | $\dfrac{\lambda}{\lambda - t}$ for $t < \lambda$ |
| Memorylessness | $P(X > k + j \mid X > k) = P(X > j)$ for integer $k, j \geq 0$ | $P(X > t + s \mid X > t) = P(X > s)$ for real $t, s \geq 0$ |

Theorem

Geometric and Exponential Are the Unique Memoryless Distributions

Statement

On the positive integers, the geometric distribution is the unique distribution satisfying $P(X > k + j \mid X > k) = P(X > j)$ for all integers $j, k \geq 0$. On the positive reals, the exponential distribution is the unique continuous distribution satisfying $P(X > t + s \mid X > t) = P(X > s)$ for all $t, s \geq 0$.

Proof Sketch

Exponential (continuous case). Let $\bar F(t) = P(X > t)$. Memorylessness gives $\bar F(t + s) = \bar F(t) \bar F(s)$. The only right-continuous solutions on $[0, \infty)$ with $\bar F(0) = 1$ are of the form $\bar F(t) = e^{-\lambda t}$ for some $\lambda > 0$ (a classical Cauchy functional-equation result). Differentiating gives the exponential density.

Geometric (discrete case). Let $q_k = P(X > k)$. Memorylessness gives $q_{k + j} = q_k q_j$. Setting $q = q_1$ gives $q_k = q^k$, so $P(X > k) = q^k$ and $P(X = k) = q^{k - 1}(1 - q)$. Identifying $1 - q$ as the success probability $p$ recovers the geometric.

Why It Matters

Memorylessness is the conceptual bridge between the two distributions. The continuous-time analog of "waiting one more trial after $k$ failed trials" is "waiting one more time unit after $t$ time units of no event"; both inherit the same lack-of-memory property because the underlying processes (Bernoulli trials, Poisson process) have independent increments.

Concretely: if calls arrive at a help line as a rate-$\lambda$ Poisson process, the time you have already waited gives you no information about how much longer until the next call. The same is true for the next Bernoulli success after a string of failures. Any modeling that requires the waiting-time distribution to "remember" the past must use something other than geometric or exponential.
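The exponential case is a one-line algebraic identity on the survival function, and checking it numerically is a sketch worth writing out (plain Python, stdlib only; the parameter values are illustrative):

```python
import math

# Survival function of Exp(lam): S(x) = P(X > x) = exp(-lam * x).
# Memorylessness says P(X > t + s | X > t) = S(t + s) / S(t) equals S(s).
lam, t, s = 0.7, 1.3, 2.1
surv = lambda x: math.exp(-lam * x)
cond = surv(t + s) / surv(t)   # conditional survival of the residual wait
print(cond, surv(s))           # identical up to floating-point rounding
```

The equality is exact, not approximate: $e^{-\lambda(t+s)}/e^{-\lambda t} = e^{-\lambda s}$ for every $t, s \geq 0$.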

Failure Mode

The memoryless property is exact, not approximate. Mixture models (geometric with random $p$, exponential with random $\lambda$) are not memoryless. Truncated exponential distributions are not memoryless. Any failure of independent increments in the underlying process breaks memorylessness.

Negative Binomial and Gamma: Waiting for the r-th Event

| Property | Negative Binomial$(r, p)$ | Gamma$(r, \lambda)$ with integer shape $r$ |
| --- | --- | --- |
| What it measures | Number of trials until $r$-th success in iid Bernoulli($p$) | Time until $r$-th event in a rate-$\lambda$ Poisson process |
| PMF / PDF | $P(X = k) = \binom{k - 1}{r - 1} p^r (1 - p)^{k - r}$ for $k \geq r$ | $f_X(x) = \dfrac{\lambda^r x^{r - 1} e^{-\lambda x}}{(r - 1)!}$ for $x \geq 0$ |
| Mean | $r/p$ | $r/\lambda$ |
| Variance | $r(1 - p)/p^2$ | $r/\lambda^2$ |
| Sum-of-independent decomposition | NB$(r, p)$ is the sum of $r$ iid Geometric$(p)$ | Gamma$(r, \lambda)$ is the sum of $r$ iid Exponential$(\lambda)$ |
| Memorylessness | No (the wait for the $r$-th event remembers prior failed trials when $r > 1$) | No |

The negative binomial (geometric counterpart) inherits its construction from summing independent geometrics. The gamma (exponential counterpart) inherits its construction from summing independent exponentials. Memorylessness is lost as soon as $r \geq 2$ because conditioning on having waited a long time without all $r$ events changes the distribution of the remaining wait.
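The sum construction can be checked by simulation. A minimal sketch in plain Python (stdlib only; $r$, $p$, and the sample size are illustrative): summing $r$ iid Geometric($p$) draws on $\{1, 2, \ldots\}$ should reproduce the negative binomial mean $r/p$ from the table above.

```python
import random

random.seed(0)

def geometric(p):
    """One Geometric(p) draw on {1, 2, ...}: count trials until first success."""
    k = 1
    while random.random() >= p:   # trial fails with probability 1 - p
        k += 1
    return k

r, p, trials = 3, 0.4, 200_000
# Each sample is a sum of r independent geometric waiting times.
mean = sum(sum(geometric(p) for _ in range(r)) for _ in range(trials)) / trials
print(mean, r / p)   # both near 7.5
```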

The gamma distribution also has a non-integer shape generalization (replacing $(r - 1)!$ with the gamma function $\Gamma(r)$); the negative binomial similarly extends to non-integer $r$ via the gamma function, where it is most useful as a Poisson distribution with a gamma-distributed rate (a mixture, not a waiting-time distribution).

Hypergeometric: No Clean Continuous Analog

The hypergeometric distribution models sampling without replacement: from a population of $N$ items containing $K$ successes, draw $n$ items and count the number of successes drawn. Its PMF is
$$P(X = k) = \frac{\binom{K}{k}\binom{N - K}{n - k}}{\binom{N}{n}}.$$
There is no clean continuous-time analog. The continuous version of "sampling without replacement" would be "drawing without re-sampling from a finite continuous population," which is not a standard probability construct. The hypergeometric appears in survey sampling, finite-population inference, and Fisher's exact test for $2 \times 2$ contingency tables; in each setting, the continuous-population analog uses different machinery (regression, asymptotic chi-squared tests, etc.).

Worth knowing: as $N \to \infty$ with $K/N \to p$, the hypergeometric converges to the binomial. So the "no continuous analog" gap is bridged, in the large-population limit, by the binomial-Poisson rare-event chain.
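This large-population limit can be verified exactly with integer arithmetic. A minimal sketch in plain Python (stdlib only; $p$, $n$, $k$ are illustrative) comparing the hypergeometric PMF to the Bin($n$, $K/N$) PMF as $N$ grows:

```python
import math

def hyper_pmf(N, K, n, k):
    """Exact hypergeometric PMF via integer binomial coefficients."""
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

def binom_pmf(n, p, k):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

p, n, k = 0.3, 10, 4
for N in (100, 1000, 100_000):
    K = int(p * N)
    # First column approaches the second as the population grows.
    print(N, hyper_pmf(N, K, n, k), binom_pmf(n, p, k))
```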

Common Confusions

Watch Out

Geometric supports vary by convention

The two standard conventions are $X \in \{1, 2, \ldots\}$ (counting the trial of the first success) and $X \in \{0, 1, 2, \ldots\}$ (counting the failures before the first success). The mean is $1/p$ in the first convention and $(1 - p)/p$ in the second. Always state which version is in use; the formulas differ by a constant shift.
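The shift between conventions is easy to see in simulation. A minimal sketch in plain Python (stdlib only; $p$ and the sample size are illustrative): if $X$ counts the trial of the first success, then $Y = X - 1$ counts the failures before it, and the two sample means differ by exactly 1.

```python
import random

random.seed(1)
p, trials = 0.25, 200_000

def first_success_trial(p):
    """Geometric draw in the {1, 2, ...} convention."""
    k = 1
    while random.random() >= p:
        k += 1
    return k

xs = [first_success_trial(p) for _ in range(trials)]
mean_x = sum(xs) / trials   # near 1/p       = 4.0
mean_y = mean_x - 1         # near (1-p)/p   = 3.0
print(mean_x, mean_y)
```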

Watch Out

The negative binomial has at least two equally common parameterizations

The shape parameter $r$ can count trials until the $r$-th success (PMF supported on $\{r, r + 1, \ldots\}$) or failures before the $r$-th success (PMF supported on $\{0, 1, 2, \ldots\}$). When $r$ is non-integer, the distribution is typically presented in the mean-dispersion parameterization $(\mu, k)$ used in GLM software. Match the convention to the source.

Watch Out

Memorylessness is about the residual lifetime, not the marginal distribution

The exponential is memoryless, but a sum of two iid exponentials (a gamma with shape $2$) is not. The reason: knowing you have waited some time without seeing both events changes the conditional probability that the first event has already occurred, and hence the conditional distribution of the remaining wait. Memorylessness is a fragile property that survives only in the single-event case.
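The failure is visible directly from the gamma survival function. For shape $2$ and rate $\lambda$, $S(t) = e^{-\lambda t}(1 + \lambda t)$, and the memoryless identity $S(t+s)/S(t) = S(s)$ fails. A minimal sketch (plain Python, stdlib only; parameter values are illustrative):

```python
import math

# Gamma(shape=2, rate=lam) survival function: S(t) = exp(-lam*t) * (1 + lam*t).
lam, t, s = 1.0, 2.0, 1.0
S = lambda x: math.exp(-lam * x) * (1 + lam * x)
# Memorylessness would require these two numbers to be equal; they are not.
print(S(t + s) / S(t), S(s))
```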

Watch Out

The Poisson process is more than the Poisson distribution

"Poisson distribution" is a one-parameter family on {0,1,2,}\{0, 1, 2, \ldots\} describing the count of events. "Poisson process" is a continuous-time stochastic process whose count over any interval is Poisson and whose increments over disjoint intervals are independent. The two are tightly related but conceptually distinct; many of the discrete-continuous pairings above are pairings of stochastic processes (Bernoulli trial sequence vs. Poisson process), with the marginal distributions falling out as derived quantities.

Exercises

ExerciseCore

Problem

Verify that an exponential random variable $X \sim \text{Exp}(\lambda)$ satisfies the memoryless property $P(X > t + s \mid X > t) = P(X > s)$ by direct computation.

ExerciseCore

Problem

Let $X_1, X_2$ be iid Geometric($p$) on $\{1, 2, \ldots\}$. Show that $Y = X_1 + X_2$ has the negative binomial PMF $P(Y = k) = (k - 1) p^2 (1 - p)^{k - 2}$ for $k \geq 2$.

ExerciseAdvanced

Problem

Let $T_1, T_2, \ldots$ be iid Exponential($\lambda$) inter-arrival times and let $N(T)$ be the number of arrivals in $[0, T]$. Show that $N(T) \sim \text{Poisson}(\lambda T)$ by computing $P(N(T) = k)$ via the gamma distribution of the partial sums.

References

Canonical:

  • Casella and Berger, Statistical Inference (2002), 2nd edition, Chapter 3 (common families of distributions, discrete and continuous)
  • Blitzstein and Hwang, Introduction to Probability (2019), 2nd edition, Chapters 4 (discrete), 5 (continuous), and 13 (Poisson processes)
  • Durrett, Probability: Theory and Examples (2019), 5th edition, Chapter 4 (Poisson processes)

Distribution catalogs:

  • Johnson, Kotz, and Kemp, Univariate Discrete Distributions (2005), 3rd edition
  • Johnson, Kotz, and Balakrishnan, Continuous Univariate Distributions, Volume 1 (1994), 2nd edition

Applied / process view:

  • Ross, Introduction to Probability Models (2014), 11th edition, Chapter 5 (Poisson processes)
  • Pinsky and Karlin, An Introduction to Stochastic Modeling (2011), 4th edition

Next Topics

  • Central limit theorem: the universal continuous limit of normalized sums of discrete or continuous distributions.
  • Law of large numbers: the matched-pair partner of the CLT; applies equally to discrete and continuous distributions.
  • Method of moments: the estimation strategy that uses moments directly from the tables on this page.

Last reviewed: May 12, 2026
