
Foundations

Exponential Distribution

The Exponential distribution as the memoryless waiting time: density, CDF, MGF, the memoryless property as a characterization, the Poisson-process inter-arrival construction, the minimum of independent exponentials, MLE for the rate, and the bridge to the Gamma distribution.

Core · Tier 1 · Stable · Core spine · ~50 min

Why This Matters

The Exponential is the only continuous distribution on [0,\infty) that is memoryless: conditional on surviving past any time s, the remaining waiting time has the same distribution as the original. That single property determines the family up to its rate parameter, and it is the reason the Exponential is the canonical model for the time between events in a Poisson process. Once you have the Exponential, you have the Poisson process; once you have the Poisson process, you have the Poisson distribution for counts and the Gamma distribution for the time to the k-th event.

The Exponential is also the maximum-entropy distribution on [0,\infty) with a fixed mean, which makes it the right default model when all you know about a positive random variable is its mean. The downstream uses include reliability, queueing, survival analysis, and the noise model in some Bayesian regressions.

Definition

Definition

Exponential Distribution

A random variable X has an Exponential distribution with rate \lambda > 0 if it has density

f_X(x) = \lambda e^{-\lambda x},\qquad x \ge 0,

and zero density for x < 0. The CDF is F_X(x) = 1 - e^{-\lambda x} for x \ge 0.

An equivalent parameterization uses scale \theta = 1/\lambda: f_X(x) = e^{-x/\theta}/\theta. The rate parameterization gives mean 1/\lambda; the scale parameterization gives mean \theta.

The two parameterizations describe the same family but differ in software conventions: SciPy and R survival packages tend to use scale, mathematical-statistics texts tend to use rate. Always check the convention before plugging in.
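The equivalence of the two parameterizations can be checked directly. A minimal sketch in plain Python (no library assumed; the evaluation points are arbitrary) comparing the two densities pointwise with \theta = 1/\lambda:

```python
import math

def pdf_rate(x, lam):
    # Rate parameterization: f(x) = lam * exp(-lam * x)
    return lam * math.exp(-lam * x)

def pdf_scale(x, theta):
    # Scale parameterization: f(x) = exp(-x / theta) / theta
    return math.exp(-x / theta) / theta

lam = 0.5
# Same family: theta = 1/lam makes the two densities agree at every point
vals = [(pdf_rate(x, lam), pdf_scale(x, 1.0 / lam)) for x in (0.1, 1.0, 5.0)]
```

Passing lam where a library expects a scale (or vice versa) silently gives the density of a different distribution, which is exactly the convention trap described above.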

Moments and MGF

The mean of X\sim\operatorname{Exp}(\lambda) is 1/\lambda and the variance is 1/\lambda^2. Both follow from the MGF.

Theorem

Exponential MGF

Statement

For X\sim\operatorname{Exp}(\lambda) and s < \lambda, M_X(s) = \mathbb{E}[e^{sX}] = \frac{\lambda}{\lambda - s}. For s \ge \lambda the MGF is infinite.

Intuition

The MGF is the Laplace transform of the density at -s. The Laplace transform of \lambda e^{-\lambda x} is finite exactly when the exponent -(\lambda - s) is negative, that is, when s < \lambda.

Proof Sketch

M_X(s) = \int_0^\infty e^{sx}\lambda e^{-\lambda x}\,dx = \lambda\int_0^\infty e^{-(\lambda-s)x}\,dx = \frac{\lambda}{\lambda-s}\quad\text{for }s<\lambda. For s\ge\lambda the integrand does not decay and the integral diverges.

Why It Matters

Differentiating twice and evaluating at zero gives \mathbb{E}[X] = 1/\lambda and \mathbb{E}[X^2] = 2/\lambda^2, so \operatorname{Var}(X) = 1/\lambda^2. The MGF has a pole at s = \lambda, which means the Exponential is light-tailed but not sub-Gaussian: it is sub-exponential. See sub-exponential random variables for the resulting tail bound.
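The closed form M_X(s) = \lambda/(\lambda - s) can be sanity-checked by Monte Carlo. A sketch using only the standard library (the rate, the point s, the sample size, and the seed are arbitrary choices; s must stay below \lambda):

```python
import math
import random

random.seed(0)

lam, s = 1.0, 0.3      # need s < lam for the MGF to be finite
n = 200_000

# Monte Carlo estimate of E[exp(s*X)] for X ~ Exp(lam)
mgf_mc = sum(math.exp(s * random.expovariate(lam)) for _ in range(n)) / n

mgf_exact = lam / (lam - s)   # closed form: 1 / 0.7
```

Note that the Monte Carlo average itself degrades as s approaches \lambda: the variance of e^{sX} is finite only for s < \lambda/2, mirroring the pole in the theorem.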

Failure Mode

The MGF is finite only on the half-line s < \lambda. Chernoff bounds for the Exponential must keep the parameter inside this region; pushing s to \lambda blows up the MGF and gives no useful bound. The Gamma and Chi-squared inherit this restriction because they are sums of Exponentials.

The Memoryless Property

Theorem

Memoryless Property

Statement

For X\sim\operatorname{Exp}(\lambda) and every s,t\ge 0, \mathbb{P}(X > s + t \mid X > s) = \mathbb{P}(X > t).

Intuition

A light bulb whose failure time is Exponential has no memory of how long it has been on. Given it has not failed by time s, the distribution of the remaining time looks identical to the distribution of a fresh bulb.

Proof Sketch

By definition of conditional probability, \mathbb{P}(X>s+t\mid X>s) = \frac{\mathbb{P}(X>s+t)}{\mathbb{P}(X>s)} = \frac{e^{-\lambda(s+t)}}{e^{-\lambda s}} = e^{-\lambda t} = \mathbb{P}(X>t).
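The same cancellation is visible in simulation. A minimal sketch, assuming nothing beyond the standard library (the rate, the times s and t, the sample size, and the seed are arbitrary): condition on survival past s and compare the remaining-time tail with a fresh draw.

```python
import random

random.seed(1)

lam, s, t = 0.5, 2.0, 1.0
n = 200_000
draws = [random.expovariate(lam) for _ in range(n)]

# Condition on X > s, then ask for the remaining time to exceed t
survivors = [x for x in draws if x > s]
p_cond = sum(x > s + t for x in survivors) / len(survivors)

# Unconditional P(X > t) for a fresh draw
p_fresh = sum(x > t for x in draws) / n
# Both should be close to exp(-lam * t) = exp(-0.5)
```

Repeating with a non-exponential lifetime (say, the sum of two exponentials) makes p_cond and p_fresh visibly disagree, which is the failure mode discussed below.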

Why It Matters

The memoryless property is the defining feature of the Exponential family among continuous distributions on [0,\infty). It is what makes Poisson processes Markovian: the future does not depend on the past beyond the current time. It also makes Exponential models inappropriate when the hazard rate of the underlying process changes over time; in those cases use the Gamma distribution or a Weibull distribution.

Failure Mode

The memoryless property does not hold for the Gamma distribution with shape parameter different from one, the Lognormal, or the Weibull with shape parameter different from one. Reliability data with increasing or decreasing hazard rates should not be modeled by an Exponential; the wrong parametric family will systematically misprice tail risk.

Memorylessness Characterizes the Exponential

Theorem

Memoryless Characterization

Statement

If X is a nonnegative continuous random variable that is memoryless and not identically zero, then there exists a unique \lambda > 0 with X\sim\operatorname{Exp}(\lambda).

Intuition

Memorylessness is the multiplicative functional equation G(s+t) = G(s)G(t) for the survival function G(x) = \mathbb{P}(X > x). The only nonzero continuous solutions are the decaying exponentials, G(x) = e^{-\lambda x}.

Proof Sketch

Let G(x) = \mathbb{P}(X > x). Memorylessness gives G(s+t)/G(s) = G(t), that is, G(s+t) = G(s)G(t). With G right-continuous and G(0) = 1 (assuming X is not almost surely zero), Cauchy's functional equation gives G(x) = e^{-\lambda x} for some \lambda. The condition that G is a probability survival function (G nonincreasing, G(\infty) = 0) forces \lambda > 0. This is \operatorname{Exp}(\lambda).

Why It Matters

The characterization is what makes "memoryless waiting time" a one-line argument for choosing the Exponential family: no other continuous distribution on [0,\infty) has it. Combined with the discrete analogue (the Geometric is the only memoryless distribution on \{1,2,\dots\}), the result anchors a clean decision: if you have evidence the hazard rate is constant in time, the Exponential is forced; if not, do not use it.

Failure Mode

The characterization assumes continuity. Relaxing it to general right-continuity allows trivial distributions concentrated at zero. The discrete analogue with Geometric distributions on \{1,2,\dots\} uses a different functional equation; the two characterizations are parallel but not interchangeable.

Minimum of Independent Exponentials

Theorem

Minimum of Independent Exponentials

Statement

Let X_i\sim\operatorname{Exp}(\lambda_i) for i=1,\dots,n be independent. Then M = \min(X_1,\dots,X_n) \sim \operatorname{Exp}\!\left(\sum_{i=1}^n \lambda_i\right), and \mathbb{P}(M = X_j) = \frac{\lambda_j}{\sum_{i=1}^n \lambda_i},\qquad j=1,\dots,n.

Intuition

The probability that no event has happened by time t is the product of the survival probabilities, which is exponential in the sum of rates. The probability that event j is the first equals the relative rate \lambda_j/\sum_i\lambda_i, by symmetry of the joint density.

Proof Sketch

\mathbb{P}(M > t) = \prod_{i=1}^n \mathbb{P}(X_i > t) = \prod_{i=1}^n e^{-\lambda_i t} = \exp\!\left(-t\sum_i \lambda_i\right), which is the survival function of \operatorname{Exp}(\sum_i \lambda_i). For the identity of the minimum, condition on M = t and use the joint density \prod_i \lambda_i e^{-\lambda_i x_i} to compute \mathbb{P}(M = X_j); the result is \lambda_j/\sum_i\lambda_i.
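Both claims are easy to see numerically. A minimal stdlib sketch (the three rates, the sample size, and the seed are arbitrary choices): with rates 1, 2, 3 the minimum should behave like \operatorname{Exp}(6), and the rate-3 clock should fire first half the time.

```python
import random

random.seed(2)

rates = [1.0, 2.0, 3.0]       # lambda_1, lambda_2, lambda_3
n = 100_000

mins, rate3_first = [], 0
for _ in range(n):
    clocks = [random.expovariate(r) for r in rates]
    m = min(clocks)
    mins.append(m)
    if clocks.index(m) == 2:  # the rate-3 clock fired first
        rate3_first += 1

mean_min = sum(mins) / n          # theory: 1 / (1+2+3) = 1/6
p_rate3_first = rate3_first / n   # theory: 3 / (1+2+3) = 1/2
```

This pair of facts is exactly the update step of the Gillespie algorithm mentioned below: draw one \operatorname{Exp}(\sum_i \lambda_i) holding time, then pick the firing clock with probability proportional to its rate.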

Why It Matters

This identity is what competing-risks models, queueing systems, and the Gillespie algorithm for simulating continuous-time Markov chains depend on. Each event clock is exponential; the first to fire determines the next state, and the time of the first firing is itself exponential.

Failure Mode

The result requires independence and exponential marginals. For dependent or non-exponential lifetimes, the minimum is not exponential and the relative-rate identification of which clock fires first fails. The classical competing-risks formula generalizes via cause-specific hazards, not by the elementary calculation here.

Connection to Poisson Process and Gamma

A Poisson process with rate \lambda on [0,\infty) has the following equivalent characterizations:

  1. The number of events in any interval of length t is \operatorname{Pois}(\lambda t).
  2. The inter-arrival times are i.i.d. \operatorname{Exp}(\lambda).
  3. The waiting time for the k-th event is \operatorname{Gamma}(k,\lambda).

The second characterization is what makes the Exponential the canonical continuous model for "time between rare events with constant rate". The third bridges directly to the Gamma distribution as a sum of Exponentials. See the proof that a sum of i.i.d. Exponentials is Gamma in the distributions atlas.
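The equivalence of the first two characterizations can be tested by building a process from its inter-arrival times. A stdlib sketch, assuming nothing beyond what the list states (the rate, the horizon, the path count, and the seed are arbitrary): accumulate \operatorname{Exp}(\lambda) gaps until time t is exceeded and count the events; the count should look \operatorname{Pois}(\lambda t), whose mean and variance both equal \lambda t.

```python
import random

random.seed(3)

lam, t = 2.0, 5.0     # rate and time horizon; lam * t = 10
n_paths = 20_000

counts = []
for _ in range(n_paths):
    # One Poisson process path on [0, t] from Exp(lam) inter-arrival times
    clock, events = 0.0, 0
    while True:
        clock += random.expovariate(lam)
        if clock > t:
            break
        events += 1
    counts.append(events)

mean_count = sum(counts) / n_paths
var_count = sum((c - mean_count) ** 2 for c in counts) / n_paths
# Poisson(lam * t): both mean_count and var_count should be near 10
```

The mean-equals-variance signature is specific to the Poisson count; inter-arrival times with a non-constant hazard would break it.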

Maximum Likelihood Estimation

Theorem

MLE for the Rate

Statement

Given an i.i.d. sample X_1,\dots,X_n from \operatorname{Exp}(\lambda), the MLE is \hat\lambda = \frac{n}{\sum_{i=1}^n X_i} = \frac{1}{\bar X_n}. The MLE for the scale parameterization is \hat\theta = \bar X_n.

Intuition

The log-likelihood is concave in \lambda with a single critical point. The MLE for the rate is the reciprocal of the sample mean; the MLE for the mean is the sample mean.

Proof Sketch

The log-likelihood is \ell(\lambda) = n\log\lambda - \lambda\sum_{i=1}^n X_i. Differentiating: \ell'(\lambda) = n/\lambda - \sum_i X_i = 0 gives \hat\lambda = n/\sum_i X_i. The second derivative -n/\lambda^2 is negative, confirming a maximum.

Why It Matters

\hat\lambda is biased upward in finite samples: \mathbb{E}[\hat\lambda] = n\lambda/(n-1) for n\ge 2, computed from the fact that \hat\lambda = n/\sum_i X_i, where \sum_i X_i \sim \operatorname{Gamma}(n,\lambda) has expected reciprocal \lambda/(n-1). The bias-corrected estimator is (n-1)/\sum_i X_i = (n-1)/(n\bar X_n). For large n the bias is negligible.
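The upward bias is visible in small samples. A stdlib sketch (true rate, sample size, replication count, and seed are arbitrary choices): with n = 5 and \lambda = 2, theory says \mathbb{E}[\hat\lambda] = 5\cdot 2/4 = 2.5 while the bias-corrected estimator averages 2.

```python
import random

random.seed(4)

lam, n, reps = 2.0, 5, 20_000

mle, corrected = [], []
for _ in range(reps):
    total = sum(random.expovariate(lam) for _ in range(n))
    mle.append(n / total)              # hat-lambda = 1 / sample mean
    corrected.append((n - 1) / total)  # bias-corrected estimator

mean_mle = sum(mle) / reps        # theory: n * lam / (n - 1) = 2.5
mean_corr = sum(corrected) / reps  # theory: lam = 2.0
```

Note the correction fixes the mean only; for most purposes with moderate n, the plain MLE is fine.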

Failure Mode

The MLE is undefined if every X_i = 0, which happens with probability zero for continuous data but can happen with quantized or censored data. Survival analysis with censoring requires a modified likelihood; see survival analysis.

The Fisher information per observation in the rate parameterization is I(\lambda) = 1/\lambda^2, so the asymptotic variance of \hat\lambda is \lambda^2/n. The Cramér-Rao lower bound is achieved asymptotically by the MLE.

Sample Output

| Quantity | Formula | Numerical example, \lambda = 0.5 |
| --- | --- | --- |
| Mean | 1/\lambda | 2.0 |
| Variance | 1/\lambda^2 | 4.0 |
| Median | \log 2/\lambda | \approx 1.386 |
| 95th percentile | -\log(0.05)/\lambda | \approx 5.99 |
| 99th percentile | -\log(0.01)/\lambda | \approx 9.21 |

The median is smaller than the mean because the distribution is right-skewed: most of the mass is concentrated near zero, with a long tail.
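The table's entries all follow from inverting the CDF. A minimal sketch (only the example rate \lambda = 0.5 from the table is assumed):

```python
import math

lam = 0.5

def quantile(p, lam):
    # Inverse CDF: solve 1 - exp(-lam * x) = p for x
    return -math.log(1.0 - p) / lam

median = quantile(0.50, lam)   # log 2 / lam
q95 = quantile(0.95, lam)      # -log(0.05) / lam
q99 = quantile(0.99, lam)      # -log(0.01) / lam
mean = 1.0 / lam               # median < mean reflects the right skew
```

The same quantile function is also the standard way to sample an Exponential from a uniform draw (inverse transform sampling).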

Common Confusions

Watch Out

Rate versus scale parameterization

\operatorname{Exp}(\lambda) in rate notation has mean 1/\lambda. \operatorname{Exp}(\theta) in scale notation has mean \theta. They are the same family with \theta = 1/\lambda, but plugging \lambda = 0.5 into a scale-parameterization library gives mean 0.5, not 2. Read the docstring before reading the result.

Watch Out

Memoryless is not the same as light-tailed

The Exponential is light-tailed in the sense that the MGF is finite on a half-line. Memorylessness is a separate property. The Gamma and Chi-squared are still light-tailed but are not memoryless; their hazard rates depend on time.

Watch Out

The minimum is exponential, the sum is not

The minimum of independent Exponentials is exponential with summed rate. The sum is Gamma, not exponential. The maximum is neither; its survival function is 1 - \prod_i(1 - e^{-\lambda_i t}), which belongs to no standard named family.

Watch Out

Hazard rate constant means constant, not exact

The Exponential has hazard rate exactly λ\lambda at every time. Empirical hazard estimates from real data are noisy; a rolling estimate that wobbles around a constant value is consistent with Exponential lifetimes, but a steady upward or downward trend is not, regardless of the average level.

Exercises

ExerciseCore

Problem

Customers arrive at a service desk according to a Poisson process with rate \lambda = 6 per hour. Find the probability that the time until the next customer arrives exceeds 15 minutes.

ExerciseCore

Problem

Let X\sim\operatorname{Exp}(\lambda). Show that aX\sim\operatorname{Exp}(\lambda/a) for every a > 0.

ExerciseAdvanced

Problem

Let X_1,X_2 be independent with X_i\sim\operatorname{Exp}(\lambda_i). Find the density of the maximum M = \max(X_1,X_2).

ExerciseAdvanced

Problem

Show that if X\sim\operatorname{Exp}(\lambda) then Y = -\log(1 - F_X(X)) = \lambda X is \operatorname{Exp}(1). (This is the probability integral transform.)

ExerciseResearch

Problem

Show that the MLE \hat\lambda = 1/\bar X_n is asymptotically Normal: \sqrt n(\hat\lambda - \lambda)\to\mathcal{N}(0, \lambda^2) as n\to\infty. Identify the role of the Fisher information.

References

Canonical:

  • Casella and Berger, Statistical Inference (2002), Chapter 3 (Section 3.3 introduces the family), Chapter 7 (Section 7.2 covers Exponential MLE).
  • Lehmann and Casella, Theory of Point Estimation (1998), Chapter 1 (sufficiency for the Exponential and the connection to one-parameter exponential families).
  • Ross, Introduction to Probability Models (2019), Chapter 5 (memoryless property and Poisson process construction).

Probability:

  • Blitzstein and Hwang, Introduction to Probability (2019), Chapter 5.
  • Durrett, Probability: Theory and Examples (2019), Chapter 2 (Section 2.5 on Poisson processes).
  • Grimmett and Stirzaker, Probability and Random Processes (2020), Chapter 6 (Poisson processes and renewal theory).

Last reviewed: May 11, 2026
