
Foundations

Exponential Distribution

The Exponential distribution as the memoryless waiting time: density, CDF, MGF, the memoryless property as a characterization, the Poisson-process inter-arrival construction, the minimum of independent exponentials, MLE for the rate, and the bridge to the Gamma distribution.

Core · Tier 1 · Stable · Core spine · ~50 min

Why This Matters

The Exponential is the only continuous distribution on [0,\infty) that is memoryless: conditional on surviving past any time s, the remaining waiting time has the same distribution as the original. That single property determines the family up to its rate parameter, and it is the reason the Exponential is the canonical model for the time between events in a Poisson process. Once you have the Exponential, you have the Poisson process; once you have the Poisson process, you have the Poisson distribution for counts and the Gamma distribution for the time to the k-th event.

The Exponential is also the maximum-entropy distribution on [0,\infty) with a fixed mean, which makes it the right default model when all you know about a positive random variable is its mean. The downstream uses include reliability, queueing, survival analysis, and the noise model in some Bayesian regressions.

Definition

Definition

Exponential Distribution

A random variable X has an Exponential distribution with rate \lambda > 0 if it has density

f_X(x) = \lambda e^{-\lambda x},\qquad x \ge 0,

and zero density for x < 0. The CDF is F_X(x) = 1 - e^{-\lambda x} for x \ge 0.

An equivalent parameterization uses scale \theta = 1/\lambda: f_X(x) = e^{-x/\theta}/\theta. The rate parameterization gives mean 1/\lambda; the scale parameterization gives mean \theta.

The two parameterizations describe the same family but differ in software conventions: SciPy and R survival packages tend to use scale, mathematical-statistics texts tend to use rate. Always check the convention before plugging in.
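The equivalence of the two parameterizations can be checked directly. A minimal sketch in plain Python (no library assumed; the evaluation points are arbitrary) comparing the two densities pointwise with \theta = 1/\lambda:

```python
import math

def pdf_rate(x, lam):
    # Rate parameterization: f(x) = lam * exp(-lam * x)
    return lam * math.exp(-lam * x)

def pdf_scale(x, theta):
    # Scale parameterization: f(x) = exp(-x / theta) / theta
    return math.exp(-x / theta) / theta

lam = 0.5
# Same family: theta = 1/lam makes the two densities agree at every point
vals = [(pdf_rate(x, lam), pdf_scale(x, 1.0 / lam)) for x in (0.1, 1.0, 5.0)]
```

Passing lam where a library expects a scale (or vice versa) silently gives the density of a different distribution, which is exactly the convention trap described above.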

Moments and MGF

The mean of X\sim\operatorname{Exp}(\lambda) is 1/\lambda and the variance is 1/\lambda^2. Both follow from the MGF.

Theorem

Exponential MGF

Statement

For X\sim\operatorname{Exp}(\lambda) and s < \lambda, M_X(s) = \mathbb{E}[e^{sX}] = \frac{\lambda}{\lambda - s}. For s \ge \lambda the MGF is infinite.

Intuition

The MGF is the Laplace transform of the density at -s. The Laplace transform of \lambda e^{-\lambda x} is finite exactly when the exponent -(\lambda - s) is negative, that is, when s < \lambda.

Proof Sketch

M_X(s) = \int_0^\infty e^{sx}\lambda e^{-\lambda x}\,dx = \lambda\int_0^\infty e^{-(\lambda-s)x}\,dx = \frac{\lambda}{\lambda-s}\quad\text{for }s<\lambda. For s\ge\lambda the integrand does not decay and the integral diverges.

Why It Matters

Differentiating twice and evaluating at zero gives \mathbb{E}[X] = 1/\lambda and \mathbb{E}[X^2] = 2/\lambda^2, so \operatorname{Var}(X) = 1/\lambda^2. The MGF has a pole at s = \lambda, which means the Exponential is light-tailed but not sub-Gaussian: it is sub-exponential. See sub-exponential random variables for the resulting tail bound.
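The closed form M_X(s) = \lambda/(\lambda - s) can be sanity-checked by Monte Carlo. A sketch using only the standard library (the rate, the point s, the sample size, and the seed are arbitrary choices; s must stay below \lambda):

```python
import math
import random

random.seed(0)

lam, s = 1.0, 0.3      # need s < lam for the MGF to be finite
n = 200_000

# Monte Carlo estimate of E[exp(s*X)] for X ~ Exp(lam)
mgf_mc = sum(math.exp(s * random.expovariate(lam)) for _ in range(n)) / n

mgf_exact = lam / (lam - s)   # closed form: 1 / 0.7
```

Note that the Monte Carlo average itself degrades as s approaches \lambda: the variance of e^{sX} is finite only for s < \lambda/2, mirroring the pole in the theorem.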

Failure Mode

The MGF is finite only on the half-line s < \lambda. Chernoff bounds for the Exponential must keep the parameter inside this region; pushing s to \lambda blows up the MGF and gives no useful bound. The Gamma and Chi-squared inherit this restriction because they are sums of Exponentials.

The Memoryless Property

Theorem

Memoryless Property

Statement

For X\sim\operatorname{Exp}(\lambda) and every s,t\ge 0, \mathbb{P}(X > s + t \mid X > s) = \mathbb{P}(X > t).

Intuition

A light bulb whose failure time is Exponential has no memory of how long it has been on. Given it has not failed by time s, the distribution of the remaining time looks identical to the distribution of a fresh bulb.

Proof Sketch

By definition of conditional probability, \mathbb{P}(X>s+t\mid X>s) = \frac{\mathbb{P}(X>s+t)}{\mathbb{P}(X>s)} = \frac{e^{-\lambda(s+t)}}{e^{-\lambda s}} = e^{-\lambda t} = \mathbb{P}(X>t).
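The same cancellation is visible in simulation. A minimal sketch, assuming nothing beyond the standard library (the rate, the times s and t, the sample size, and the seed are arbitrary): condition on survival past s and compare the remaining-time tail with a fresh draw.

```python
import random

random.seed(1)

lam, s, t = 0.5, 2.0, 1.0
n = 200_000
draws = [random.expovariate(lam) for _ in range(n)]

# Condition on X > s, then ask for the remaining time to exceed t
survivors = [x for x in draws if x > s]
p_cond = sum(x > s + t for x in survivors) / len(survivors)

# Unconditional P(X > t) for a fresh draw
p_fresh = sum(x > t for x in draws) / n
# Both should be close to exp(-lam * t) = exp(-0.5)
```

Repeating with a non-exponential lifetime (say, the sum of two exponentials) makes p_cond and p_fresh visibly disagree, which is the failure mode discussed below.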

Why It Matters

The memoryless property is the defining feature of the Exponential family among continuous distributions on [0,\infty). It is what makes Poisson processes Markovian: the future does not depend on the past beyond the current time. It also makes Exponential models inappropriate when the hazard rate of the underlying process changes over time; in those cases use the Gamma distribution or a Weibull distribution.

Failure Mode

The memoryless property does not hold for the Gamma distribution with shape parameter different from one, the Lognormal, or the Weibull with shape parameter different from one. Reliability data with increasing or decreasing hazard rates should not be modeled by an Exponential; the wrong parametric family will systematically misprice tail risk.

Memorylessness Characterizes the Exponential

Theorem

Memoryless Characterization

Statement

If X is a nonnegative continuous random variable that is memoryless and not identically zero, then there exists a unique \lambda > 0 with X\sim\operatorname{Exp}(\lambda).

Intuition

Memorylessness is the multiplicative functional equation G(s+t) = G(s)G(t) for the survival function G(x) = \mathbb{P}(X > x). The only nonzero continuous solutions are the decaying exponentials, G(x) = e^{-\lambda x}.

Proof Sketch

Let G(x) = \mathbb{P}(X > x). Memorylessness gives G(s+t)/G(s) = G(t), that is, G(s+t) = G(s)G(t). With G right-continuous and G(0) = 1 (assuming X is not almost surely zero), Cauchy's functional equation gives G(x) = e^{-\lambda x} for some \lambda. The condition that G is a probability survival function (G nonincreasing, G(\infty) = 0) forces \lambda > 0. This is \operatorname{Exp}(\lambda).

Why It Matters

The characterization is what makes "memoryless waiting time" a one-line argument for choosing the Exponential family: no other continuous distribution on [0,\infty) has it. Combined with the discrete analogue (the Geometric is the only memoryless distribution on \{1,2,\dots\}), the result anchors a clean decision: if you have evidence the hazard rate is constant in time, the Exponential is forced; if not, do not use it.

Failure Mode

The characterization assumes continuity. Relaxing it to general right-continuity allows trivial distributions concentrated at zero. The discrete analogue with Geometric distributions on \{1,2,\dots\} uses a different functional equation; the two characterizations are parallel but not interchangeable.

Minimum of Independent Exponentials

Theorem

Minimum of Independent Exponentials

Statement

Let X_i\sim\operatorname{Exp}(\lambda_i) for i=1,\dots,n be independent. Then M = \min(X_1,\dots,X_n) \sim \operatorname{Exp}\!\left(\sum_{i=1}^n \lambda_i\right), and \mathbb{P}(M = X_j) = \frac{\lambda_j}{\sum_{i=1}^n \lambda_i},\qquad j=1,\dots,n.

Intuition

The probability that no event has happened by time t is the product of the survival probabilities, which is exponential in the sum of rates. The probability that event j is the first equals the relative rate \lambda_j/\sum_i\lambda_i, by symmetry of the joint density.

Proof Sketch

\mathbb{P}(M > t) = \prod_{i=1}^n \mathbb{P}(X_i > t) = \prod_{i=1}^n e^{-\lambda_i t} = \exp\!\left(-t\sum_i \lambda_i\right), which is the survival function of \operatorname{Exp}(\sum_i \lambda_i). For the identity of the minimum, condition on M = t and use the joint density \prod_i \lambda_i e^{-\lambda_i x_i} to compute \mathbb{P}(M = X_j); the result is \lambda_j/\sum_i\lambda_i.
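Both claims are easy to see numerically. A minimal stdlib sketch (the three rates, the sample size, and the seed are arbitrary choices): with rates 1, 2, 3 the minimum should behave like \operatorname{Exp}(6), and the rate-3 clock should fire first half the time.

```python
import random

random.seed(2)

rates = [1.0, 2.0, 3.0]       # lambda_1, lambda_2, lambda_3
n = 100_000

mins, rate3_first = [], 0
for _ in range(n):
    clocks = [random.expovariate(r) for r in rates]
    m = min(clocks)
    mins.append(m)
    if clocks.index(m) == 2:  # the rate-3 clock fired first
        rate3_first += 1

mean_min = sum(mins) / n          # theory: 1 / (1+2+3) = 1/6
p_rate3_first = rate3_first / n   # theory: 3 / (1+2+3) = 1/2
```

This pair of facts is exactly the update step of the Gillespie algorithm mentioned below: draw one \operatorname{Exp}(\sum_i \lambda_i) holding time, then pick the firing clock with probability proportional to its rate.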

Why It Matters

This identity is what competing-risks models, queueing systems, and the Gillespie algorithm for simulating continuous-time Markov chains depend on. Each event clock is exponential; the first to fire determines the next state, and the time of the first firing is itself exponential.

Failure Mode

The result requires independence and exponential marginals. For dependent or non-exponential lifetimes, the minimum is not exponential and the relative-rate identification of which clock fires first fails. The classical competing-risks formula generalizes via cause-specific hazards, not by the elementary calculation here.

Connection to Poisson Process and Gamma

A Poisson process with rate \lambda on [0,\infty) has the following equivalent characterizations:

  1. The number of events in any interval of length t is \operatorname{Pois}(\lambda t).
  2. The inter-arrival times are i.i.d. \operatorname{Exp}(\lambda).
  3. The waiting time for the k-th event is \operatorname{Gamma}(k,\lambda).

The second characterization is what makes the Exponential the canonical continuous model for "time between rare events with constant rate". The third bridges directly to the Gamma distribution as a sum of Exponentials. See the proof that a sum of i.i.d. Exponentials is Gamma in the distributions atlas.
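The equivalence of the first two characterizations can be tested by building a process from its inter-arrival times. A stdlib sketch, assuming nothing beyond what the list states (the rate, the horizon, the path count, and the seed are arbitrary): accumulate \operatorname{Exp}(\lambda) gaps until time t is exceeded and count the events; the count should look \operatorname{Pois}(\lambda t), whose mean and variance both equal \lambda t.

```python
import random

random.seed(3)

lam, t = 2.0, 5.0     # rate and time horizon; lam * t = 10
n_paths = 20_000

counts = []
for _ in range(n_paths):
    # One Poisson process path on [0, t] from Exp(lam) inter-arrival times
    clock, events = 0.0, 0
    while True:
        clock += random.expovariate(lam)
        if clock > t:
            break
        events += 1
    counts.append(events)

mean_count = sum(counts) / n_paths
var_count = sum((c - mean_count) ** 2 for c in counts) / n_paths
# Poisson(lam * t): both mean_count and var_count should be near 10
```

The mean-equals-variance signature is specific to the Poisson count; inter-arrival times with a non-constant hazard would break it.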

Maximum Likelihood Estimation

Theorem

MLE for the Rate

Statement

Given an i.i.d. sample X_1,\dots,X_n from \operatorname{Exp}(\lambda), the MLE is \hat\lambda = \frac{n}{\sum_{i=1}^n X_i} = \frac{1}{\bar X_n}. The MLE for the scale parameterization is \hat\theta = \bar X_n.

Intuition

The log-likelihood is concave in \lambda with a single critical point. The MLE for the rate is the reciprocal of the sample mean; the MLE for the mean is the sample mean.

Proof Sketch

The log-likelihood is \ell(\lambda) = n\log\lambda - \lambda\sum_{i=1}^n X_i. Differentiating: \ell'(\lambda) = n/\lambda - \sum_i X_i = 0 gives \hat\lambda = n/\sum_i X_i. The second derivative -n/\lambda^2 is negative, confirming a maximum.

Why It Matters

\hat\lambda is biased upward in finite samples: \mathbb{E}[\hat\lambda] = n\lambda/(n-1) for n\ge 2, computed from the fact that \hat\lambda = n/\sum_i X_i, where \sum_i X_i \sim \operatorname{Gamma}(n,\lambda) has expected reciprocal \lambda/(n-1). The bias-corrected estimator is (n-1)/\sum_i X_i = (n-1)/(n\bar X_n). For large n the bias is negligible.
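The upward bias is visible in small samples. A stdlib sketch (true rate, sample size, replication count, and seed are arbitrary choices): with n = 5 and \lambda = 2, theory says \mathbb{E}[\hat\lambda] = 5\cdot 2/4 = 2.5 while the bias-corrected estimator averages 2.

```python
import random

random.seed(4)

lam, n, reps = 2.0, 5, 20_000

mle, corrected = [], []
for _ in range(reps):
    total = sum(random.expovariate(lam) for _ in range(n))
    mle.append(n / total)              # hat-lambda = 1 / sample mean
    corrected.append((n - 1) / total)  # bias-corrected estimator

mean_mle = sum(mle) / reps        # theory: n * lam / (n - 1) = 2.5
mean_corr = sum(corrected) / reps  # theory: lam = 2.0
```

Note the correction fixes the mean only; for most purposes with moderate n, the plain MLE is fine.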

Failure Mode

The MLE is undefined if every X_i = 0, which happens with probability zero for continuous data but can happen with quantized or censored data. Survival analysis with censoring requires a modified likelihood; see survival analysis.

The Fisher information per observation in the rate parameterization is I(\lambda) = 1/\lambda^2, so the asymptotic variance of \hat\lambda is \lambda^2/n. The Cramér-Rao lower bound is achieved asymptotically by the MLE.

Sample Output

| Quantity | Formula | Numerical example, \lambda = 0.5 |
| --- | --- | --- |
| Mean | 1/\lambda | 2.0 |
| Variance | 1/\lambda^2 | 4.0 |
| Median | \log 2/\lambda | \approx 1.386 |
| 95th percentile | -\log(0.05)/\lambda | \approx 5.99 |
| 99th percentile | -\log(0.01)/\lambda | \approx 9.21 |

The median is smaller than the mean because the distribution is right-skewed: most of the mass is concentrated near zero, with a long tail.
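The table's entries all follow from inverting the CDF. A minimal sketch (only the example rate \lambda = 0.5 from the table is assumed):

```python
import math

lam = 0.5

def quantile(p, lam):
    # Inverse CDF: solve 1 - exp(-lam * x) = p for x
    return -math.log(1.0 - p) / lam

median = quantile(0.50, lam)   # log 2 / lam
q95 = quantile(0.95, lam)      # -log(0.05) / lam
q99 = quantile(0.99, lam)      # -log(0.01) / lam
mean = 1.0 / lam               # median < mean reflects the right skew
```

The same quantile function is also the standard way to sample an Exponential from a uniform draw (inverse transform sampling).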

Common Confusions

Watch Out

Rate versus scale parameterization

\operatorname{Exp}(\lambda) in rate notation has mean 1/\lambda. \operatorname{Exp}(\theta) in scale notation has mean \theta. They are the same family with \theta = 1/\lambda, but plugging \lambda = 0.5 into a scale-parameterization library gives mean 0.5, not 2. Read the docstring before reading the result.

Watch Out

Memoryless is not the same as light-tailed

The Exponential is light-tailed in the sense that the MGF is finite on a half-line. Memorylessness is a separate property. The Gamma and Chi-squared are still light-tailed but are not memoryless; their hazard rates depend on time.

Watch Out

The minimum is exponential, the sum is not

The minimum of independent Exponentials is exponential with summed rate. The sum is Gamma, not exponential. The maximum is neither; its survival function is 1 - \prod_i(1 - e^{-\lambda_i t}), which belongs to no standard named family.

Watch Out

Hazard rate constant means constant, not exact

The Exponential has hazard rate exactly λ\lambda at every time. Empirical hazard estimates from real data are noisy; a rolling estimate that wobbles around a constant value is consistent with Exponential lifetimes, but a steady upward or downward trend is not, regardless of the average level.

Exercises

ExerciseCore

Problem

Customers arrive at a service desk according to a Poisson process with rate \lambda = 6 per hour. Find the probability that the time until the next customer arrives exceeds 15 minutes.

ExerciseCore

Problem

Let X\sim\operatorname{Exp}(\lambda). Show that aX\sim\operatorname{Exp}(\lambda/a) for every a > 0.

ExerciseAdvanced

Problem

Let X_1,X_2 be independent with X_i\sim\operatorname{Exp}(\lambda_i). Find the density of the maximum M = \max(X_1,X_2).

ExerciseAdvanced

Problem

Show that if X\sim\operatorname{Exp}(\lambda) then Y = -\log(1 - F_X(X)) = \lambda X is \operatorname{Exp}(1). (This is the probability integral transform.)

ExerciseResearch

Problem

Show that the MLE \hat\lambda = 1/\bar X_n is asymptotically Normal: \sqrt n(\hat\lambda - \lambda)\to\mathcal{N}(0, \lambda^2) as n\to\infty. Identify the role of the Fisher information.

References

Canonical:

  • Casella and Berger, Statistical Inference (2002), Chapter 3 (Section 3.3 introduces the family), Chapter 7 (Section 7.2 covers Exponential MLE).
  • Lehmann and Casella, Theory of Point Estimation (1998), Chapter 1 (sufficiency for the Exponential and the connection to one-parameter exponential families).
  • Ross, Introduction to Probability Models (2019), Chapter 5 (memoryless property and Poisson process construction).

Probability:

  • Blitzstein and Hwang, Introduction to Probability (2019), Chapter 5.
  • Durrett, Probability: Theory and Examples (2019), Chapter 2 (Section 2.5 on Poisson processes).
  • Grimmett and Stirzaker, Probability and Random Processes (2020), Chapter 6 (Poisson processes and renewal theory).

Last reviewed: May 11, 2026
