
Foundations

Moment Generating Functions

The moment generating function M(t) = E[e^{tX}] encodes all moments of a distribution. The Chernoff method, sub-Gaussian bounds, and exponential family theory all reduce to MGF conditions.


Why This Matters

The moment generating function is the main workhorse connecting light-tailed probability distributions to concentration inequalities. When the MGF exists in a neighborhood of zero (sub-Gaussian, sub-exponential, bounded), it powers the Chernoff method, exponential tilting, and the sub-Gaussian / sub-exponential machinery. When it does not exist (Cauchy, heavy-tailed power laws), the characteristic function is the strictly more general tool, and concentration results require different machinery (Chebyshev, truncation, Nemirovski-style norms).

The Chernoff method is: apply Markov's inequality to $e^{tX}$ and optimize over $t$. This is an MGF computation.

A centered random variable is sub-Gaussian with parameter $\sigma$ if and only if its MGF satisfies $M(t) \leq e^{\sigma^2 t^2 / 2}$ for all $t$. Within the sub-Gaussian world, the entire concentration story reduces to bounding the MGF.

Exponential families are distributions whose density has the form $\exp(\theta^\top T(x) - A(\theta))$ with respect to a base measure, so their theory leans on properties of the exponential function. The log-partition function $A(\theta)$ is the log-MGF of the sufficient statistic, in the sense that $\mathbb{E}_\theta[e^{t^\top T(X)}] = \exp(A(\theta + t) - A(\theta))$.
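This identity is easy to check numerically. The sketch below (the Bernoulli family and the parameter values are illustrative choices, not from the text) verifies that $\exp(A(\theta + t) - A(\theta))$ reproduces the MGF of the sufficient statistic $T(x) = x$ for the Bernoulli family in natural form, where $A(\theta) = \log(1 + e^\theta)$:

```python
import math

def A(theta):
    # Log-partition function of the Bernoulli exponential family:
    # density exp(theta * x - A(theta)) for x in {0, 1}.
    return math.log(1 + math.exp(theta))

theta = 0.7                                   # arbitrary natural parameter
p = math.exp(theta) / (1 + math.exp(theta))   # corresponding mean parameter

for t in [-1.0, -0.3, 0.5, 1.2]:
    mgf_direct = (1 - p) + p * math.exp(t)          # E_theta[e^{tX}] computed directly
    mgf_from_A = math.exp(A(theta + t) - A(theta))  # via differences of the log-partition
    assert math.isclose(mgf_direct, mgf_from_A, rel_tol=1e-12)
print("exp(A(theta+t) - A(theta)) matches the MGF of T(X)")
```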

Core Definitions

Definition

Moment Generating Function

The moment generating function of a random variable $X$ is:

$$M_X(t) = \mathbb{E}[e^{tX}]$$

defined for all $t \in \mathbb{R}$ where this expectation is finite. The MGF may not exist for all $t$; when it exists in an open interval around $0$, it determines the distribution uniquely.

Extracting Moments

The $k$-th moment of $X$ is the $k$-th derivative of $M_X$ at zero:

$$\mathbb{E}[X^k] = M_X^{(k)}(0)$$

This follows from differentiating under the expectation:

$$M_X^{(k)}(t) = \mathbb{E}[X^k e^{tX}]$$

and evaluating at $t = 0$. The interchange of differentiation and expectation is valid when $M_X$ exists in a neighborhood of $t$.

In particular: $\mathbb{E}[X] = M_X'(0)$ and $\mathbb{E}[X^2] = M_X''(0)$.
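As a quick numerical sanity check (my own sketch, not part of the text), one can approximate $M_X'(0)$ and $M_X''(0)$ by central finite differences for the standard normal MGF $M(t) = e^{t^2/2}$ and recover $\mathbb{E}[X] = 0$ and $\mathbb{E}[X^2] = 1$:

```python
import math

def M(t):
    # MGF of the standard normal: E[e^{tX}] = exp(t^2 / 2)
    return math.exp(t * t / 2)

h = 1e-4
first = (M(h) - M(-h)) / (2 * h)                # central difference for M'(0)
second = (M(h) - 2 * M(0.0) + M(-h)) / h**2     # central difference for M''(0)

assert abs(first - 0.0) < 1e-8    # E[X] = 0
assert abs(second - 1.0) < 1e-6   # E[X^2] = 1
print(first, second)
```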

Main Theorems

Theorem

MGF Uniqueness Theorem

Statement

If two random variables $X$ and $Y$ have moment generating functions $M_X(t)$ and $M_Y(t)$ that are finite and equal for all $t$ in some open interval $(-\delta, \delta)$ with $\delta > 0$, then $X$ and $Y$ have the same distribution.

Intuition

The MGF encodes the entire distribution, not just the moments. If two distributions agree on their MGFs in a neighborhood of zero, they must be the same distribution. This is stronger than moment matching: there exist distinct distributions with identical moments of all orders, but they cannot have identical MGFs in a neighborhood of zero.

Proof Sketch

The MGF is related to the characteristic function $\varphi_X(t) = \mathbb{E}[e^{itX}]$ by analytic continuation. If $M_X(t)$ is finite on $(-\delta, \delta)$, the characteristic function extends analytically to a strip in the complex plane. By the uniqueness theorem for characteristic functions (Lévy inversion), the distribution is determined.

Why It Matters

This theorem justifies the "MGF technique" for identifying distributions. If you compute the MGF of a sum of independent Gaussians and recognize it as the MGF of another Gaussian, you can conclude the sum is Gaussian. This approach is cleaner than convolution arguments.

Failure Mode

The MGF must exist in a neighborhood of 0, not just at 0 (where it always equals 1). Heavy-tailed distributions like the Cauchy distribution have no MGF. For those distributions, use the characteristic function instead, which always exists.

Proposition

MGF of Independent Sum

Statement

If $X$ and $Y$ are independent random variables whose MGFs exist, then:

$$M_{X+Y}(t) = M_X(t) \cdot M_Y(t)$$

Intuition

Independence means $\mathbb{E}[g(X)h(Y)] = \mathbb{E}[g(X)]\,\mathbb{E}[h(Y)]$. Apply this with $g(X) = e^{tX}$ and $h(Y) = e^{tY}$.

Proof Sketch

$$M_{X+Y}(t) = \mathbb{E}[e^{t(X+Y)}] = \mathbb{E}[e^{tX} e^{tY}] = \mathbb{E}[e^{tX}]\,\mathbb{E}[e^{tY}] = M_X(t)\, M_Y(t)$$ where the third equality uses independence.

Why It Matters

This is why MGFs are the natural tool for sums of independent variables. Addition of random variables corresponds to multiplication of MGFs. Taking logs: the cumulant generating function $\log M_X(t)$ is additive for independent sums.
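A concrete check (the centered Gaussians and variances below are my illustrative choices): the sum of independent $N(0, \sigma_1^2)$ and $N(0, \sigma_2^2)$ is $N(0, \sigma_1^2 + \sigma_2^2)$, so the product of the two MGFs $\exp(\sigma^2 t^2/2)$ must equal the MGF of the sum, and the log-MGFs must add:

```python
import math

def gauss_mgf(t, var):
    # MGF of N(0, var): exp(var * t^2 / 2)
    return math.exp(var * t * t / 2)

var_x, var_y = 2.0, 3.0
for t in [-1.5, -0.5, 0.25, 1.0]:
    product = gauss_mgf(t, var_x) * gauss_mgf(t, var_y)
    sum_mgf = gauss_mgf(t, var_x + var_y)   # MGF of X + Y ~ N(0, var_x + var_y)
    assert math.isclose(product, sum_mgf, rel_tol=1e-12)
    # CGF additivity: log M_{X+Y}(t) = log M_X(t) + log M_Y(t)
    assert math.isclose(math.log(product),
                        math.log(gauss_mgf(t, var_x)) + math.log(gauss_mgf(t, var_y)),
                        rel_tol=1e-9)
print("product rule and CGF additivity verified for Gaussians")
```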

Failure Mode

Fails without independence. For dependent variables, $\mathbb{E}[e^{tX}e^{tY}] \neq \mathbb{E}[e^{tX}]\,\mathbb{E}[e^{tY}]$ in general.

Canonical Examples

Example

MGF of a Gaussian

Let $X \sim N(\mu, \sigma^2)$. Then:

$$M_X(t) = \mathbb{E}[e^{tX}] = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right)$$

This exists for all $t \in \mathbb{R}$. Setting $\mu = 0$: $M_X(t) = \exp(\sigma^2 t^2 / 2)$, which attains the sub-Gaussian bound $M(t) \leq e^{\sigma^2 t^2/2}$ with equality; the Gaussian is the canonical sub-Gaussian random variable with parameter $\sigma$.
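A Monte Carlo sanity check (a sketch; the parameter values, sample size, and seed are my choices) comparing the empirical mean of $e^{tX}$ against the closed form $e^{\mu t + \sigma^2 t^2/2}$:

```python
import math
import random

random.seed(0)
mu, sigma, t = 1.0, 2.0, 0.3
n = 200_000

# Empirical estimate of E[e^{tX}] for X ~ N(mu, sigma^2)
est = sum(math.exp(t * random.gauss(mu, sigma)) for _ in range(n)) / n

closed_form = math.exp(mu * t + sigma**2 * t**2 / 2)
assert abs(est - closed_form) / closed_form < 0.05   # within Monte Carlo error
print(est, closed_form)
```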

Example

The Chernoff method in one line

For any $t > 0$: $P(X \geq a) = P(e^{tX} \geq e^{ta}) \leq e^{-ta} M_X(t)$. The first step is monotonicity of $\exp$; the second is Markov's inequality. Optimize over $t > 0$ to get the tightest bound. This is the entire Chernoff method.
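To make the optimization concrete (a sketch using the standard normal as my illustrative case): for $X \sim N(0,1)$ the bound is $e^{-ta + t^2/2}$, minimized at $t = a$ to give $e^{-a^2/2}$. A coarse grid search recovers the same optimum:

```python
import math

a = 2.0  # tail threshold: bound P(X >= a) for X ~ N(0, 1)

def chernoff_bound(t):
    # e^{-ta} * M_X(t), with M_X(t) = e^{t^2/2} for the standard normal
    return math.exp(-t * a + t * t / 2)

# Grid search over t in (0, 5)
best = min(chernoff_bound(i / 1000) for i in range(1, 5000))

optimal = math.exp(-a * a / 2)  # closed-form optimum, attained at t = a
assert abs(best - optimal) / optimal < 1e-5
print(best, optimal)
```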

Common Confusions

Watch Out

Moments existing does not imply MGF exists

A distribution can have all moments finite yet have no MGF. The lognormal distribution has $\mathbb{E}[X^k] < \infty$ for all $k$ but $M_X(t) = \infty$ for all $t > 0$. The MGF is a stronger condition than having all moments.

Watch Out

MGF vs characteristic function vs cumulant generating function

The MGF is $M_X(t) = \mathbb{E}[e^{tX}]$. The characteristic function is $\varphi_X(t) = \mathbb{E}[e^{itX}]$, which always exists. The cumulant generating function (CGF) is $K_X(t) = \log M_X(t)$. The CGF is additive for independent sums. In concentration inequality proofs, you work with the CGF.
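One reason the CGF is so convenient: its derivatives at zero are the cumulants, so $K'(0)$ is the mean and $K''(0)$ is the variance. A numerical sketch (the Bernoulli example and parameter are my choices) using finite differences of $K(t) = \log(1 - p + p e^t)$:

```python
import math

p = 0.3

def K(t):
    # CGF of Bernoulli(p): log M(t) = log(1 - p + p * e^t)
    return math.log(1 - p + p * math.exp(t))

h = 1e-4
mean = (K(h) - K(-h)) / (2 * h)             # K'(0)  = p
var = (K(h) - 2 * K(0.0) + K(-h)) / h**2    # K''(0) = p * (1 - p)

assert abs(mean - p) < 1e-6
assert abs(var - p * (1 - p)) < 1e-4
print(mean, var)
```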

Exercises

Exercise (Core)

Problem

Compute the MGF of a Bernoulli($p$) random variable. Use it to find $\mathbb{E}[X]$ and $\mathbb{E}[X^2]$.

Exercise (Advanced)

Problem

Let $X_1, \ldots, X_n$ be i.i.d. $N(0, 1)$. Use MGFs to prove that $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$ is distributed as $N(0, 1/n)$.

References

Canonical:

  • Casella & Berger, Statistical Inference (2002), Chapter 2.3
  • Billingsley, Probability and Measure (1995), Section 21

Current:

  • Wainwright, High-Dimensional Statistics (2019), Chapters 2-3 (MGFs in concentration)
  • Vershynin, High-Dimensional Probability (2018), Chapter 2 (sub-Gaussian MGF condition)

Last reviewed: April 2026
