Foundations
Moment Generating Functions
The moment generating function M(t) = E[e^{tX}] encodes all moments of a distribution. The Chernoff method, sub-Gaussian bounds, and exponential family theory all reduce to MGF conditions.
Prerequisites
Why This Matters
The moment generating function is the main workhorse connecting light-tailed probability distributions to concentration inequalities. When the MGF exists in a neighborhood of zero (sub-Gaussian, sub-exponential, bounded), it powers the Chernoff method, exponential tilting, and the sub-Gaussian / sub-exponential machinery. When it does not exist (Cauchy, heavy-tailed power laws), the characteristic function is the strictly more general tool, and concentration results require different machinery (Chebyshev, truncation, Nemirovski-style norms).
The Chernoff method is: apply Markov's inequality to e^{tX} and optimize over t > 0. This is an MGF computation.
A random variable X is sub-Gaussian with parameter σ if and only if its MGF satisfies M(t) ≤ e^{σ^2 t^2 / 2} for all t. Within the sub-Gaussian world, the entire concentration story reduces to bounding the MGF.
Exponential families are distributions whose density has the form p_θ(x) = h(x) e^{θ T(x) − A(θ)}, built directly from the exponential function. The log-partition function A(θ) is the log-MGF of the sufficient statistic T(X).
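As a concrete worked instance (a standard example, not specific to this text), the Bernoulli family in canonical form makes the identification of A(θ) with a log-MGF explicit:

```latex
p_\theta(x) = e^{\theta x - A(\theta)}, \quad x \in \{0, 1\}, \qquad
A(\theta) = \log\left(1 + e^{\theta}\right) = \log \mathbb{E}_h\left[e^{\theta T(X)}\right],
```

where T(x) = x and h is counting measure on {0, 1}. Differentiating recovers moments of the sufficient statistic: A'(θ) = e^θ / (1 + e^θ) = E_θ[X].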
Core Definitions
Moment Generating Function
The moment generating function of a random variable X is:

M(t) = E[e^{tX}]

defined for all t where this expectation is finite. The MGF may not exist for all t; when it exists in an open interval around 0, it determines the distribution uniquely.
Extracting Moments
The n-th moment of X is the n-th derivative of M at zero:

E[X^n] = M^{(n)}(0)

This follows from differentiating under the expectation:

M^{(n)}(t) = E[X^n e^{tX}]

and evaluating at t = 0. The interchange of differentiation and expectation is valid when M exists in a neighborhood of 0.
In particular: E[X] = M'(0) and Var(X) = M''(0) − M'(0)^2.
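A quick numerical sanity check of E[X^n] = M^{(n)}(0), sketched for the Exponential(1) distribution, whose MGF M(t) = 1/(1 − t) for t < 1 is a standard closed form; the finite-difference step h is an arbitrary small choice:

```python
# Sketch: recover E[X] and E[X^2] from derivatives of the MGF at 0.
# Example: X ~ Exponential(1), with known MGF M(t) = 1/(1 - t) for t < 1,
# so E[X] = M'(0) = 1 and E[X^2] = M''(0) = 2.

def M(t):
    """Exact MGF of an Exponential(1) random variable (valid for t < 1)."""
    return 1.0 / (1.0 - t)

h = 1e-4  # finite-difference step (arbitrary small choice)

# Central differences approximating M'(0) and M''(0).
first_moment = (M(h) - M(-h)) / (2 * h)
second_moment = (M(h) - 2 * M(0.0) + M(-h)) / h**2

print(first_moment)   # ≈ 1.0 = E[X]
print(second_moment)  # ≈ 2.0 = E[X^2], so Var(X) ≈ 2 - 1^2 = 1
```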
Main Theorems
MGF Uniqueness Theorem
Statement
If two random variables X and Y have moment generating functions M_X(t) and M_Y(t) that are finite and equal for all t in some open interval (−δ, δ) with δ > 0, then X and Y have the same distribution.
Intuition
The MGF encodes the entire distribution, not just the moments. If two distributions agree on their MGFs in a neighborhood of zero, they must be the same distribution. This is stronger than moment matching: there exist distinct distributions with identical moments of all orders, but they cannot have identical MGFs in a neighborhood of zero.
Proof Sketch
The MGF is related to the characteristic function by analytic continuation. If M(t) is finite on (−δ, δ), the characteristic function extends analytically to a strip in the complex plane. By the uniqueness theorem for characteristic functions (Lévy inversion), the distribution is determined.
Why It Matters
This theorem justifies the "MGF technique" for identifying distributions. If you compute the MGF of a sum of independent Gaussians and recognize it as the MGF of another Gaussian, you can conclude the sum is Gaussian. This approach is cleaner than convolution arguments.
Failure Mode
The MGF must exist in a neighborhood of 0, not just at 0 (where it always equals 1). Heavy-tailed distributions like the Cauchy distribution have no MGF. For those distributions, use the characteristic function instead, which always exists.
MGF of Independent Sum
Statement
If X and Y are independent random variables whose MGFs exist, then:

M_{X+Y}(t) = M_X(t) M_Y(t)
Intuition
Independence means E[f(X) g(Y)] = E[f(X)] E[g(Y)]. Apply this with f(x) = e^{tx} and g(y) = e^{ty}.
Proof Sketch
M_{X+Y}(t) = E[e^{t(X+Y)}] = E[e^{tX} e^{tY}] = E[e^{tX}] E[e^{tY}] = M_X(t) M_Y(t), where the third equality uses independence.
Why It Matters
This is why MGFs are the natural tool for sums of independent variables. Addition of random variables corresponds to multiplication of MGFs. Taking logs: the cumulant generating function is additive for independent sums.
Failure Mode
Fails without independence. For dependent variables, M_{X+Y}(t) ≠ M_X(t) M_Y(t) in general.
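The product rule for independent sums, and its failure under dependence, can be illustrated with a small Monte Carlo sketch using standard Gaussians; the sample size, seed, and choice of t are arbitrary:

```python
import math
import random

random.seed(0)
n = 200_000
t = 0.5

xs = [random.gauss(0, 1) for _ in range(n)]
ys = [random.gauss(0, 1) for _ in range(n)]  # independent of xs

def emp_mgf(samples, t):
    """Empirical MGF: average of e^{t x} over the sample."""
    return sum(math.exp(t * x) for x in samples) / len(samples)

# Independent case: M_{X+Y}(t) ≈ M_X(t) * M_Y(t).
lhs = emp_mgf([x + y for x, y in zip(xs, ys)], t)
rhs = emp_mgf(xs, t) * emp_mgf(ys, t)
print(lhs, rhs)  # both ≈ e^{t^2} ≈ 1.284

# Fully dependent case (Y = X): M_{2X}(t) = e^{2 t^2}, not (M_X(t))^2 = e^{t^2}.
lhs_dep = emp_mgf([2 * x for x in xs], t)
print(lhs_dep)  # ≈ e^{2 t^2} ≈ 1.649, visibly different from the product rule
```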
Canonical Examples
MGF of a Gaussian
Let X ~ N(μ, σ^2). Then:

M(t) = E[e^{tX}] = e^{μt + σ^2 t^2 / 2}

This exists for all t ∈ R. Setting μ = 0: M(t) = e^{σ^2 t^2 / 2}, which is the defining condition for sub-Gaussian random variables with parameter σ.
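The Gaussian MGF e^{μt + σ^2 t^2 / 2} comes from completing the square in the exponent; a standard derivation sketch:

```latex
M(t) = \int_{-\infty}^{\infty} e^{tx}\, \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2/(2\sigma^2)}\, dx
     = e^{\mu t + \sigma^2 t^2/2} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu-\sigma^2 t)^2/(2\sigma^2)}\, dx
     = e^{\mu t + \sigma^2 t^2/2},
```

since the remaining integrand is the density of N(μ + σ^2 t, σ^2) and integrates to 1.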
The Chernoff method in one line
For any t > 0: P(X ≥ a) = P(e^{tX} ≥ e^{ta}) ≤ e^{−ta} E[e^{tX}] = e^{−ta} M(t). The first step is monotonicity of x ↦ e^{tx}; the second is Markov's inequality. Optimize over t > 0 to get the tightest bound. This is the entire Chernoff method.
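A worked instance (a standard example, not specific to this text): for X ~ N(0, 1), M(t) = e^{t^2/2}, so the bound e^{−ta} M(t) is minimized at t = a, giving P(X ≥ a) ≤ e^{−a^2/2}. A sketch comparing the optimized bound with the true tail:

```python
import math

def chernoff_bound_gaussian(a, t):
    """Chernoff bound e^{-ta} M(t) for X ~ N(0, 1), where M(t) = e^{t^2/2}."""
    return math.exp(-t * a + t**2 / 2)

def true_tail(a):
    """P(X >= a) for X ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(a / math.sqrt(2))

a = 2.0
# Scan over t: the exponent -ta + t^2/2 is minimized at t = a.
ts = [i / 100 for i in range(1, 501)]
best_t = min(ts, key=lambda t: chernoff_bound_gaussian(a, t))
print(best_t)                              # 2.0, i.e. t = a
print(chernoff_bound_gaussian(a, best_t))  # e^{-2} ≈ 0.135
print(true_tail(a))                        # ≈ 0.0228, below the bound as required
```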
Common Confusions
Moments existing does not imply MGF exists
A distribution can have all moments finite yet have no MGF. The standard lognormal distribution (X = e^Z with Z ~ N(0, 1)) has E[X^n] = e^{n^2/2} < ∞ for all n, but M(t) = ∞ for all t > 0. The MGF is a stronger condition than having all moments.
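A sketch of the divergence, assuming the standard lognormal X = e^Z with Z ~ N(0, 1): writing E[e^{tX}] as an integral against the standard normal density φ and truncating at increasing cutoffs shows the partial integrals blow up (the cutoffs and step size are arbitrary choices):

```python
import math

def partial_mgf_integral(t, cutoff, dz=1e-3):
    """Riemann-sum approximation of the integral of e^{t e^z} * phi(z)
    over z in [-10, cutoff], where phi is the standard normal density.
    This truncates E[e^{tX}] for the lognormal X = e^Z."""
    total = 0.0
    z = -10.0
    while z < cutoff:
        phi = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
        total += math.exp(t * math.exp(z)) * phi * dz
        z += dz
    return total

t = 1.0
for cutoff in (2.0, 4.0, 6.0):
    print(cutoff, partial_mgf_integral(t, cutoff))
# The partial integrals grow without bound as the cutoff increases:
# the double exponential e^{e^z} eventually swamps the Gaussian e^{-z^2/2}.
```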
MGF vs characteristic function vs cumulant generating function
The MGF is M(t) = E[e^{tX}]. The characteristic function is φ(t) = E[e^{itX}], which always exists because |e^{itX}| = 1. The cumulant generating function (CGF) is K(t) = log M(t). The CGF is additive for independent sums. In concentration inequality proofs, you typically work with the CGF.
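A sketch contrasting the two for the standard Cauchy distribution, which has no MGF but whose characteristic function has the known closed form e^{−|t|}; sampling via the inverse CDF, with seed and sample size as arbitrary choices:

```python
import math
import random

random.seed(1)
n = 200_000
t = 1.0

# Standard Cauchy samples via the inverse CDF: X = tan(pi * (U - 1/2)).
xs = [math.tan(math.pi * (random.random() - 0.5)) for _ in range(n)]

# Real part of the empirical characteristic function E[e^{itX}] = E[cos(tX)]
# (the imaginary part vanishes by symmetry). Always well defined: |cos| <= 1.
ecf = sum(math.cos(t * x) for x in xs) / n
print(ecf)  # ≈ e^{-1} ≈ 0.368, matching the Cauchy characteristic function e^{-|t|}

# The empirical MGF, by contrast, is dominated by the largest sample and does
# not converge as n grows: E[e^{tX}] = infinity for every t != 0.
```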
Exercises
Problem
Compute the MGF of a Bernoulli(p) random variable. Use it to find E[X] and Var(X).
Problem
Let X_1, ..., X_n be i.i.d. N(0, 1). Use MGFs to prove that X_1 + ... + X_n is distributed as N(0, n).
References
Canonical:
- Casella & Berger, Statistical Inference (2002), Chapter 2.3
- Billingsley, Probability and Measure (1995), Section 21
Current:
- Wainwright, High-Dimensional Statistics (2019), Chapters 2-3 (MGFs in concentration)
- Vershynin, High-Dimensional Probability (2018), Chapter 2 (sub-Gaussian MGF condition)
Last reviewed: April 2026