Mathematical Infrastructure
Characteristic Functions
The Fourier transform of a probability distribution. It always exists (unlike the MGF), uniquely determines the distribution, turns independent sums into products, and powers the rigorous proof of the central limit theorem via Lévy's continuity theorem.
Why This Matters
The moment generating function $M_X(t) = \mathbb{E}[e^{tX}]$ is the standard tool for proving the central limit theorem in introductory courses. It has a fatal flaw: it does not always exist. For the Cauchy and other stable distributions, the log-normal, and most fat-tailed distributions, $M_X(t) = \infty$ for every $t > 0$ (for the Cauchy, for every $t \neq 0$). Any proof that uses MGFs cannot apply to these distributions.
The characteristic function replaces $e^{tX}$ with the unit-modulus complex exponential $e^{itX}$. Since $|e^{itX}| = 1$, the expectation $\mathbb{E}[e^{itX}]$ always exists and is bounded by 1 in modulus. This is what makes characteristic functions the right tool for rigorous probability: every distribution has one, and they uniquely determine the distribution.
Three payoffs. First, the rigorous proof of the CLT goes through Lévy's continuity theorem applied to characteristic functions, not MGFs. Second, characteristic functions are the bridge to harmonic analysis: Bochner's theorem identifies them with the continuous positive-definite functions, which is what makes kernel methods work. Third, characteristic functions handle stable distributions and other fat-tailed laws where MGFs fail outright.
Definition
Characteristic Function
The characteristic function of a real-valued random variable $X$ is $$\varphi_X(t) = \mathbb{E}\big[e^{itX}\big], \qquad t \in \mathbb{R}.$$ For an $\mathbb{R}^d$-valued random vector $X$, $\varphi_X(t) = \mathbb{E}\big[e^{i\langle t, X\rangle}\big]$ for $t \in \mathbb{R}^d$.
When $X$ has density $f$, $\varphi_X$ is exactly the Fourier transform of $f$: $$\varphi_X(t) = \int_{-\infty}^{\infty} e^{itx} f(x)\,dx.$$
When $X$ is discrete with mass function $p$, $\varphi_X(t) = \sum_x e^{itx}\,p(x)$.
The factor of $i$ in the exponent is the entire reason characteristic functions exist where MGFs do not: $|e^{itx}| = 1$ for every real $t$ and every random outcome $x$, so the expectation is always well-defined and bounded.
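This boundedness is easy to watch numerically. The sketch below (a Monte Carlo illustration using NumPy; the helper name `empirical_cf` and the sample size are my choices) estimates $\mathbb{E}[e^{itX}]$ for standard normal samples and compares it with the closed form $e^{-t^2/2}$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)  # samples of X ~ N(0, 1)

def empirical_cf(samples, t):
    """Monte Carlo estimate of phi_X(t) = E[exp(i t X)]."""
    return np.exp(1j * t * samples).mean()

for t in [0.0, 0.5, 1.0, 2.0]:
    est = empirical_cf(x, t)
    exact = np.exp(-t**2 / 2)        # N(0,1) characteristic function
    assert abs(est - exact) < 0.01   # matches the closed form
    assert abs(est) <= 1 + 1e-12     # always inside the unit disk
```

Note that the estimator is well-behaved regardless of how heavy the tails of $X$ are, precisely because the integrand has modulus 1.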
Basic Properties
The properties below follow directly from the definition.
- Boundedness. $|\varphi_X(t)| \le 1$ for all $t$, with $\varphi_X(0) = 1$.
- Conjugate symmetry. $\varphi_X(-t) = \overline{\varphi_X(t)}$.
- Uniform continuity. $\varphi_X$ is uniformly continuous on $\mathbb{R}$. (Even when $X$ has no continuous density.)
- Linear transformations. $\varphi_{aX+b}(t) = e^{itb}\,\varphi_X(at)$.
- Independence and convolution. If $X, Y$ are independent, then $\varphi_{X+Y}(t) = \varphi_X(t)\,\varphi_Y(t)$.
- Moment recovery. If $\mathbb{E}|X|^n < \infty$, then $\varphi_X$ is $n$-times differentiable at $0$ with $\varphi_X^{(n)}(0) = i^n\,\mathbb{E}[X^n]$.
The independence-to-multiplication property is what makes characteristic functions ideal for analyzing sums of independent random variables. The proof is direct: $\mathbb{E}[e^{it(X+Y)}] = \mathbb{E}[e^{itX}\,e^{itY}] = \mathbb{E}[e^{itX}]\,\mathbb{E}[e^{itY}]$ by independence.
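The multiplication rule can be sanity-checked by simulation (an illustrative sketch; names and sample sizes are mine). Here the empirical characteristic function of an independent Exponential(1) + N(0,1) sum is compared with the product of the two closed-form characteristic functions $\frac{1}{1-it}\cdot e^{-t^2/2}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300_000
x = rng.exponential(scale=1.0, size=n)  # X ~ Exp(1)
y = rng.standard_normal(n)              # Y ~ N(0, 1), independent of X

def empirical_cf(samples, t):
    """Monte Carlo estimate of E[exp(i t Z)] for samples of Z."""
    return np.exp(1j * t * samples).mean()

for t in [0.5, 1.0, 2.0]:
    lhs = empirical_cf(x + y, t)                    # phi_{X+Y}(t), estimated
    rhs = (1 / (1 - 1j * t)) * np.exp(-t**2 / 2)    # phi_X(t) * phi_Y(t), exact
    assert abs(lhs - rhs) < 0.01
```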
Standard Examples
| Distribution | $\varphi_X(t)$ |
|---|---|
| Constant $c$ | $e^{itc}$ |
| Bernoulli$(p)$ | $1 - p + pe^{it}$ |
| Binomial$(n,p)$ | $(1 - p + pe^{it})^n$ |
| Poisson$(\lambda)$ | $\exp\big(\lambda(e^{it} - 1)\big)$ |
| Uniform$[a,b]$ | $\dfrac{e^{itb} - e^{ita}}{it(b-a)}$ |
| Exponential$(\lambda)$ | $\dfrac{\lambda}{\lambda - it}$ |
| Normal$(\mu,\sigma^2)$ | $e^{it\mu - \sigma^2 t^2/2}$ |
| Cauchy (standard) | $e^{-\lvert t\rvert}$ |
| Symmetric $\alpha$-stable (standard) | $e^{-\lvert t\rvert^\alpha}$ |
The standard normal characteristic function $e^{-t^2/2}$ is what the CLT pulls toward. The Cauchy and stable cases are exactly where MGFs do not exist; their characteristic functions are nonetheless bounded, continuous, and explicit.
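The table entries can be checked by Monte Carlo, including the Cauchy row, where the MGF is infinite (an illustrative sketch with NumPy; sample sizes and tolerances are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400_000

def empirical_cf(samples, t):
    """Monte Carlo estimate of E[exp(i t X)]."""
    return np.exp(1j * t * samples).mean()

cauchy = rng.standard_cauchy(n)
poisson = rng.poisson(lam=3.0, size=n)

for t in [0.5, 1.0, 2.0]:
    # Cauchy: MGF infinite for all t != 0, yet the CF is simply e^{-|t|}.
    assert abs(empirical_cf(cauchy, t) - np.exp(-abs(t))) < 0.01
    # Poisson(3): exp(3 (e^{it} - 1)).
    assert abs(empirical_cf(poisson, t) - np.exp(3.0 * (np.exp(1j * t) - 1))) < 0.01
```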
Uniqueness
Characteristic functions completely determine the distribution.
Uniqueness Theorem for Characteristic Functions
Statement
If $\varphi_X(t) = \varphi_Y(t)$ for all $t \in \mathbb{R}$, then $X$ and $Y$ have the same distribution. Equivalently, the map distribution $\mapsto$ characteristic function is injective on probability measures.
When $X$ has a density $f$, the inversion formula gives $$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}\,\varphi_X(t)\,dt$$ whenever the integral on the right converges absolutely (i.e., $\int |\varphi_X| < \infty$).
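When $\varphi_X$ is integrable, the inversion integral can be evaluated by simple quadrature. The sketch below (the helper name and grid parameters are my choices) recovers the $N(0,1)$ density from $e^{-t^2/2}$:

```python
import numpy as np

def density_from_cf(cf, x, t_max=12.0, n_grid=4001):
    """Numerically evaluate f(x) = (1/2pi) * integral of exp(-itx) phi(t) dt.
    Valid when phi is absolutely integrable (then X has a bounded density)."""
    t = np.linspace(-t_max, t_max, n_grid)
    dt = t[1] - t[0]
    integrand = np.exp(-1j * t * x) * cf(t)
    return (integrand.sum() * dt).real / (2 * np.pi)  # Riemann sum

# N(0,1): phi(t) = exp(-t^2/2); inversion should recover the normal density.
phi_normal = lambda t: np.exp(-t**2 / 2)
for x0 in [0.0, 1.0, 2.0]:
    exact = np.exp(-x0**2 / 2) / np.sqrt(2 * np.pi)
    assert abs(density_from_cf(phi_normal, x0) - exact) < 1e-6
```

The uniform-grid sum is extremely accurate here because the integrand decays like a Gaussian well inside the truncation window.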
Intuition
The characteristic function is a Fourier transform, and the Fourier transform on $L^2(\mathbb{R})$ is unitary (Plancherel). That unitarity is exactly the statement that no information is lost in the transform. So if two distributions have the same Fourier transform, they are the same distribution. The inversion formula recovers the density explicitly when the characteristic function is integrable.
Proof Sketch
First show that for any bounded continuous $g$ on $\mathbb{R}$, $\mathbb{E}[g(X)]$ is determined by $\varphi_X$. Use Gaussian smoothing: set $X_\sigma = X + \sigma Z$ with $Z$ standard normal, independent of $X$. Then $\varphi_{X_\sigma}(t) = \varphi_X(t)\,e^{-\sigma^2 t^2/2}$ is integrable, so $X_\sigma$ has a density computable from $\varphi_X$ by the inversion formula, and $\mathbb{E}[g(X_\sigma)]$ is an integral of $g$ against that density. Send $\sigma \to 0$, so $\mathbb{E}[g(X_\sigma)] \to \mathbb{E}[g(X)]$. Two distributions agreeing on all bounded continuous test functions are equal.
Why It Matters
Uniqueness is what licenses the strategy "to identify the distribution of a random variable, compute its characteristic function and match." This is how almost every named distribution is recognized in proofs: compute $\varphi$, recognize the formula. Without uniqueness, you would need to match all moments (which requires the moments to exist and to determine the law) or work with the distributions directly.
Failure Mode
Pointwise agreement of characteristic functions on a strict subset of $\mathbb{R}$ does not in general identify the distribution. Two distinct distributions can have characteristic functions that agree on $[-T, T]$ for any finite $T$ and still differ (this is loosely related to the moment problem). The hypothesis of agreement on all of $\mathbb{R}$ is essential.
Lévy's Continuity Theorem
This is the central fact that makes characteristic functions the right tool for weak convergence proofs, including the central limit theorem.
Lévy's Continuity Theorem
Statement
Let $(\mu_n)$ be a sequence of probability measures on $\mathbb{R}$ with characteristic functions $(\varphi_n)$, and let $\mu$ be a candidate limit with characteristic function $\varphi$. Then:
- Continuity (forward direction). If $\mu_n \Rightarrow \mu$ (weak convergence, equivalently convergence in distribution for the corresponding random variables), then $\varphi_n(t) \to \varphi(t)$ pointwise for every $t$.
- Continuity (converse direction). If $\varphi_n(t) \to \varphi(t)$ pointwise for every $t$, and $\varphi$ is continuous at $0$, then $\varphi$ is the characteristic function of a probability measure $\mu$, and $\mu_n \Rightarrow \mu$.
The continuity-at-zero hypothesis is essential for the converse: without it, the pointwise limit need not be the characteristic function of any probability measure (mass escapes to infinity).
Intuition
Weak convergence is hard to verify directly because it requires checking $\int g\,d\mu_n \to \int g\,d\mu$ for all bounded continuous $g$. Pointwise convergence of characteristic functions is much easier to verify because it reduces to convergence of expectations of a single family of test functions $x \mapsto e^{itx}$. The continuity-at-zero condition rules out mass escaping to infinity (which can still give pointwise convergence of $\varphi_n$, but to a limit that is discontinuous at $0$).
Proof Sketch
Forward: each $x \mapsto e^{itx}$ is bounded and continuous, so weak convergence gives $\varphi_n(t) \to \varphi(t)$ for every $t$.
Converse: continuity at $0$ of the limit controls the tails of the distributions. Specifically, by the truncation inequality $$\mu_n\big(\{x : |x| > 2/u\}\big) \le \frac{1}{u}\int_{-u}^{u}\big(1 - \varphi_n(t)\big)\,dt,$$ if $\varphi$ is continuous at $0$, the right side is uniformly small for all large $n$ once $u$ is small, giving tightness. By Prokhorov's theorem, tightness plus pointwise convergence of characteristic functions implies weak convergence to a limit whose characteristic function is $\varphi$.
Why It Matters
This is the key lemma in the rigorous CLT proof. To show $\frac{S_n - n\mu}{\sigma\sqrt{n}} \Rightarrow N(0,1)$, compute the characteristic function of the standardized sum and show it converges pointwise to $e^{-t^2/2}$ (the standard normal characteristic function). Continuity at $0$ is automatic because the limit is a smooth function. The CLT then follows from Lévy's theorem. No moment generating function is required, so the result extends to any distribution with finite variance.
Failure Mode
The continuity-at-zero hypothesis cannot be dropped. Counterexample: let $X_n$ be uniform on $[-n, n]$. Then $\varphi_n(t) = \sin(nt)/(nt)$ (with $\varphi_n(0) = 1$), which converges pointwise to $\mathbf{1}\{t = 0\}$. This limit is discontinuous at $0$ and is not the characteristic function of any probability measure. The mass of $X_n$ escapes to infinity, and Lévy's converse correctly diagnoses the failure.
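The counterexample is easy to evaluate directly (a minimal sketch; the function name is mine): for large $n$, $\sin(nt)/(nt)$ is already essentially $0$ at every fixed $t \neq 0$ while remaining $1$ at $t = 0$.

```python
import math

def phi_uniform_nn(n, t):
    """Characteristic function of Uniform[-n, n]: sin(nt)/(nt), with value 1 at t = 0."""
    return 1.0 if t == 0 else math.sin(n * t) / (n * t)

# Pointwise limit: 1 at t = 0, but 0 for every t != 0 -- discontinuous at 0,
# so it is not the characteristic function of any probability measure.
assert phi_uniform_nn(10**6, 0.0) == 1.0
for t in [0.1, 1.0, 5.0]:
    assert abs(phi_uniform_nn(10**6, t)) < 1e-4
```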
Bochner's Theorem and Kernels
Characteristic functions form a tightly characterized class of functions: they are exactly the continuous positive-definite functions normalized to $\varphi(0) = 1$.
Bochner's Theorem
Statement
A function $\varphi : \mathbb{R} \to \mathbb{C}$ is the characteristic function of some probability measure on $\mathbb{R}$ if and only if it is:
- continuous on $\mathbb{R}$,
- normalized: $\varphi(0) = 1$, and
- positive definite: for every $n$, every $t_1, \dots, t_n \in \mathbb{R}$, and every $c_1, \dots, c_n \in \mathbb{C}$, $$\sum_{j=1}^{n}\sum_{k=1}^{n} c_j\,\bar{c}_k\,\varphi(t_j - t_k) \ge 0.$$
Intuition
Positive definiteness is the abstract condition that the Gram matrix $[\varphi(t_j - t_k)]_{j,k}$ formed from any finite set of shifts is positive semidefinite. This is exactly what kernels in machine learning require. So continuous normalized positive-definite functions and probability characteristic functions are the same class.
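The Gram-matrix condition can be checked numerically for a known characteristic function (an illustrative sketch; the point set and tolerance are arbitrary). For the real-valued $N(0,1)$ characteristic function $e^{-t^2/2}$, every shift Gram matrix should be positive semidefinite:

```python
import numpy as np

rng = np.random.default_rng(3)
phi = lambda t: np.exp(-t**2 / 2)   # N(0,1) CF: real-valued, so no conjugation needed

# Positive definiteness: for any shifts t_1..t_n, the Gram matrix
# [phi(t_j - t_k)]_{j,k} is positive semidefinite.
t = rng.uniform(-5, 5, size=8)
gram = phi(t[:, None] - t[None, :])
eigs = np.linalg.eigvalsh(gram)
assert eigs.min() > -1e-8           # PSD up to floating-point error
```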
Proof Sketch
Necessity: for $X$ with characteristic function $\varphi$, $$\sum_{j,k} c_j\,\bar{c}_k\,\varphi(t_j - t_k) = \mathbb{E}\Big[\Big|\sum_j c_j e^{it_j X}\Big|^2\Big] \ge 0.$$
Sufficiency: given a positive-definite normalized continuous $\varphi$, the Bochner construction defines a probability measure via the inverse Fourier transform of $\varphi$ (interpreted in the distributional sense). The proof requires the Riesz representation theorem to extract a measure from the positive linear functional induced by $\varphi$.
Why It Matters
Bochner's theorem is the bridge between probability and harmonic analysis. On the kernel-method side: a continuous translation-invariant kernel $k(x, y) = \kappa(x - y)$ on $\mathbb{R}^d$ is positive definite iff $\kappa$ is the characteristic function of some probability measure (up to normalization). This is the foundation of random Fourier features: sample frequencies $\omega_1, \dots, \omega_m$ from the spectral measure of the kernel, and approximate the kernel by $\frac{1}{m}\sum_{j=1}^{m} \cos\big(\omega_j^\top(x - y)\big)$. Without Bochner there would be no random Fourier features.
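A minimal random-Fourier-features sketch (assuming the Gaussian kernel $k(x,y) = e^{-\lVert x-y\rVert^2/2}$, whose spectral measure is the standard normal; variable names are mine). It uses the common cosine-feature variant $z(x) = \sqrt{2/m}\,\cos(Wx + b)$ with $b \sim \mathrm{Unif}[0, 2\pi]$, whose inner products approximate the kernel:

```python
import numpy as np

rng = np.random.default_rng(4)
d, m = 3, 50_000  # input dimension, number of random features

# Gaussian kernel k(x, y) = exp(-||x - y||^2 / 2); by Bochner its spectral
# measure is N(0, I_d), so frequencies are sampled from a standard normal.
W = rng.standard_normal((m, d))
b = rng.uniform(0, 2 * np.pi, size=m)

def features(x):
    """Random Fourier feature map z(x) with E[z(x)^T z(y)] = k(x, y)."""
    return np.sqrt(2.0 / m) * np.cos(W @ x + b)

x = np.array([0.3, -1.2, 0.5])
y = np.array([1.0, 0.0, -0.4])
approx = features(x) @ features(y)
exact = np.exp(-np.sum((x - y) ** 2) / 2)
assert abs(approx - exact) < 0.05  # Monte Carlo error shrinks like 1/sqrt(m)
```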
Failure Mode
Positive definiteness is not the same as positivity of the function itself. $\cos t$ is positive definite (it is the characteristic function of $\frac{1}{2}\delta_{-1} + \frac{1}{2}\delta_{1}$) but takes negative values. Conversely, $e^{-t^4}$ is non-negative but not positive definite. The condition is on the Gram-matrix structure, not on the sign of $\varphi$.
Connection to Moment Generating Functions
When the moment generating function $M_X(t) = \mathbb{E}[e^{tX}]$ exists in a neighborhood of zero, it and the characteristic function are related by analytic continuation: $$\varphi_X(t) = M_X(it).$$
So when MGFs exist, characteristic functions carry the same information. The advantage of characteristic functions is that they always exist, even when MGFs do not. The advantage of MGFs (when they exist) is that they are real-valued and easier to manipulate algebraically.
For the central limit theorem, the rigorous proof uses characteristic functions because finite-variance distributions need not have an MGF (e.g., a Pareto distribution with shape parameter $3$ has finite variance but $M_X(t) = \infty$ for every $t > 0$). Characteristic-function-based proofs cover the entire finite-variance class.
Common Confusions
Characteristic functions are complex-valued, not real-valued
$\varphi_X(t) = \mathbb{E}[\cos(tX)] + i\,\mathbb{E}[\sin(tX)]$. The imaginary part is non-zero in general. Symmetric distributions (those with $X \stackrel{d}{=} -X$) have real characteristic functions because $\mathbb{E}[\sin(tX)] = 0$ by symmetry. For asymmetric distributions (e.g., exponential), the characteristic function is genuinely complex. This is unlike MGFs, which are always real.
Pointwise convergence of characteristic functions is not enough on its own
Lévy's converse requires the limit function to be continuous at zero. Without that, you can have pointwise convergence of $\varphi_n$ to a function that is not the characteristic function of anything (the uniform-on-$[-n,n]$ example). This is the formal statement of "mass escaping to infinity." Always check continuity at zero of the candidate limit.
Smoothness of $\varphi$ at zero is moments, not regularity of the density
If $\mathbb{E}|X|^n < \infty$, then $\varphi_X^{(n)}(0)$ exists and equals $i^n\,\mathbb{E}[X^n]$; for even $n$ the converse also holds. So differentiability of the characteristic function at zero corresponds to existence of moments, not to smoothness of the density. The Cauchy distribution has a smooth density, but its characteristic function $e^{-|t|}$ is not differentiable at $0$, reflecting the non-existence of the mean.
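The kink at zero is visible in one-sided difference quotients (a minimal sketch evaluating the two closed-form characteristic functions): the normal characteristic function has matching one-sided slopes at $0$, while $e^{-|t|}$ has slopes $-1$ and $+1$.

```python
import math

h = 1e-6
phi_normal = lambda t: math.exp(-t**2 / 2)  # N(0,1): smooth, all moments exist
phi_cauchy = lambda t: math.exp(-abs(t))    # Cauchy: kink at t = 0, no mean

normal_right = (phi_normal(h) - phi_normal(0)) / h   # -> 0 = i E[X] / i
normal_left = (phi_normal(0) - phi_normal(-h)) / h   # -> 0 as well
cauchy_right = (phi_cauchy(h) - phi_cauchy(0)) / h   # -> -1
cauchy_left = (phi_cauchy(0) - phi_cauchy(-h)) / h   # -> +1

assert abs(normal_right - normal_left) < 1e-5        # differentiable at 0
assert abs(cauchy_right - (-1.0)) < 1e-5             # one-sided slopes disagree,
assert abs(cauchy_left - 1.0) < 1e-5                 # so phi'(0) does not exist
```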
Summary
- The characteristic function $\varphi_X(t) = \mathbb{E}[e^{itX}]$ is the Fourier transform of the law of $X$.
- It always exists, is bounded by $1$ in modulus, and uniquely determines the distribution.
- Independent sums multiply: $\varphi_{X+Y} = \varphi_X\,\varphi_Y$.
- Lévy's continuity theorem identifies pointwise convergence of characteristic functions (with continuity at zero) with weak convergence of distributions; this is the engine of the rigorous CLT proof.
- Bochner's theorem characterizes characteristic functions as the continuous, normalized positive-definite functions, which is the same class as translation-invariant positive-definite kernels.
- Characteristic functions handle stable and other fat-tailed distributions where MGFs are infinite.
Exercises
Problem
Compute the characteristic function of $X \sim N(0,1)$ directly from the definition. Then use it together with the property $\varphi_{aX+b}(t) = e^{itb}\varphi_X(at)$ to derive the characteristic function of $N(\mu, \sigma^2)$.
Problem
Prove the central limit theorem for i.i.d. random variables $X_1, X_2, \dots$ with $\mathbb{E}[X_i] = 0$ and $\mathrm{Var}(X_i) = \sigma^2 \in (0, \infty)$ (no MGF assumed) by computing the characteristic function of $S_n/(\sigma\sqrt{n})$ and applying Lévy's continuity theorem.
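As a numerical companion to this exercise (not a proof; the helper name is mine), one can watch the characteristic function of a standardized sum of Uniform$[-1,1]$ variables, $\big(\sin(s)/s\big)^n$ with $s = t/\sqrt{n/3}$, converge to $e^{-t^2/2}$:

```python
import math

def phi_standardized_uniform_sum(n, t):
    """CF of (X_1 + ... + X_n)/sqrt(n/3) for i.i.d. X_i ~ Uniform[-1, 1]
    (variance 1/3), using phi_{X_i}(s) = sin(s)/s."""
    s = t / math.sqrt(n / 3.0)
    return (math.sin(s) / s) ** n if s != 0 else 1.0

for t in [0.5, 1.0, 2.0]:
    gaussian = math.exp(-t**2 / 2)   # standard normal CF, the CLT limit
    assert abs(phi_standardized_uniform_sum(10_000, t) - gaussian) < 1e-3
```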
References
Standard graduate texts:
- Billingsley, "Probability and Measure" (3rd edition, Wiley, 1995), Sections 26-29
- Durrett, "Probability: Theory and Examples" (5th edition, Cambridge, 2019), Sections 3.3-3.4
- Williams, "Probability with Martingales" (Cambridge, 1991), Chapter 16
- Resnick, "A Probability Path" (Birkhäuser, 1999), Chapter 9
Harmonic analysis perspective:
- Folland, "A Course in Abstract Harmonic Analysis" (2nd edition, CRC Press, 2016), Chapter 4 (Bochner)
- Stein and Shakarchi, "Fourier Analysis: An Introduction" (Princeton, 2003), Chapter 5
Original sources:
- Lévy, "Calcul des probabilités" (Gauthier-Villars, 1925) - characteristic functions and continuity theorem
- Bochner, "Vorlesungen über Fouriersche Integrale" (Akademische Verlagsgesellschaft, 1932) - positive-definite functions
Random Fourier features (Bochner in ML):
- Rahimi and Recht, "Random Features for Large-Scale Kernel Machines" (NIPS 2007)
Next Topics
- Central limit theorem: the canonical application via Lévy's continuity theorem
- Kernels and RKHS: Bochner's theorem powers random Fourier features
- Fat tails: stable distributions whose characteristic functions have no MGF analogue
Last reviewed: April 18, 2026
Prerequisites
Foundations this topic depends on.