
Mathematical Infrastructure

Characteristic Functions

The Fourier transform of a probability distribution. It always exists (unlike the MGF), uniquely determines the distribution, turns independent sums into products, and powers the rigorous proof of the central limit theorem via Lévy's continuity theorem.


Why This Matters

The moment generating function $M_X(t) = \mathbb{E}[e^{tX}]$ is the standard tool for proving the central limit theorem in introductory courses. It has a fatal flaw: it does not always exist. For the Cauchy and stable distributions, $\mathbb{E}[e^{tX}] = \infty$ for every $t \neq 0$; for the log-normal it is infinite for every $t > 0$. The same failure afflicts most fat-tailed distributions, so any proof that uses MGFs cannot apply to them.

The characteristic function $\varphi_X(t) = \mathbb{E}[e^{itX}]$ replaces $e^{tX}$ with the unit-modulus complex exponential $e^{itX}$. Since $|e^{itX}| = 1$, the expectation always exists and is bounded by 1 in modulus. This is what makes characteristic functions the right tool for rigorous probability: every distribution has one, and they uniquely determine the distribution.

Three payoffs. First, the rigorous proof of the CLT goes through Lévy's continuity theorem applied to characteristic functions, not MGFs. Second, characteristic functions are the bridge to harmonic analysis: Bochner's theorem identifies them with continuous positive-definite functions, which is what makes kernel methods work. Third, characteristic functions handle stable distributions and other fat-tailed laws where MGFs fail outright.

Definition

Definition

Characteristic Function

The characteristic function of a real-valued random variable $X$ is

$$\varphi_X(t) = \mathbb{E}\!\left[e^{itX}\right] = \mathbb{E}[\cos(tX)] + i \, \mathbb{E}[\sin(tX)], \qquad t \in \mathbb{R}.$$

For an $\mathbb{R}^d$-valued random vector $X$, $\varphi_X(t) = \mathbb{E}[e^{i \langle t, X \rangle}]$ for $t \in \mathbb{R}^d$.

When $X$ has density $f_X$, $\varphi_X$ is exactly the Fourier transform of $f_X$:

$$\varphi_X(t) = \int_{\mathbb{R}} e^{itx} f_X(x) \, dx.$$

When $X$ is discrete with mass function $p_X$, $\varphi_X(t) = \sum_x e^{itx} p_X(x)$.

The factor of $i$ in the exponent is the entire reason characteristic functions exist where MGFs do not: $|e^{itX}| = 1$ for every real $t$ and every random outcome $X$, so the expectation is always well-defined and bounded.
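The definition is directly computable by Monte Carlo: average $e^{itX}$ over samples. Below is an illustrative sketch (not from the source) comparing the empirical characteristic function of Exponential(1) samples against the closed form $1/(1 - it)$ from the table later in this page.

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_cf(samples, t):
    """Monte Carlo estimate of E[exp(i t X)] from i.i.d. samples."""
    return np.mean(np.exp(1j * t * samples))

# Exponential(1): exact CF is lambda/(lambda - it) = 1/(1 - it)
x = rng.exponential(scale=1.0, size=200_000)
for t in (0.5, 1.0, 2.0):
    exact = 1.0 / (1.0 - 1j * t)
    approx = empirical_cf(x, t)
    assert abs(approx - exact) < 0.02  # Monte Carlo error ~ 1/sqrt(n)
```

Note that the estimate is complex-valued: the exponential distribution is asymmetric, so the imaginary part is genuinely non-zero.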

Basic Properties

The properties below follow directly from the definition.

  • Boundedness. $|\varphi_X(t)| \leq 1$ for all $t$, with $\varphi_X(0) = 1$.
  • Conjugate symmetry. $\varphi_X(-t) = \overline{\varphi_X(t)}$.
  • Uniform continuity. $\varphi_X$ is uniformly continuous on $\mathbb{R}$, even when $X$ has no density at all.
  • Linear transformations. $\varphi_{aX + b}(t) = e^{itb} \varphi_X(at)$.
  • Independence and convolution. If $X, Y$ are independent, then $\varphi_{X + Y}(t) = \varphi_X(t) \varphi_Y(t)$.
  • Moment recovery. If $\mathbb{E}|X|^k < \infty$, then $\varphi_X$ is $k$-times differentiable at 0 with $\mathbb{E}[X^k] = i^{-k} \varphi_X^{(k)}(0)$.

The independence-to-multiplication property is what makes characteristic functions ideal for analyzing sums of independent random variables. The proof is direct: $\mathbb{E}[e^{it(X+Y)}] = \mathbb{E}[e^{itX} e^{itY}] = \mathbb{E}[e^{itX}] \, \mathbb{E}[e^{itY}]$ by independence.
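The convolution property can be checked numerically. The following sketch (an illustration under the stated sampling choices, not a proof) draws independent exponential and normal samples and compares the empirical characteristic function of the sum to the product of the individual ones:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.exponential(1.0, n)        # X ~ Exponential(1)
y = rng.normal(0.0, 1.0, n)        # Y ~ N(0, 1), independent of X

def ecf(samples, t):
    """Empirical characteristic function at a single t."""
    return np.mean(np.exp(1j * t * samples))

t = 0.7
lhs = ecf(x + y, t)                # CF of the independent sum
rhs = ecf(x, t) * ecf(y, t)        # product of the individual CFs
assert abs(lhs - rhs) < 0.02
```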

Standard Examples

| Distribution | $\varphi_X(t)$ |
| --- | --- |
| Constant $c$ | $e^{itc}$ |
| Bernoulli$(p)$ | $1 - p + p e^{it}$ |
| Binomial$(n, p)$ | $(1 - p + p e^{it})^n$ |
| Poisson$(\lambda)$ | $e^{\lambda(e^{it} - 1)}$ |
| Uniform$(a, b)$ | $\dfrac{e^{itb} - e^{ita}}{it(b - a)}$ |
| Exponential$(\lambda)$ | $\lambda / (\lambda - it)$ |
| Normal $\mathcal{N}(\mu, \sigma^2)$ | $e^{i\mu t - \sigma^2 t^2 / 2}$ |
| Cauchy$(0, 1)$ | $e^{-\lvert t \rvert}$ |
| Standard symmetric $\alpha$-stable | $e^{-\lvert t \rvert^\alpha}$ |

The standard normal $\varphi(t) = e^{-t^2/2}$ is what the CLT pulls toward. The Cauchy and stable cases are exactly where MGFs do not exist; their characteristic functions exist everywhere, are bounded, and are perfectly well-behaved.
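Table entries with discrete support can be verified to machine precision by summing the definition directly. A quick deterministic check for the Poisson row:

```python
import numpy as np
from math import factorial, exp

lam, t = 3.0, 1.3

# CF by direct summation: sum_k e^{itk} * e^{-lam} lam^k / k!
# (truncated at k = 60; the Poisson(3) tail beyond that is negligible)
direct = sum(
    np.exp(1j * t * k) * exp(-lam) * lam**k / factorial(k)
    for k in range(60)
)

# Closed form from the table: e^{lam (e^{it} - 1)}
closed = np.exp(lam * (np.exp(1j * t) - 1))

assert abs(direct - closed) < 1e-12
```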

Uniqueness

Characteristic functions completely determine the distribution.

Theorem

Uniqueness Theorem for Characteristic Functions

Statement

If $\varphi_X(t) = \varphi_Y(t)$ for all $t \in \mathbb{R}$, then $X$ and $Y$ have the same distribution. Equivalently, the map "distribution $\mapsto$ characteristic function" is injective on probability measures.

When $\varphi_X$ is absolutely integrable, $X$ has a bounded continuous density and the inversion formula gives

$$f_X(x) = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-itx} \varphi_X(t) \, dt.$$

Intuition

The characteristic function is a Fourier transform, and the Fourier transform on $L^2$ is unitary (Plancherel). That unitarity is exactly the statement that no information about $f_X$ is lost in the transform. So if two distributions have the same Fourier transform, they are the same distribution. The inversion formula recovers $f_X$ explicitly when the characteristic function is integrable.
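The inversion formula can be exercised numerically. This sketch (grid sizes are arbitrary choices, not from the source) recovers the standard normal density from its characteristic function $e^{-t^2/2}$ by discretizing the integral:

```python
import numpy as np

def density_from_cf(cf, x, T=20.0, m=20001):
    """Numerically invert f(x) = (1/2pi) * integral of e^{-itx} cf(t) dt.

    Valid when cf is absolutely integrable; the integral is truncated to
    [-T, T] and approximated by a Riemann sum on a fine grid.
    """
    t = np.linspace(-T, T, m)
    vals = np.exp(-1j * t * x) * cf(t)
    dt = t[1] - t[0]
    return (np.sum(vals) * dt).real / (2 * np.pi)

phi = lambda t: np.exp(-t**2 / 2)          # standard normal CF
for x in (0.0, 1.0, 2.0):
    exact = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # N(0,1) density
    assert abs(density_from_cf(phi, x) - exact) < 1e-6
```

The Gaussian decay of $\varphi$ makes the truncation error negligible, which is exactly the integrability hypothesis at work.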

Proof Sketch

First show that for any bounded continuous $g$ on $\mathbb{R}$, $\mathbb{E}[g(X)]$ is determined by $\varphi_X$. Use Gaussian smoothing: let $g_\sigma(x) = \int g(x + \sigma z) \, \phi(z) \, dz$ with $\phi$ the standard normal density. Then $\mathbb{E}[g_\sigma(X)] = \mathbb{E}[g(X + \sigma Z)]$ for $Z$ standard normal independent of $X$, and $X + \sigma Z$ has a density computable from $\varphi_X$ by the inversion formula (its characteristic function $\varphi_X(t) e^{-\sigma^2 t^2 / 2}$ is integrable). Send $\sigma \to 0$. Two distributions agreeing on all bounded continuous test functions are equal.

Why It Matters

Uniqueness is what licenses the strategy "to identify the distribution of a random variable, compute its characteristic function and match." This is how almost every named distribution is recognized in proofs: compute $\varphi$, recognize the formula. Without uniqueness, you would need to match all moments (which requires them to exist, and need not determine the distribution) or work with the distributions directly.

Failure Mode

Pointwise agreement of characteristic functions on a strict subset of $\mathbb{R}$ does not in general identify the distribution. Two distributions can agree on $[-T, T]$ for any finite $T$ and still differ (this is loosely related to the moment problem). The hypothesis of agreement on all of $\mathbb{R}$ is essential.

Lévy's Continuity Theorem

This is the central fact that makes characteristic functions the right tool for weak convergence proofs, including the central limit theorem.

Theorem

Lévy's Continuity Theorem

Statement

Let $\mu_n$ be a sequence of probability measures on $\mathbb{R}$ with characteristic functions $\varphi_n$, and let $\mu$ be a candidate limit with characteristic function $\varphi$. Then:

  1. Continuity (forward direction). If $\mu_n \Rightarrow \mu$ (weak convergence, equivalently convergence in distribution of the corresponding random variables), then $\varphi_n(t) \to \varphi(t)$ pointwise for every $t \in \mathbb{R}$.

  2. Continuity (converse direction). If $\varphi_n(t) \to \psi(t)$ pointwise for every $t \in \mathbb{R}$, and $\psi$ is continuous at $t = 0$, then $\psi$ is the characteristic function of a probability measure $\mu$, and $\mu_n \Rightarrow \mu$.

The continuity-at-zero hypothesis is essential for the converse: without it, mass can escape to infinity and the limit $\psi$ fails to be the characteristic function of any probability measure.

Intuition

Weak convergence is hard to verify directly because it requires checking $\int g \, d\mu_n \to \int g \, d\mu$ for all bounded continuous $g$. Pointwise convergence of characteristic functions is much easier, because it reduces to convergence of expectations of a single family of test functions $\{e^{itx} : t \in \mathbb{R}\}$. The continuity-at-zero condition rules out mass escaping to infinity (which can keep $\varphi_n(0) = 1$ while $\varphi_n(t) \to 0$ for $t \neq 0$).

Proof Sketch

Forward: each $e^{itx}$ is bounded and continuous in $x$, so weak convergence gives pointwise convergence of $\varphi_n$.

Converse: continuity at 0 of the limit $\psi$ controls the tails of the distributions. Specifically, the standard estimate $\mu_n\big([-2K, 2K]^c\big) \leq K \int_{-1/K}^{1/K} \big(1 - \operatorname{Re}\varphi_n(t)\big) \, dt$ shows that if $\psi$ is continuous at 0, the right side is uniformly small for large $K$, giving tightness. By Prokhorov's theorem, tightness plus pointwise convergence of characteristic functions implies weak convergence to a limit whose characteristic function is $\psi$.

Why It Matters

This is the key lemma in the rigorous CLT proof. To show $\sqrt{n}(\bar X_n - \mu)/\sigma \xrightarrow{d} \mathcal{N}(0, 1)$, compute the characteristic function of the standardized average and show it converges pointwise to $e^{-t^2/2}$, the standard normal characteristic function. Continuity at 0 is automatic because the limit is a smooth function. The CLT then follows from Lévy's theorem. No moment generating function is required, so the result extends to any distribution with finite variance.
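The pointwise convergence at the heart of this argument is easy to watch numerically. This sketch (using Exponential(1) summands as an arbitrary illustrative choice) evaluates the exact characteristic function of the standardized sum and compares it to the Gaussian limit:

```python
import numpy as np

def cf_standardized_sum(t, n):
    """Exact CF of (S_n - n) / sqrt(n) for i.i.d. Exponential(1) terms
    (mean = variance = 1), built from phi_X(t) = 1/(1 - it) and the
    linear-transformation and independence rules."""
    s = t / np.sqrt(n)
    return (np.exp(-1j * s) / (1 - 1j * s)) ** n

t = 1.5
limit = np.exp(-t**2 / 2)                   # standard normal CF
err_small = abs(cf_standardized_sum(t, 10) - limit)
err_big = abs(cf_standardized_sum(t, 10_000) - limit)
assert err_big < 0.01 and err_big < err_small   # converging to e^{-t^2/2}
```

Since the limit $e^{-t^2/2}$ is continuous at 0, Lévy's converse upgrades this pointwise convergence to convergence in distribution.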

Failure Mode

The continuity-at-zero hypothesis cannot be dropped. Counterexample: let $X_n$ be uniform on $[-n, n]$. Then $\varphi_n(t) = \sin(nt)/(nt)$, which converges pointwise to $\psi(t) = \mathbf{1}\{t = 0\}$. This $\psi$ is discontinuous at 0 and is not the characteristic function of any probability measure. The mass of $X_n$ "escapes to infinity," and Lévy's converse correctly diagnoses the failure.
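The counterexample is concrete enough to evaluate. A minimal sketch of the escaping-mass limit:

```python
import numpy as np

def phi_uniform(t, n):
    """CF of Uniform[-n, n]: sin(nt)/(nt).

    np.sinc(x) computes sin(pi x)/(pi x) and handles x = 0, so we rescale."""
    return np.sinc(n * t / np.pi)

# At t = 0 the value is always exactly 1, but at any fixed t != 0 it dies
# as n grows: the pointwise limit 1{t = 0} is discontinuous at 0 and is
# not the characteristic function of any probability measure.
assert phi_uniform(0.0, 10**6) == 1.0
assert abs(phi_uniform(0.5, 10**6)) < 1e-5
```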

Bochner's Theorem and Kernels

Characteristic functions form a tightly characterized class: they are exactly the continuous positive-definite functions normalized to $\varphi(0) = 1$.

Theorem

Bochner's Theorem

Statement

A function $\varphi : \mathbb{R}^d \to \mathbb{C}$ is the characteristic function of some probability measure on $\mathbb{R}^d$ if and only if it is:

  1. continuous on $\mathbb{R}^d$,
  2. normalized: $\varphi(0) = 1$, and
  3. positive definite: for every $n$, every $t_1, \ldots, t_n \in \mathbb{R}^d$, and every $c_1, \ldots, c_n \in \mathbb{C}$,

$$\sum_{j, k = 1}^n c_j \overline{c_k} \, \varphi(t_j - t_k) \geq 0.$$

Intuition

Positive definiteness is the abstract condition that the Gram matrix $[\varphi(t_j - t_k)]_{j,k}$ formed from any finite set of "shifts" is positive semidefinite. This is exactly what kernels in machine learning require, so positive-definite functions and characteristic functions are (up to normalization) the same class.
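The Gram-matrix condition is directly checkable by eigenvalue computation. This sketch contrasts the Gaussian characteristic function $e^{-t^2/2}$ (positive definite) with $|t|$ (the non-example discussed below):

```python
import numpy as np

rng = np.random.default_rng(2)
t = rng.normal(size=8)                 # arbitrary shift points t_1, ..., t_n
diff = t[:, None] - t[None, :]         # matrix of pairwise t_j - t_k

# Gaussian CF e^{-t^2/2}: the Gram matrix is positive semidefinite
gram_ok = np.exp(-diff**2 / 2)
assert np.linalg.eigvalsh(gram_ok).min() > -1e-10

# |t| is non-negative as a function but NOT positive definite:
# its "Gram matrix" has zero trace and nonzero entries, so it must
# have a strictly negative eigenvalue
gram_bad = np.abs(diff)
assert np.linalg.eigvalsh(gram_bad).min() < -1e-10
```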

Proof Sketch

Necessity: for $X$ with characteristic function $\varphi$, $\sum_{j, k} c_j \overline{c_k} \, \varphi(t_j - t_k) = \mathbb{E}\big|\sum_j c_j e^{i \langle t_j, X \rangle}\big|^2 \geq 0$.

Sufficiency: given a continuous, normalized, positive-definite $\varphi$, the Bochner construction defines a probability measure via the inverse Fourier transform of $\varphi$ (interpreted in the distributional sense). The proof uses the Riesz representation theorem to extract a measure from the positive linear functional $f \mapsto (f * \tilde\varphi)(0)$.

Why It Matters

Bochner's theorem is the bridge between probability and harmonic analysis. On the kernel-method side: a continuous translation-invariant kernel $K(x, y) = k(x - y)$ on $\mathbb{R}^d$ is positive definite iff $k$ is (up to normalization) the characteristic function of some probability measure, called the spectral measure of the kernel. This is the foundation of random Fourier features: sample $t \sim \mu$ from the spectral measure and $b$ uniformly on $[0, 2\pi]$; the feature $z(x) = \sqrt{2} \cos(\langle t, x \rangle + b)$ then reproduces the kernel in expectation, $\mathbb{E}[z(x) z(y)] = k(x - y)$. Without Bochner there would be no random Fourier features.
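The random-Fourier-features recipe is short enough to sketch in full. For the Gaussian kernel $k(x - y) = e^{-\|x - y\|^2/2}$, the spectral measure given by Bochner is the standard normal on $\mathbb{R}^d$ (the dimensions and test points below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
d, D = 3, 50_000                       # input dimension, number of features

# Spectral sampling: frequencies from N(0, I_d), phases uniform on [0, 2pi]
w = rng.normal(size=(D, d))
b = rng.uniform(0, 2 * np.pi, size=D)

def features(x):
    """Random Fourier feature map; E[features(x) @ features(y)] = k(x - y)."""
    return np.sqrt(2.0 / D) * np.cos(w @ x + b)

x = np.array([0.2, -0.1, 0.4])
y = np.array([-0.3, 0.5, 0.1])
exact = np.exp(-np.sum((x - y) ** 2) / 2)   # Gaussian kernel value
approx = features(x) @ features(y)          # inner product of random features
assert abs(approx - exact) < 0.03
```

The $\sqrt{2}$ and the random phase make the cross terms average out, leaving exactly the real part of the characteristic function at $x - y$.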

Failure Mode

Positive definiteness is not the same as positivity of the function itself. $\cos(t)$ is positive definite (it is the characteristic function of $\frac{1}{2}\delta_{-1} + \frac{1}{2}\delta_{+1}$) but takes negative values. Conversely, $|t|$ is non-negative but not positive definite. The condition is on the Gram-matrix structure, not on the sign of $\varphi$.

Connection to Moment Generating Functions

When the moment generating function $M_X(t) = \mathbb{E}[e^{tX}]$ exists in a neighborhood of zero, it and the characteristic function are related by analytic continuation:

$$\varphi_X(t) = M_X(it) \qquad \text{when } M_X \text{ extends to a strip around the imaginary axis.}$$

So when MGFs exist, characteristic functions carry the same information. The advantage of characteristic functions is that they always exist, even when MGFs do not. The advantage of MGFs (when they exist) is that they are real-valued and easier to manipulate algebraically.
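For a distribution whose MGF extends to the whole complex plane, the relation $\varphi_X(t) = M_X(it)$ can be verified to machine precision. A minimal check for $\mathcal{N}(\mu, \sigma^2)$ (parameter values are arbitrary):

```python
import numpy as np

mu, sigma = 0.5, 2.0

def mgf_normal(t):
    """M(t) = exp(mu t + sigma^2 t^2 / 2); entire, so valid at complex t."""
    return np.exp(mu * t + sigma**2 * t**2 / 2)

def cf_normal(t):
    """phi(t) = exp(i mu t - sigma^2 t^2 / 2), from the table above."""
    return np.exp(1j * mu * t - sigma**2 * t**2 / 2)

for t in (0.3, 1.0, 2.5):
    assert abs(mgf_normal(1j * t) - cf_normal(t)) < 1e-12
```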

For the central limit theorem, the rigorous proof uses characteristic functions because finite-variance distributions need not have an MGF (e.g., a Pareto distribution with shape parameter 3 has finite variance but $\mathbb{E}[e^{tX}] = \infty$ for every $t > 0$). Characteristic-function-based proofs cover the entire finite-variance class.

Common Confusions

Watch Out

Characteristic functions are complex-valued, not real-valued

$\varphi_X(t) = \mathbb{E}[\cos(tX)] + i \, \mathbb{E}[\sin(tX)]$. The imaginary part is non-zero in general. Symmetric distributions (those with $X \stackrel{d}{=} -X$) have real characteristic functions because $\mathbb{E}[\sin(tX)] = 0$ by symmetry. For asymmetric distributions (e.g., exponential), the characteristic function is genuinely complex. This is unlike MGFs, which are always real.

Watch Out

Pointwise convergence of characteristic functions is not enough on its own

Lévy's converse requires the limit function to be continuous at zero. Without that, you can have pointwise convergence of $\varphi_n$ to a function that is not the characteristic function of anything (the $\sin(nt)/(nt)$ example above). This is the formal statement of "mass escaping to infinity." Always check continuity at zero of the candidate limit.

Watch Out

Smoothness of $\varphi$ at zero reflects moments, not regularity of the density

If $\mathbb{E}|X|^k < \infty$, then $\varphi_X^{(k)}(0)$ exists; the converse holds for even $k$. So smoothness of the characteristic function at zero corresponds to existence of moments, not to smoothness of the density. The Cauchy distribution has a smooth density, but its characteristic function $e^{-|t|}$ is not differentiable at 0, reflecting the non-existence of the mean.

Summary

  • The characteristic function $\varphi_X(t) = \mathbb{E}[e^{itX}]$ is the Fourier transform of the law of $X$.
  • It always exists, is bounded by 1 in modulus, and uniquely determines the distribution.
  • Independent sums multiply: $\varphi_{X+Y} = \varphi_X \varphi_Y$.
  • Lévy's continuity theorem identifies pointwise convergence of characteristic functions (with continuity at zero) with weak convergence of distributions; this is the engine of the rigorous CLT proof.
  • Bochner's theorem characterizes characteristic functions as the continuous, normalized, positive-definite functions, the same class as translation-invariant positive-definite kernels.
  • Characteristic functions handle stable and other fat-tailed distributions where MGFs are infinite.

Exercises

ExerciseCore

Problem

Compute the characteristic function of $X \sim \mathcal{N}(0, 1)$ directly from the definition. Then use it together with the property $\varphi_{aX + b}(t) = e^{itb} \varphi_X(at)$ to derive the characteristic function of $\mathcal{N}(\mu, \sigma^2)$.

ExerciseAdvanced

Problem

Prove the central limit theorem for i.i.d. random variables $X_1, X_2, \ldots$ with $\mathbb{E}[X_i] = 0$ and $\mathbb{E}[X_i^2] = \sigma^2 < \infty$ (no MGF assumed) by computing the characteristic function of $S_n / (\sigma \sqrt{n})$ and applying Lévy's continuity theorem.

References

Standard graduate texts:

  • Billingsley, "Probability and Measure" (3rd edition, Wiley, 1995), Sections 26-29
  • Durrett, "Probability: Theory and Examples" (5th edition, Cambridge, 2019), Sections 3.3-3.4
  • Williams, "Probability with Martingales" (Cambridge, 1991), Chapter 16
  • Resnick, "A Probability Path" (Birkhäuser, 1999), Chapter 9

Harmonic analysis perspective:

  • Folland, "A Course in Abstract Harmonic Analysis" (2nd edition, CRC Press, 2016), Chapter 4 (Bochner)
  • Stein and Shakarchi, "Fourier Analysis: An Introduction" (Princeton, 2003), Chapter 5

Original sources:

  • Lévy, "Calcul des probabilités" (Gauthier-Villars, 1925) - characteristic functions and continuity theorem
  • Bochner, "Vorlesungen über Fouriersche Integrale" (Akademische Verlagsgesellschaft, 1932) - positive-definite functions

Random Fourier features (Bochner in ML):

  • Rahimi and Recht, "Random Features for Large-Scale Kernel Machines" (NeurIPS 2007)

Next Topics

  • Central limit theorem: the canonical application via Lévy's continuity theorem
  • Kernels and RKHS: Bochner's theorem powers random Fourier features
  • Fat tails: stable distributions whose characteristic functions $e^{-|t|^\alpha}$ have no MGF analogue

Last reviewed: April 18, 2026
