
Mathematical Infrastructure

Characteristic Functions

The Fourier transform of a probability distribution. It always exists (unlike the MGF), uniquely determines the distribution, turns independent sums into products, and powers the rigorous proof of the central limit theorem via Lévy's continuity theorem.


Why This Matters

The moment generating function $M_X(t) = \mathbb{E}[e^{tX}]$ is the standard tool for proving the central limit theorem in introductory courses. It has a fatal flaw: it does not always exist. For the Cauchy and stable distributions, $\mathbb{E}[e^{tX}] = \infty$ for every $t \neq 0$; for the log-normal it is infinite for every $t > 0$. The same failure afflicts most fat-tailed distributions, so any proof that uses MGFs cannot apply to them.

The characteristic function $\varphi_X(t) = \mathbb{E}[e^{itX}]$ replaces $e^{tX}$ with the unit-modulus complex exponential $e^{itX}$. Since $|e^{itX}| = 1$, the expectation always exists and is bounded by 1 in modulus. This is what makes characteristic functions the right tool for rigorous probability: every distribution has one, and they uniquely determine the distribution.

Three payoffs. First, the rigorous proof of the CLT goes through Lévy's continuity theorem applied to characteristic functions, not MGFs. Second, characteristic functions are the bridge to harmonic analysis: Bochner's theorem identifies them with continuous positive-definite functions, which is what makes kernel methods work. Third, characteristic functions handle stable distributions and other fat-tailed laws where MGFs fail outright.

Definition

Definition

Characteristic Function

The characteristic function of a real-valued random variable $X$ is

$$\varphi_X(t) = \mathbb{E}\!\left[e^{itX}\right] = \mathbb{E}[\cos(tX)] + i \, \mathbb{E}[\sin(tX)], \qquad t \in \mathbb{R}.$$

For an $\mathbb{R}^d$-valued random vector $X$, $\varphi_X(t) = \mathbb{E}[e^{i \langle t, X \rangle}]$ for $t \in \mathbb{R}^d$.

When $X$ has density $f_X$, $\varphi_X$ is exactly the Fourier transform of $f_X$:

$$\varphi_X(t) = \int_{\mathbb{R}} e^{itx} f_X(x) \, dx.$$

When $X$ is discrete with mass function $p_X$, $\varphi_X(t) = \sum_x e^{itx} p_X(x)$.

The factor of $i$ in the exponent is the entire reason characteristic functions exist where MGFs do not: $|e^{itX}| = 1$ for every real $t$ and every random outcome $X$, so the expectation is always well-defined and bounded.
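The definition is directly computable by Monte Carlo: average $e^{itX}$ over samples. Below is an illustrative sketch (not from the source) comparing the empirical characteristic function of Exponential(1) samples against the closed form $1/(1 - it)$ from the table later in this page.

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_cf(samples, t):
    """Monte Carlo estimate of E[exp(i t X)] from i.i.d. samples."""
    return np.mean(np.exp(1j * t * samples))

# Exponential(1): exact CF is lambda/(lambda - it) = 1/(1 - it)
x = rng.exponential(scale=1.0, size=200_000)
for t in (0.5, 1.0, 2.0):
    exact = 1.0 / (1.0 - 1j * t)
    approx = empirical_cf(x, t)
    assert abs(approx - exact) < 0.02  # Monte Carlo error ~ 1/sqrt(n)
```

Note that the estimate is complex-valued: the exponential distribution is asymmetric, so the imaginary part is genuinely non-zero.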

Basic Properties

The properties below follow directly from the definition.

  • Boundedness. $|\varphi_X(t)| \leq 1$ for all $t$, with $\varphi_X(0) = 1$.
  • Conjugate symmetry. $\varphi_X(-t) = \overline{\varphi_X(t)}$.
  • Uniform continuity. $\varphi_X$ is uniformly continuous on $\mathbb{R}$, even when $X$ has no density at all.
  • Linear transformations. $\varphi_{aX + b}(t) = e^{itb} \varphi_X(at)$.
  • Independence and convolution. If $X, Y$ are independent, then $\varphi_{X + Y}(t) = \varphi_X(t) \varphi_Y(t)$.
  • Moment recovery. If $\mathbb{E}|X|^k < \infty$, then $\varphi_X$ is $k$-times differentiable at 0 with $\mathbb{E}[X^k] = i^{-k} \varphi_X^{(k)}(0)$.

The independence-to-multiplication property is what makes characteristic functions ideal for analyzing sums of independent random variables. The proof is direct: $\mathbb{E}[e^{it(X+Y)}] = \mathbb{E}[e^{itX} e^{itY}] = \mathbb{E}[e^{itX}] \, \mathbb{E}[e^{itY}]$ by independence.
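The convolution property can be checked numerically. The following sketch (an illustration under the stated sampling choices, not a proof) draws independent exponential and normal samples and compares the empirical characteristic function of the sum to the product of the individual ones:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.exponential(1.0, n)        # X ~ Exponential(1)
y = rng.normal(0.0, 1.0, n)        # Y ~ N(0, 1), independent of X

def ecf(samples, t):
    """Empirical characteristic function at a single t."""
    return np.mean(np.exp(1j * t * samples))

t = 0.7
lhs = ecf(x + y, t)                # CF of the independent sum
rhs = ecf(x, t) * ecf(y, t)        # product of the individual CFs
assert abs(lhs - rhs) < 0.02
```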

Standard Examples

| Distribution | $\varphi_X(t)$ |
| --- | --- |
| Constant $c$ | $e^{itc}$ |
| Bernoulli$(p)$ | $1 - p + p e^{it}$ |
| Binomial$(n, p)$ | $(1 - p + p e^{it})^n$ |
| Poisson$(\lambda)$ | $e^{\lambda(e^{it} - 1)}$ |
| Uniform$(a, b)$ | $\dfrac{e^{itb} - e^{ita}}{it(b - a)}$ |
| Exponential$(\lambda)$ | $\lambda / (\lambda - it)$ |
| Normal $\mathcal{N}(\mu, \sigma^2)$ | $e^{i\mu t - \sigma^2 t^2 / 2}$ |
| Cauchy$(0, 1)$ | $e^{-\lvert t \rvert}$ |
| Standard symmetric $\alpha$-stable | $e^{-\lvert t \rvert^\alpha}$ |

The standard normal $\varphi(t) = e^{-t^2/2}$ is what the CLT pulls toward. The Cauchy and stable cases are exactly where MGFs do not exist; their characteristic functions exist everywhere, are bounded, and are perfectly well-behaved.
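Table entries with discrete support can be verified to machine precision by summing the definition directly. A quick deterministic check for the Poisson row:

```python
import numpy as np
from math import factorial, exp

lam, t = 3.0, 1.3

# CF by direct summation: sum_k e^{itk} * e^{-lam} lam^k / k!
# (truncated at k = 60; the Poisson(3) tail beyond that is negligible)
direct = sum(
    np.exp(1j * t * k) * exp(-lam) * lam**k / factorial(k)
    for k in range(60)
)

# Closed form from the table: e^{lam (e^{it} - 1)}
closed = np.exp(lam * (np.exp(1j * t) - 1))

assert abs(direct - closed) < 1e-12
```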

Uniqueness

Characteristic functions completely determine the distribution.

Theorem

Uniqueness Theorem for Characteristic Functions

Statement

If $\varphi_X(t) = \varphi_Y(t)$ for all $t \in \mathbb{R}$, then $X$ and $Y$ have the same distribution. Equivalently, the map "distribution $\mapsto$ characteristic function" is injective on probability measures.

When $\varphi_X$ is absolutely integrable, $X$ has a bounded continuous density and the inversion formula gives

$$f_X(x) = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-itx} \varphi_X(t) \, dt.$$

Intuition

The characteristic function is a Fourier transform, and the Fourier transform on $L^2$ is unitary (Plancherel). That unitarity is exactly the statement that no information about $f_X$ is lost in the transform. So if two distributions have the same Fourier transform, they are the same distribution. The inversion formula recovers $f_X$ explicitly when the characteristic function is integrable.
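The inversion formula can be exercised numerically. This sketch (grid sizes are arbitrary choices, not from the source) recovers the standard normal density from its characteristic function $e^{-t^2/2}$ by discretizing the integral:

```python
import numpy as np

def density_from_cf(cf, x, T=20.0, m=20001):
    """Numerically invert f(x) = (1/2pi) * integral of e^{-itx} cf(t) dt.

    Valid when cf is absolutely integrable; the integral is truncated to
    [-T, T] and approximated by a Riemann sum on a fine grid.
    """
    t = np.linspace(-T, T, m)
    vals = np.exp(-1j * t * x) * cf(t)
    dt = t[1] - t[0]
    return (np.sum(vals) * dt).real / (2 * np.pi)

phi = lambda t: np.exp(-t**2 / 2)          # standard normal CF
for x in (0.0, 1.0, 2.0):
    exact = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # N(0,1) density
    assert abs(density_from_cf(phi, x) - exact) < 1e-6
```

The Gaussian decay of $\varphi$ makes the truncation error negligible, which is exactly the integrability hypothesis at work.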

Proof Sketch

First show that for any bounded continuous $g$ on $\mathbb{R}$, $\mathbb{E}[g(X)]$ is determined by $\varphi_X$. Use Gaussian smoothing: let $g_\sigma(x) = \int g(x + \sigma z) \, \phi(z) \, dz$ with $\phi$ the standard normal density. Then $\mathbb{E}[g_\sigma(X)] = \mathbb{E}[g(X + \sigma Z)]$ for $Z$ standard normal independent of $X$, and $X + \sigma Z$ has a density computable from $\varphi_X$ by the inversion formula (its characteristic function $\varphi_X(t) e^{-\sigma^2 t^2 / 2}$ is integrable). Send $\sigma \to 0$. Two distributions agreeing on all bounded continuous test functions are equal.

Why It Matters

Uniqueness is what licenses the strategy "to identify the distribution of a random variable, compute its characteristic function and match." This is how almost every named distribution is recognized in proofs: compute $\varphi$, recognize the formula. Without uniqueness, you would need to match all moments (which requires them to exist, and need not determine the distribution) or work with the distributions directly.

Failure Mode

Pointwise agreement of characteristic functions on a strict subset of $\mathbb{R}$ does not in general identify the distribution. Two distributions can agree on $[-T, T]$ for any finite $T$ and still differ (this is loosely related to the moment problem). The hypothesis of agreement on all of $\mathbb{R}$ is essential.

Lévy's Continuity Theorem

This is the central fact that makes characteristic functions the right tool for weak convergence proofs, including the central limit theorem.

Theorem

Lévy's Continuity Theorem

Statement

Let $\mu_n$ be a sequence of probability measures on $\mathbb{R}$ with characteristic functions $\varphi_n$, and let $\mu$ be a candidate limit with characteristic function $\varphi$. Then:

  1. Continuity (forward direction). If $\mu_n \Rightarrow \mu$ (weak convergence, equivalently convergence in distribution of the corresponding random variables), then $\varphi_n(t) \to \varphi(t)$ pointwise for every $t \in \mathbb{R}$.

  2. Continuity (converse direction). If $\varphi_n(t) \to \psi(t)$ pointwise for every $t \in \mathbb{R}$, and $\psi$ is continuous at $t = 0$, then $\psi$ is the characteristic function of a probability measure $\mu$, and $\mu_n \Rightarrow \mu$.

The continuity-at-zero hypothesis is essential for the converse: without it, mass can escape to infinity and the limit $\psi$ fails to be the characteristic function of any probability measure.

Intuition

Weak convergence is hard to verify directly because it requires checking $\int g \, d\mu_n \to \int g \, d\mu$ for all bounded continuous $g$. Pointwise convergence of characteristic functions is much easier, because it reduces to convergence of expectations of a single family of test functions $\{e^{itx} : t \in \mathbb{R}\}$. The continuity-at-zero condition rules out mass escaping to infinity (which can keep $\varphi_n(0) = 1$ while $\varphi_n(t) \to 0$ for $t \neq 0$).

Proof Sketch

Forward: each $e^{itx}$ is bounded and continuous in $x$, so weak convergence gives pointwise convergence of $\varphi_n$.

Converse: continuity at 0 of the limit $\psi$ controls the tails of the distributions. Specifically, the standard estimate $\mu_n\big([-2K, 2K]^c\big) \leq K \int_{-1/K}^{1/K} \big(1 - \operatorname{Re}\varphi_n(t)\big) \, dt$ shows that if $\psi$ is continuous at 0, the right side is uniformly small for large $K$, giving tightness. By Prokhorov's theorem, tightness plus pointwise convergence of characteristic functions implies weak convergence to a limit whose characteristic function is $\psi$.

Why It Matters

This is the key lemma in the rigorous CLT proof. To show $\sqrt{n}(\bar X_n - \mu)/\sigma \xrightarrow{d} \mathcal{N}(0, 1)$, compute the characteristic function of the standardized average and show it converges pointwise to $e^{-t^2/2}$, the standard normal characteristic function. Continuity at 0 is automatic because the limit is a smooth function. The CLT then follows from Lévy's theorem. No moment generating function is required, so the result extends to any distribution with finite variance.
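The pointwise convergence at the heart of this argument is easy to watch numerically. This sketch (using Exponential(1) summands as an arbitrary illustrative choice) evaluates the exact characteristic function of the standardized sum and compares it to the Gaussian limit:

```python
import numpy as np

def cf_standardized_sum(t, n):
    """Exact CF of (S_n - n) / sqrt(n) for i.i.d. Exponential(1) terms
    (mean = variance = 1), built from phi_X(t) = 1/(1 - it) and the
    linear-transformation and independence rules."""
    s = t / np.sqrt(n)
    return (np.exp(-1j * s) / (1 - 1j * s)) ** n

t = 1.5
limit = np.exp(-t**2 / 2)                   # standard normal CF
err_small = abs(cf_standardized_sum(t, 10) - limit)
err_big = abs(cf_standardized_sum(t, 10_000) - limit)
assert err_big < 0.01 and err_big < err_small   # converging to e^{-t^2/2}
```

Since the limit $e^{-t^2/2}$ is continuous at 0, Lévy's converse upgrades this pointwise convergence to convergence in distribution.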

Failure Mode

The continuity-at-zero hypothesis cannot be dropped. Counterexample: let $X_n$ be uniform on $[-n, n]$. Then $\varphi_n(t) = \sin(nt)/(nt)$, which converges pointwise to $\psi(t) = \mathbf{1}\{t = 0\}$. This $\psi$ is discontinuous at 0 and is not the characteristic function of any probability measure. The mass of $X_n$ "escapes to infinity," and Lévy's converse correctly diagnoses the failure.
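The counterexample is concrete enough to evaluate. A minimal sketch of the escaping-mass limit:

```python
import numpy as np

def phi_uniform(t, n):
    """CF of Uniform[-n, n]: sin(nt)/(nt).

    np.sinc(x) computes sin(pi x)/(pi x) and handles x = 0, so we rescale."""
    return np.sinc(n * t / np.pi)

# At t = 0 the value is always exactly 1, but at any fixed t != 0 it dies
# as n grows: the pointwise limit 1{t = 0} is discontinuous at 0 and is
# not the characteristic function of any probability measure.
assert phi_uniform(0.0, 10**6) == 1.0
assert abs(phi_uniform(0.5, 10**6)) < 1e-5
```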

Bochner's Theorem and Kernels

Characteristic functions form a tightly characterized class: they are exactly the continuous positive-definite functions normalized to $\varphi(0) = 1$.

Theorem

Bochner's Theorem

Statement

A function $\varphi : \mathbb{R}^d \to \mathbb{C}$ is the characteristic function of some probability measure on $\mathbb{R}^d$ if and only if it is:

  1. continuous on $\mathbb{R}^d$,
  2. normalized: $\varphi(0) = 1$, and
  3. positive definite: for every $n$, every $t_1, \ldots, t_n \in \mathbb{R}^d$, and every $c_1, \ldots, c_n \in \mathbb{C}$,

$$\sum_{j, k = 1}^n c_j \overline{c_k} \, \varphi(t_j - t_k) \geq 0.$$

Intuition

Positive definiteness is the abstract condition that the Gram matrix $[\varphi(t_j - t_k)]_{j,k}$ formed from any finite set of "shifts" is positive semidefinite. This is exactly what kernels in machine learning require, so positive-definite functions and characteristic functions are (up to normalization) the same class.
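The Gram-matrix condition is directly checkable by eigenvalue computation. This sketch contrasts the Gaussian characteristic function $e^{-t^2/2}$ (positive definite) with $|t|$ (the non-example discussed below):

```python
import numpy as np

rng = np.random.default_rng(2)
t = rng.normal(size=8)                 # arbitrary shift points t_1, ..., t_n
diff = t[:, None] - t[None, :]         # matrix of pairwise t_j - t_k

# Gaussian CF e^{-t^2/2}: the Gram matrix is positive semidefinite
gram_ok = np.exp(-diff**2 / 2)
assert np.linalg.eigvalsh(gram_ok).min() > -1e-10

# |t| is non-negative as a function but NOT positive definite:
# its "Gram matrix" has zero trace and nonzero entries, so it must
# have a strictly negative eigenvalue
gram_bad = np.abs(diff)
assert np.linalg.eigvalsh(gram_bad).min() < -1e-10
```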

Proof Sketch

Necessity: for $X$ with characteristic function $\varphi$, $\sum_{j, k} c_j \overline{c_k} \, \varphi(t_j - t_k) = \mathbb{E}\big|\sum_j c_j e^{i \langle t_j, X \rangle}\big|^2 \geq 0$.

Sufficiency: given a continuous, normalized, positive-definite $\varphi$, the Bochner construction defines a probability measure via the inverse Fourier transform of $\varphi$ (interpreted in the distributional sense). The proof uses the Riesz representation theorem to extract a measure from the positive linear functional $f \mapsto (f * \tilde\varphi)(0)$.

Why It Matters

Bochner's theorem is the bridge between probability and harmonic analysis. On the kernel-method side: a continuous translation-invariant kernel $K(x, y) = k(x - y)$ on $\mathbb{R}^d$ is positive definite iff $k$ is (up to normalization) the characteristic function of some probability measure, called the spectral measure of the kernel. This is the foundation of random Fourier features: sample $t \sim \mu$ from the spectral measure and $b$ uniformly on $[0, 2\pi]$; the feature $z(x) = \sqrt{2} \cos(\langle t, x \rangle + b)$ then reproduces the kernel in expectation, $\mathbb{E}[z(x) z(y)] = k(x - y)$. Without Bochner there would be no random Fourier features.
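The random-Fourier-features recipe is short enough to sketch in full. For the Gaussian kernel $k(x - y) = e^{-\|x - y\|^2/2}$, the spectral measure given by Bochner is the standard normal on $\mathbb{R}^d$ (the dimensions and test points below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
d, D = 3, 50_000                       # input dimension, number of features

# Spectral sampling: frequencies from N(0, I_d), phases uniform on [0, 2pi]
w = rng.normal(size=(D, d))
b = rng.uniform(0, 2 * np.pi, size=D)

def features(x):
    """Random Fourier feature map; E[features(x) @ features(y)] = k(x - y)."""
    return np.sqrt(2.0 / D) * np.cos(w @ x + b)

x = np.array([0.2, -0.1, 0.4])
y = np.array([-0.3, 0.5, 0.1])
exact = np.exp(-np.sum((x - y) ** 2) / 2)   # Gaussian kernel value
approx = features(x) @ features(y)          # inner product of random features
assert abs(approx - exact) < 0.03
```

The $\sqrt{2}$ and the random phase make the cross terms average out, leaving exactly the real part of the characteristic function at $x - y$.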

Failure Mode

Positive definiteness is not the same as positivity of the function itself. $\cos(t)$ is positive definite (it is the characteristic function of $\frac{1}{2}\delta_{-1} + \frac{1}{2}\delta_{+1}$) but takes negative values. Conversely, $|t|$ is non-negative but not positive definite. The condition is on the Gram-matrix structure, not on the sign of $\varphi$.

Connection to Moment Generating Functions

When the moment generating function $M_X(t) = \mathbb{E}[e^{tX}]$ exists in a neighborhood of zero, it and the characteristic function are related by analytic continuation:

$$\varphi_X(t) = M_X(it) \qquad \text{when } M_X \text{ extends to a strip around the imaginary axis.}$$

So when MGFs exist, characteristic functions carry the same information. The advantage of characteristic functions is that they always exist, even when MGFs do not. The advantage of MGFs (when they exist) is that they are real-valued and easier to manipulate algebraically.
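For a distribution whose MGF extends to the whole complex plane, the relation $\varphi_X(t) = M_X(it)$ can be verified to machine precision. A minimal check for $\mathcal{N}(\mu, \sigma^2)$ (parameter values are arbitrary):

```python
import numpy as np

mu, sigma = 0.5, 2.0

def mgf_normal(t):
    """M(t) = exp(mu t + sigma^2 t^2 / 2); entire, so valid at complex t."""
    return np.exp(mu * t + sigma**2 * t**2 / 2)

def cf_normal(t):
    """phi(t) = exp(i mu t - sigma^2 t^2 / 2), from the table above."""
    return np.exp(1j * mu * t - sigma**2 * t**2 / 2)

for t in (0.3, 1.0, 2.5):
    assert abs(mgf_normal(1j * t) - cf_normal(t)) < 1e-12
```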

For the central limit theorem, the rigorous proof uses characteristic functions because finite-variance distributions need not have an MGF (e.g., a Pareto distribution with shape parameter 3 has finite variance but $\mathbb{E}[e^{tX}] = \infty$ for every $t > 0$). Characteristic-function-based proofs cover the entire finite-variance class.

Common Confusions

Watch Out

Characteristic functions are complex-valued, not real-valued

$\varphi_X(t) = \mathbb{E}[\cos(tX)] + i \, \mathbb{E}[\sin(tX)]$. The imaginary part is non-zero in general. Symmetric distributions (those with $X \stackrel{d}{=} -X$) have real characteristic functions because $\mathbb{E}[\sin(tX)] = 0$ by symmetry. For asymmetric distributions (e.g., exponential), the characteristic function is genuinely complex. This is unlike MGFs, which are always real.

Watch Out

Pointwise convergence of characteristic functions is not enough on its own

Lévy's converse requires the limit function to be continuous at zero. Without that, you can have pointwise convergence of $\varphi_n$ to a function that is not the characteristic function of anything (the $\sin(nt)/(nt)$ example above). This is the formal statement of "mass escaping to infinity." Always check continuity at zero of the candidate limit.

Watch Out

Smoothness of $\varphi$ at zero reflects moments, not regularity of the density

If $\mathbb{E}|X|^k < \infty$, then $\varphi_X^{(k)}(0)$ exists; the converse holds for even $k$. So smoothness of the characteristic function at zero corresponds to existence of moments, not to smoothness of the density. The Cauchy distribution has a smooth density, but its characteristic function $e^{-|t|}$ is not differentiable at 0, reflecting the non-existence of the mean.

Summary

  • The characteristic function $\varphi_X(t) = \mathbb{E}[e^{itX}]$ is the Fourier transform of the law of $X$.
  • It always exists, is bounded by 1 in modulus, and uniquely determines the distribution.
  • Independent sums multiply: $\varphi_{X+Y} = \varphi_X \varphi_Y$.
  • Lévy's continuity theorem identifies pointwise convergence of characteristic functions (with continuity at zero) with weak convergence of distributions; this is the engine of the rigorous CLT proof.
  • Bochner's theorem characterizes characteristic functions as the continuous, normalized, positive-definite functions, the same class as translation-invariant positive-definite kernels.
  • Characteristic functions handle stable and other fat-tailed distributions where MGFs are infinite.

Exercises

ExerciseCore

Problem

Compute the characteristic function of $X \sim \mathcal{N}(0, 1)$ directly from the definition. Then use it together with the property $\varphi_{aX + b}(t) = e^{itb} \varphi_X(at)$ to derive the characteristic function of $\mathcal{N}(\mu, \sigma^2)$.

ExerciseAdvanced

Problem

Prove the central limit theorem for i.i.d. random variables $X_1, X_2, \ldots$ with $\mathbb{E}[X_i] = 0$ and $\mathbb{E}[X_i^2] = \sigma^2 < \infty$ (no MGF assumed) by computing the characteristic function of $S_n / (\sigma \sqrt{n})$ and applying Lévy's continuity theorem.

References

Standard graduate texts:

  • Billingsley, "Probability and Measure" (3rd edition, Wiley, 1995), Sections 26-29
  • Durrett, "Probability: Theory and Examples" (5th edition, Cambridge, 2019), Sections 3.3-3.4
  • Williams, "Probability with Martingales" (Cambridge, 1991), Chapter 16
  • Resnick, "A Probability Path" (Birkhäuser, 1999), Chapter 9

Harmonic analysis perspective:

  • Folland, "A Course in Abstract Harmonic Analysis" (2nd edition, CRC Press, 2016), Chapter 4 (Bochner)
  • Stein and Shakarchi, "Fourier Analysis: An Introduction" (Princeton, 2003), Chapter 5

Original sources:

  • Lévy, "Calcul des probabilités" (Gauthier-Villars, 1925) - characteristic functions and continuity theorem
  • Bochner, "Vorlesungen über Fouriersche Integrale" (Akademische Verlagsgesellschaft, 1932) - positive-definite functions

Random Fourier features (Bochner in ML):

  • Rahimi and Recht, "Random Features for Large-Scale Kernel Machines" (NeurIPS 2007)

Next Topics

  • Central limit theorem: the canonical application via Lévy's continuity theorem
  • Kernels and RKHS: Bochner's theorem powers random Fourier features
  • Fat tails: stable distributions whose characteristic functions $e^{-|t|^\alpha}$ have no MGF analogue

Last reviewed: April 18, 2026
