Beta. Content is under active construction and has not been peer-reviewed.

Comparison

Sub-Gaussian vs. Sub-Exponential Random Variables

Two tail regimes for concentration: sub-Gaussian gives $\exp(-ct^2)$, sub-exponential gives $\exp(-ct)$ for large deviations, and the boundary between them explains when classical bounds break down.

What Each Measures

Both sub-Gaussian and sub-exponential are tail-decay classes: they describe how fast $\Pr[|X| \geq t]$ decreases as $t \to \infty$. A random variable is classified based on how its tails compare to specific reference distributions.

Sub-Gaussian: tails decay at least as fast as a Gaussian. Prototypical example: any bounded random variable.

Sub-Exponential: tails decay at least as fast as an exponential. The class is strictly larger than sub-Gaussian (it admits heavier tails), but its tails are still lighter than polynomial.

Side-by-Side Definitions

Definition

Sub-Gaussian Random Variable

A centered random variable $X$ is sub-Gaussian with parameter $\sigma$ if any of the following conditions holds (they are equivalent up to absolute constant factors in $\sigma$):

  1. Tail condition: $\Pr[|X| \geq t] \leq 2\exp(-t^2/(2\sigma^2))$ for all $t \geq 0$
  2. MGF condition: $\mathbb{E}[e^{\lambda X}] \leq \exp(\sigma^2 \lambda^2 / 2)$ for all $\lambda \in \mathbb{R}$
  3. Moment condition: $(\mathbb{E}[|X|^p])^{1/p} \leq C\sigma\sqrt{p}$ for all $p \geq 1$, where $C$ is an absolute constant

The sub-Gaussian norm is $\|X\|_{\psi_2} = \inf\{t > 0 : \mathbb{E}[e^{X^2/t^2}] \leq 2\}$.
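As a worked instance of the norm definition (a standard computation, not taken from the text above), take $Z \sim \mathcal{N}(0, 1)$. For $t^2 > 2$ the Gaussian integral gives a closed form, and the constraint in the infimum can be solved by hand:

$$
\mathbb{E}\!\left[e^{Z^2/t^2}\right] = \left(1 - \frac{2}{t^2}\right)^{-1/2} \leq 2
\;\iff\; 1 - \frac{2}{t^2} \geq \frac{1}{4}
\;\iff\; t^2 \geq \frac{8}{3},
\qquad\text{so}\qquad \|Z\|_{\psi_2} = \sqrt{8/3} \approx 1.63 .
$$

For $t^2 \leq 2$ the expectation is infinite, so those values of $t$ never enter the infimum.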

Definition

Sub-Exponential Random Variable

A centered random variable $X$ is sub-exponential with parameters $(\nu, b)$ if any of the following conditions holds (they are equivalent up to absolute constant factors in the parameters):

  1. Tail condition: $\Pr[|X| \geq t] \leq 2\exp(-t/b)$ for all $t$ sufficiently large
  2. MGF condition: $\mathbb{E}[e^{\lambda X}] \leq \exp(\nu^2 \lambda^2 / 2)$ for $|\lambda| \leq 1/b$
  3. Moment condition: $(\mathbb{E}[|X|^p])^{1/p} \leq Cbp$ for all $p \geq 1$, where $C$ is an absolute constant

The sub-exponential norm is $\|X\|_{\psi_1} = \inf\{t > 0 : \mathbb{E}[e^{|X|/t}] \leq 2\}$.
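The two Orlicz norms can be evaluated numerically whenever the expectation inside the infimum has a closed form. The sketch below is illustrative only: the helper `orlicz_norm` and the two example distributions (a Rademacher sign for $\psi_2$, an uncentered $\text{Exp}(1)$ variable for $\psi_1$) are choices made here, not part of the definitions.

```python
import math

def orlicz_norm(expected_psi, lo=1e-6, hi=1e6, iters=200):
    """Smallest t in [lo, hi] with expected_psi(t) <= 2, found by bisection.

    expected_psi(t) must return E[psi(|X| / t)] (math.inf where it diverges)
    and must be non-increasing in t.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expected_psi(mid) <= 2.0:
            hi = mid   # constraint holds at mid, so the norm is at most mid
        else:
            lo = mid   # constraint fails at mid, so the norm exceeds mid
    return hi

def rademacher_psi2_expectation(t):
    # X = +/-1 with probability 1/2 each, so E[exp(X^2 / t^2)] = exp(1 / t^2).
    exponent = 1.0 / (t * t)
    return math.inf if exponent > 700.0 else math.exp(exponent)  # guard against overflow

def exponential_psi1_expectation(t, rate=1.0):
    # X ~ Exp(rate): E[exp(X / t)] = rate / (rate - 1/t), finite only when 1/t < rate.
    return math.inf if 1.0 / t >= rate else rate / (rate - 1.0 / t)

psi2_rademacher = orlicz_norm(rademacher_psi2_expectation)
psi1_exponential = orlicz_norm(exponential_psi1_expectation)
print(psi2_rademacher, 1.0 / math.sqrt(math.log(2.0)))  # closed form: 1 / sqrt(ln 2)
print(psi1_exponential, 2.0)                            # closed form: 2 / rate
```

Bisection is valid here because both expectations decrease as $t$ grows, so the set of admissible $t$ is a half-line.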

Tail Behavior: The Core Difference

The fundamental distinction is in the tail decay rate:

| Property | Sub-Gaussian | Sub-Exponential |
|---|---|---|
| Tail decay | $\exp(-ct^2)$ for all $t$ | $\exp(-ct)$ for large $t$ |
| MGF finite | For all $\lambda \in \mathbb{R}$ | Only for $\lvert\lambda\rvert \leq 1/b$ |
| Moments | $\lVert X\rVert_p \sim \sigma\sqrt{p}$ | $\lVert X\rVert_p \sim bp$ |
| Orlicz norm | $\psi_2$: finite $\mathbb{E}[\exp(X^2/\sigma^2)]$ | $\psi_1$: finite $\mathbb{E}[\exp(\lvert X\rvert/b)]$ |

The difference is sharpest in the tails. For small $t$, both classes behave similarly (Gaussian-like). For large $t$, sub-Gaussian tails decay quadratically in the exponent ($e^{-ct^2}$), while sub-exponential tails decay only linearly ($e^{-ct}$). This means sub-exponential variables have occasional large values that are much more likely than a Gaussian would predict.
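To get a feel for the gap, the two tail-condition bounds can be tabulated side by side. This is a small sketch with the assumed values $\sigma = b = 1$; the function names are made up for illustration.

```python
import math

def sub_gaussian_tail(t, sigma=1.0):
    # Tail-condition bound 2 exp(-t^2 / (2 sigma^2)).
    return 2.0 * math.exp(-t * t / (2.0 * sigma * sigma))

def sub_exponential_tail(t, b=1.0):
    # Tail-condition bound 2 exp(-t / b).
    return 2.0 * math.exp(-t / b)

for t in (1.0, 2.0, 5.0, 10.0):
    g, e = sub_gaussian_tail(t), sub_exponential_tail(t)
    # With sigma = b = 1 the ratio is exp(t^2 / 2 - t): equal to 1 at t = 2, then growing fast.
    print(f"t={t:5.1f}  sub-Gaussian={g:.3e}  sub-exponential={e:.3e}  ratio={e / g:.3e}")
```

With these parameters the two bounds cross at $t = 2$; past that point the sub-exponential bound is larger by a factor $e^{t^2/2 - t}$, which is the "occasional large values" effect in numbers.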

Concentration for Sums

The tail classes directly determine how sums concentrate:

Proposition

Hoeffding-type Bound for Sub-Gaussian Sums

Statement

If $X_1, \ldots, X_n$ are independent, centered, sub-Gaussian with parameters $\sigma_i$, then:

$$
\Pr\!\left[\left|\sum_{i=1}^n X_i\right| \geq t\right] \leq 2\exp\!\left(-\frac{t^2}{2\sum_i \sigma_i^2}\right) \quad \text{for all } t \geq 0
$$

The sum is itself sub-Gaussian, with parameter $\sqrt{\sum_i \sigma_i^2}$, so the concentration is Gaussian-quality at every scale $t$.
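A quick sanity check of the bound, sketched for one concrete case chosen here: sums of independent Rademacher signs, each of which is sub-Gaussian with $\sigma_i = 1$. The exact tail is available from the binomial distribution, so no simulation is needed.

```python
import math

def sub_gaussian_sum_bound(t, sigmas):
    # 2 exp(-t^2 / (2 sum_i sigma_i^2)), the bound stated above.
    return 2.0 * math.exp(-t * t / (2.0 * sum(s * s for s in sigmas)))

def rademacher_sum_tail(t, n):
    # Exact Pr[|S_n| >= t] for S_n = sum of n independent +/-1 signs,
    # using S_n = 2B - n with B ~ Binomial(n, 1/2).
    favorable = sum(math.comb(n, k) for k in range(n + 1) if abs(2 * k - n) >= t)
    return favorable / 2 ** n

n = 100
for t in (10, 20, 30, 40):
    exact = rademacher_sum_tail(t, n)
    bound = sub_gaussian_sum_bound(t, [1.0] * n)
    print(f"t={t}: exact={exact:.3e}  bound={bound:.3e}")
```

The bound is loose by a modest factor but tracks the exact tail on the same $e^{-t^2/(2n)}$ scale.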

Proposition

Bernstein-type Bound for Sub-Exponential Sums

Statement

If $X_1, \ldots, X_n$ are independent, centered, sub-exponential with parameters $(\nu_i, b)$, then:

$$
\Pr\!\left[\left|\sum_{i=1}^n X_i\right| \geq t\right] \leq 2\exp\!\left(-c\min\!\left(\frac{t^2}{\sum_i \nu_i^2},\; \frac{t}{b}\right)\right)
$$

Two regimes: sub-Gaussian ($e^{-ct^2}$) for small $t$, sub-exponential ($e^{-ct}$) for large $t$. The transition occurs at $t \approx \sum_i \nu_i^2 / b$.

This is exactly the Bernstein phenomenon: sums of sub-exponential variables have Gaussian concentration near the mean and exponential concentration in the tails.
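The two regimes are easy to see by evaluating the bound directly. In this sketch the constant is set to $c = 1/2$, which is what the usual Chernoff argument gives under the MGF condition; treat that value, the function names, and the example parameters $(\nu_i, b) = (2, 4)$ as assumptions made for illustration.

```python
import math

def bernstein_sum_bound(t, nus, b, c=0.5):
    # 2 exp(-c * min(t^2 / sum_i nu_i^2, t / b)); c = 0.5 is an assumed constant.
    variance_proxy = sum(nu * nu for nu in nus)
    return 2.0 * math.exp(-c * min(t * t / variance_proxy, t / b))

def transition_point(nus, b):
    # The two terms inside the min are equal at t = sum_i nu_i^2 / b.
    return sum(nu * nu for nu in nus) / b

# Hypothetical example: n = 100 variables, each with (nu, b) = (2, 4).
nus, b = [2.0] * 100, 4.0
t_star = transition_point(nus, b)
for t in (20.0, 100.0, 400.0):
    regime = "sub-Gaussian" if t <= t_star else "sub-exponential"
    print(f"t={t:6.1f}  regime={regime:15s}  bound={bernstein_sum_bound(t, nus, b):.3e}")
```

Below `t_star` the quadratic term is the smaller one and drives the bound; above it the linear term takes over.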

The Fundamental Relationship: Products and Squares

Proposition

Products of Sub-Gaussians Are Sub-Exponential

Statement

If $X$ and $Y$ are sub-Gaussian, then $XY$ is sub-exponential. Specifically:

$$
\|XY\|_{\psi_1} \leq \|X\|_{\psi_2} \cdot \|Y\|_{\psi_2}
$$

In particular, if $X$ is sub-Gaussian, then $X^2$ is sub-exponential.

Intuition

A sub-Gaussian variable $X$ has $\mathbb{E}[e^{X^2/c}] < \infty$ for some $c > 0$ (any $c$ larger than a constant multiple of $\sigma^2$ works). Since $|X^2| = X^2$, this says exactly that $\mathbb{E}[e^{|X^2|/c}] < \infty$ for sufficiently large $c$, which is the sub-exponential ($\psi_1$) condition for $X^2$, not the sub-Gaussian one. Squaring moves the variable from $\psi_2$ to $\psi_1$: a tail of order $e^{-t^2/(2\sigma^2)}$ for $|X|$ becomes a tail of order $e^{-t/(2\sigma^2)}$ for $X^2$, so the quadratic exponent becomes linear.
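The norm inequality for products follows from a short standard argument, sketched here. By homogeneity, rescale so that $\|X\|_{\psi_2} = \|Y\|_{\psi_2} = 1$, which gives $\mathbb{E}[e^{X^2}] \leq 2$ and $\mathbb{E}[e^{Y^2}] \leq 2$. Applying $|xy| \leq (x^2 + y^2)/2$ and then $uv \leq (u^2 + v^2)/2$ with $u = e^{X^2/2}$, $v = e^{Y^2/2}$:

$$
\mathbb{E}\!\left[e^{|XY|}\right]
\leq \mathbb{E}\!\left[e^{X^2/2}\, e^{Y^2/2}\right]
\leq \frac{1}{2}\,\mathbb{E}\!\left[e^{X^2} + e^{Y^2}\right]
\leq \frac{1}{2}(2 + 2) = 2 .
$$

So $t = 1$ is admissible in the definition of the $\psi_1$ norm, that is, $\|XY\|_{\psi_1} \leq 1$. Undoing the rescaling gives the stated bound. No independence between $X$ and $Y$ is used.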

Canonical Examples

Example

Chi-squared: sub-exponential but not sub-Gaussian

Let $Z_1, \ldots, Z_d$ be i.i.d. $\mathcal{N}(0, 1)$. Each $Z_i$ is sub-Gaussian. The chi-squared statistic $\chi^2_d = \sum_{i=1}^d Z_i^2$ is a sum of $Z_i^2$, each of which is sub-exponential (as the square of a sub-Gaussian). So $\chi^2_d$ is sub-exponential.

But $\chi^2_d$ is not sub-Gaussian. Its tail satisfies $\Pr[\chi^2_d - d \geq t] \approx e^{-ct}$ for large $t$, not $e^{-ct^2}$. The MGF $\mathbb{E}[e^{\lambda \chi^2_d}] = (1 - 2\lambda)^{-d/2}$ is finite only for $\lambda < 1/2$, not for all $\lambda$.
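The MGF condition can be checked numerically for a single centered term $Z^2 - 1$, whose MGF is $e^{-\lambda}/\sqrt{1 - 2\lambda}$ for $\lambda < 1/2$. The sketch below assumes the parameter pair $(\nu, b) = (2, 4)$, a commonly quoted choice for this variable rather than something stated above, and checks the inequality on a grid instead of proving it.

```python
import math

def centered_chi2_mgf(lam):
    # E[exp(lam * (Z^2 - 1))] for Z ~ N(0, 1); infinite for lam >= 1/2.
    return math.inf if lam >= 0.5 else math.exp(-lam) / math.sqrt(1.0 - 2.0 * lam)

def sub_exponential_mgf_bound(lam, nu):
    # Right-hand side exp(nu^2 lam^2 / 2) of the MGF condition.
    return math.exp(nu * nu * lam * lam / 2.0)

nu, b = 2.0, 4.0                                  # assumed parameters for Z^2 - 1
grid = [k / 1000.0 for k in range(-250, 251)]     # grid over |lam| <= 1/b
holds_inside = all(centered_chi2_mgf(lam) <= sub_exponential_mgf_bound(lam, nu) for lam in grid)
fails_outside = centered_chi2_mgf(0.45) > sub_exponential_mgf_bound(0.45, nu)
print(holds_inside, fails_outside)
```

The failure at $\lambda = 0.45$ shows why the restriction $|\lambda| \leq 1/b$ is needed: a Gaussian-type MGF bound cannot hold all the way up to the singularity at $\lambda = 1/2$.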

This is the prototypical example: chi-squared statistics arise everywhere in statistics and ML (e.g., quadratic forms, variance estimates, kernel evaluations), and they require sub-exponential theory.

Example

Bounded variables: sub-Gaussian

If $X \in [a, b]$ almost surely, then the centered variable $X - \mathbb{E}[X]$ is sub-Gaussian with parameter $\sigma = (b - a)/2$. This is Hoeffding's lemma. Bounded variables are the best-behaved class: they are always sub-Gaussian, and Hoeffding's inequality is a special case of sub-Gaussian concentration.
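Hoeffding's lemma can be spot-checked on the simplest bounded variable, a Bernoulli($p$) taking values in $[0, 1]$, for which $\sigma = 1/2$ and the MGF bound reads $\exp(\lambda^2/8)$. The grid of $\lambda$ and $p$ values below is an arbitrary choice for illustration; a grid check is evidence, not a proof.

```python
import math

def centered_bernoulli_mgf(lam, p):
    # E[exp(lam * (X - p))] for X ~ Bernoulli(p), which takes values in [0, 1].
    return (1.0 - p) * math.exp(-lam * p) + p * math.exp(lam * (1.0 - p))

def hoeffding_lemma_bound(lam, a, b):
    # exp(lam^2 (b - a)^2 / 8) = exp(sigma^2 lam^2 / 2) with sigma = (b - a) / 2.
    return math.exp(lam * lam * (b - a) ** 2 / 8.0)

lams = [k / 10.0 for k in range(-100, 101)]
ps = [k / 20.0 for k in range(1, 20)]
# The factor (1 + 1e-12) absorbs floating-point rounding at lam = 0, where both sides equal 1.
lemma_holds = all(
    centered_bernoulli_mgf(lam, p) <= hoeffding_lemma_bound(lam, 0.0, 1.0) * (1.0 + 1e-12)
    for lam in lams
    for p in ps
)
print(lemma_holds)
```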

Example

Exponential distribution: sub-exponential but not sub-Gaussian

If $X \sim \text{Exp}(\lambda)$, then $X - \mathbb{E}[X]$ is sub-exponential with $b$ proportional to $1/\lambda$ (the tail condition holds with $b = 1/\lambda$). Its tail is $\Pr[X \geq t] = e^{-\lambda t}$, whose exponent is linear in $t$, not quadratic. The MGF $\mathbb{E}[e^{sX}] = \lambda/(\lambda - s)$ exists only for $s < \lambda$, so no sub-Gaussian MGF bound can hold.

The Bernstein Condition

The Bernstein condition provides a clean characterization of when a variable is sub-exponential via its moments:

Definition

Bernstein Condition

A centered random variable $X$ satisfies the Bernstein condition with parameter $b$ if for all integers $k \geq 2$:

$$
\mathbb{E}[|X|^k] \leq \frac{k!}{2} \cdot b^{k-2} \cdot \mathbb{E}[X^2]
$$

This is equivalent to being sub-exponential. The condition says that higher moments grow at most factorially (like those of an exponential distribution), not faster.

The contrast: sub-Gaussian variables have moments growing like $\mathbb{E}[|X|^k] \leq (C\sigma)^k \cdot k^{k/2}$, which is slower than factorial. Sub-exponential variables have factorial moment growth, which is faster but still controlled.
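The two growth rates can be compared on distributions with closed-form moments. This sketch uses a standard Gaussian and a standard Laplace variable as stand-ins for the two classes; both choices, and the helper names, are assumptions made for illustration. For the Laplace, $\mathbb{E}|X|^k = k!$ and $\mathbb{E}[X^2] = 2$, so the Bernstein condition holds with $b = 1$ with equality.

```python
import math

def gaussian_p_norm(p):
    # ||Z||_p for Z ~ N(0, 1): E|Z|^p = 2^(p/2) * Gamma((p + 1) / 2) / sqrt(pi).
    log_moment = 0.5 * p * math.log(2.0) + math.lgamma(0.5 * (p + 1)) - 0.5 * math.log(math.pi)
    return math.exp(log_moment / p)

def laplace_p_norm(p):
    # ||X||_p for X ~ Laplace(0, 1): E|X|^p = Gamma(p + 1), i.e. p! for integer p.
    return math.exp(math.lgamma(p + 1.0) / p)

def laplace_bernstein_holds(b, max_k=30):
    # Checks E|X|^k <= (k!/2) * b^(k-2) * E[X^2] for k = 2..max_k, with E|X|^k = k!, E[X^2] = 2.
    # After cancelling k! on both sides this reduces to 1 <= b^(k-2).
    return all(1.0 <= b ** (k - 2) for k in range(2, max_k + 1))

for p in (1, 2, 4, 16, 64, 256):
    g, l = gaussian_p_norm(p), laplace_p_norm(p)
    print(f"p={p:4d}  gaussian/sqrt(p)={g / math.sqrt(p):.4f}  laplace/p={l / p:.4f}")
print(laplace_bernstein_holds(1.0), laplace_bernstein_holds(0.9))
```

Both normalized ratios settle to constants, which is the $\sqrt{p}$ versus $p$ moment growth in concrete form.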

Where Each Fails

Sub-Gaussian fails for heavy-tailed data

Many real-world distributions have heavier tails than sub-Gaussian: log-normal returns in finance, power-law degree distributions in networks, and noise in robust statistics. For these, even sub-exponential can be too restrictive, and you may need polynomial tail bounds (finite moments only) or heavy-tailed concentration tools like the median-of-means estimator.
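For readers who have not seen it, here is a minimal sketch of the median-of-means estimator mentioned above. It assumes i.i.d. samples, so that splitting into contiguous blocks is harmless; the function name and the toy data are made up for illustration.

```python
import statistics

def median_of_means(samples, num_blocks):
    # Split the samples into num_blocks contiguous blocks of equal size
    # (dropping any remainder), average each block, and return the median of the block means.
    block_size = len(samples) // num_blocks
    if block_size == 0:
        raise ValueError("need at least one sample per block")
    block_means = [
        statistics.fmean(samples[i * block_size:(i + 1) * block_size])
        for i in range(num_blocks)
    ]
    return statistics.median(block_means)

data = [0.0] * 9 + [1000.0]           # toy data with a single extreme value
plain_mean = sum(data) / len(data)    # pulled far from 0 by the outlier
robust_mean = median_of_means(data, 5)
print(plain_mean, robust_mean)
```

A single extreme value can corrupt at most one block mean, and the median ignores it. That is why the estimator needs only finite variance rather than a sub-Gaussian or sub-exponential tail.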

Sub-Exponential fails for the heaviest tails

If $\mathbb{E}[e^{|X|/b}] = \infty$ for all $b > 0$, the variable is not sub-exponential. Pareto distributions, Cauchy distributions, and other power-law-tailed variables fall outside both classes. For these you need truncation or robust estimation techniques.

What to Memorize

| | Sub-Gaussian | Sub-Exponential |
|---|---|---|
| Tail | $e^{-ct^2}$ | $e^{-ct}$ (large $t$) |
| MGF domain | All $\lambda$ | $\lvert\lambda\rvert \leq 1/b$ |
| Norm | $\psi_2$ | $\psi_1$ |
| Moment growth | $\sim \sigma\sqrt{p}$ | $\sim bp$ |
| Inclusion | Sub-Gaussian $\subset$ Sub-Exponential | (strictly larger) |
| Closed under | Addition, linear combinations | Addition, linear combinations |
| Products | $X \cdot Y$ is sub-exponential | No clean closure |

Key facts to internalize:

  1. Every sub-Gaussian variable is sub-exponential (but not vice versa)
  2. $X$ sub-Gaussian $\Rightarrow$ $X^2$ sub-exponential
  3. $\chi^2$ is the canonical sub-exponential-but-not-sub-Gaussian example
  4. Bernstein's inequality is the concentration result for sub-exponential sums

When a Researcher Would Use Each

Example

Bounding the sample mean of bounded losses

The loss $\ell(h(x), y) \in [0, 1]$ is sub-Gaussian. Use sub-Gaussian concentration (Hoeffding). This is the standard setting in learning theory and gives the cleanest bounds.
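In practice this turns into a sample-size calculation. For losses in $[0, 1]$ the centered loss has $\sigma = 1/2$, so the sub-Gaussian sum bound with $t = n\varepsilon$ gives $\Pr[|\text{sample mean} - \text{true mean}| \geq \varepsilon] \leq 2\exp(-2n\varepsilon^2)$. The helpers below are a sketch of that arithmetic; the names and the example values $\varepsilon = \delta = 0.05$ are illustrative.

```python
import math

def hoeffding_deviation(n, delta):
    # Width eps such that 2 exp(-2 n eps^2) = delta, for n i.i.d. losses in [0, 1].
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def hoeffding_sample_size(epsilon, delta):
    # Smallest n with 2 exp(-2 n epsilon^2) <= delta.
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

n_needed = hoeffding_sample_size(0.05, 0.05)
print(n_needed, hoeffding_deviation(n_needed, 0.05))
```

The $1/\varepsilon^2$ dependence on accuracy and the $\log(1/\delta)$ dependence on confidence are the signature of sub-Gaussian concentration.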

Example

Bounding quadratic forms or variance estimates

The sample variance $\hat{\sigma}^2 = \frac{1}{n}\sum_i (X_i - \bar{X})^2$ involves squared terms. Each $(X_i - \bar{X})^2$ is sub-exponential when $X_i$ is sub-Gaussian. Use sub-exponential concentration (Bernstein-type bounds). This arises in covariance estimation, kernel methods, and random matrix theory.

Example

Analyzing inner products of random vectors

If $x, y \in \mathbb{R}^d$ are independent random vectors with independent, centered sub-Gaussian entries, their inner product $\langle x, y \rangle = \sum_i x_i y_i$ is a sum of products of sub-Gaussians, hence a sum of independent sub-exponential variables. Use sub-exponential concentration for the sum. This is central to compressed sensing and random projection arguments.

Common Confusions

Watch Out

Sub-exponential does not mean exponential distribution

The term "sub-exponential" means "tails no heavier than exponential." An exponential random variable is sub-exponential, but so are many others (chi-squared, squared Gaussians, products of Gaussians). The name describes a tail class, not a specific distribution.

Watch Out

The two-regime behavior of Bernstein is not a weakness

Bernstein's bound has a sub-Gaussian regime (small $t$) and a sub-exponential regime (large $t$). This is not a flaw. It is an accurate reflection of how sub-exponential sums actually behave: the bound is sharp, up to constants, in both regimes. The transition point $t^* \approx \sum_i \nu_i^2 / b$ is where the tail character changes from Gaussian to exponential.