Sub-Gaussian vs. Sub-Exponential: Tail Behavior

What Each Measures

Both sub-Gaussian and sub-exponential are tail-decay classes: they describe how fast $\Pr[|X| \geq t]$ decreases as $t \to \infty$. A random variable is classified based on how its tails compare to specific reference distributions.

Sub-Gaussian: tails decay at least as fast as a Gaussian. Prototypical example: any bounded random variable.

Sub-Exponential: tails decay at least as fast as an exponential. The class is strictly larger than the sub-Gaussian class, so it admits heavier tails, but those tails are still lighter than any polynomial tail.

Side-by-Side Definitions

Definition

Sub-Gaussian Random Variable

A centered random variable $X$ is sub-Gaussian with scale $\sigma$ if any of the following conditions holds. The conditions are equivalent up to universal constant factors in $\sigma$:

Tail condition: $\Pr[|X| \geq t] \leq 2\exp(-t^2/(2\sigma^2))$ for all $t \geq 0$
MGF condition: $\mathbb{E}[e^{\lambda X}] \leq \exp(\sigma^2 \lambda^2 / 2)$ for all $\lambda \in \mathbb{R}$
Moment condition: $(\mathbb{E}[|X|^p])^{1/p} \leq C\sigma\sqrt{p}$ for all $p \geq 1$

The sub-Gaussian norm is $\|X\|_{\psi_2} = \inf\{t > 0 : \mathbb{E}[e^{X^2/t^2}] \leq 2\}$.

Definition

Sub-Exponential Random Variable

A centered random variable $X$ is sub-exponential with parameters $(\nu, b)$ if any of the following conditions holds. The conditions are equivalent up to universal constant factors in the parameters:

Tail condition: $\Pr[|X| \geq t] \leq 2\exp(-t/b)$ for all $t$ sufficiently large
MGF condition: $\mathbb{E}[e^{\lambda X}] \leq \exp(\nu^2 \lambda^2 / 2)$ for $|\lambda| \leq 1/b$
Moment condition: $(\mathbb{E}[|X|^p])^{1/p} \leq Cbp$ for all $p \geq 1$

The sub-exponential norm is $\|X\|_{\psi_1} = \inf\{t > 0 : \mathbb{E}[e^{|X|/t}] \leq 2\}$.
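
Both norms reduce to one-dimensional root finding whenever $\mathbb{E}[e^{X^2/t^2}]$ or $\mathbb{E}[e^{|X|/t}]$ is available in closed form. The sketch below is illustrative only: the helper names (`orlicz_norm`, `gaussian_psi2`, and so on) are invented for this example, and the three closed-form expectations are standard facts that are assumed here rather than derived.

```python
import math

def orlicz_norm(expected_psi, lo=1e-6, hi=1e6, iters=200):
    """Approximate inf{t > 0 : expected_psi(t) <= 2} by bisection on a log scale.

    Assumes expected_psi is non-increasing in t and the answer lies in [lo, hi].
    """
    for _ in range(iters):
        mid = math.sqrt(lo * hi)      # geometric midpoint
        if expected_psi(mid) <= 2.0:
            hi = mid                  # mid is feasible, shrink from above
        else:
            lo = mid                  # mid is infeasible, shrink from below
    return hi

def gaussian_psi2(t):
    # Z ~ N(0, 1): E[exp(Z^2 / t^2)] = (1 - 2/t^2)^(-1/2) if t^2 > 2, else infinite
    s = 2.0 / (t * t)
    return (1.0 - s) ** -0.5 if s < 1.0 else math.inf

def rademacher_psi2(t):
    # X = +/-1 with equal probability: X^2 = 1, so E[exp(X^2 / t^2)] = exp(1 / t^2)
    return math.exp(min(1.0 / (t * t), 700.0))  # cap the exponent to avoid overflow

def exponential_psi1(t):
    # X ~ Exp(1), not centered: E[exp(X / t)] = t / (t - 1) if t > 1, else infinite
    return t / (t - 1.0) if t > 1.0 else math.inf

psi2_gauss = orlicz_norm(gaussian_psi2)      # closed form: sqrt(8/3)
psi2_rad = orlicz_norm(rademacher_psi2)      # closed form: 1 / sqrt(ln 2)
psi1_exp = orlicz_norm(exponential_psi1)     # closed form: 2
print(psi2_gauss, psi2_rad, psi1_exp)
```

The bisection works on a log scale because the norm can span many orders of magnitude. For distributions without a closed form, `expected_psi` could be replaced by a numerical integral or a sample average, at the cost of exactness.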

Tail Behavior: The Core Difference

The fundamental distinction is in the tail decay rate:

Property	Sub-Gaussian	Sub-Exponential
Tail decay	$\exp(-ct^2)$ for all $t$	$\exp(-ct)$ for large $t$
MGF finite	For all $\lambda \in \mathbb{R}$	Guaranteed only for $\lvert\lambda\rvert \leq 1/b$
Moments	$\|X\|_p \sim \sigma\sqrt{p}$	$\|X\|_p \sim bp$
Orlicz norm	$\psi_2$ : $\mathbb{E}[\exp(X^2/t^2)] \leq 2$ for some finite $t$	$\psi_1$ : $\mathbb{E}[\exp(\lvert X\rvert/t)] \leq 2$ for some finite $t$

The difference is sharpest in the tails. For small $t$, the bounds for the two classes look alike, because the sub-exponential MGF condition is Gaussian-like for $|\lambda| \leq 1/b$. For large $t$, sub-Gaussian tails decay quadratically in the exponent ($e^{-ct^2}$), while sub-exponential tails are only guaranteed to decay linearly in the exponent ($e^{-ct}$). A sub-exponential variable can therefore take occasional large values with far higher probability than a Gaussian tail would allow.
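
To put numbers on the gap, the following sketch compares two exact tails with the same variance: a standard Gaussian and a Laplace variable. The Laplace distribution is chosen here only as a convenient sub-exponential reference; it does not appear in the text above.

```python
import math

def gaussian_tail(t):
    """Pr[|Z| >= t] for Z ~ N(0, 1)."""
    return math.erfc(t / math.sqrt(2.0))

def laplace_tail(t):
    """Pr[|X| >= t] for a Laplace variable with scale 1/sqrt(2), i.e. variance 1."""
    return math.exp(-math.sqrt(2.0) * t)

for t in (1.0, 2.0, 4.0, 6.0, 8.0):
    gauss, lap = gaussian_tail(t), laplace_tail(t)
    print(f"t={t:4.1f}  gaussian={gauss:.3e}  laplace={lap:.3e}  ratio={lap / gauss:.3e}")
```

Near $t = 1$ the two probabilities are of the same order, while a few standard deviations out the Laplace tail is larger by many orders of magnitude.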

Concentration for Sums

The tail classes directly determine how sums concentrate:

Proposition

Hoeffding-type Bound for Sub-Gaussian Sums

Statement

If $X_1, \ldots, X_n$ are independent, centered, sub-Gaussian with parameters $\sigma_i$ , then:

$\Pr\!\left[\left|\sum_{i=1}^n X_i\right| \geq t\right] \leq 2\exp\!\left(-\frac{t^2}{2\sum_i \sigma_i^2}\right) \quad \text{for all } t \geq 0$

The sum is itself sub-Gaussian, with parameter $\sqrt{\sum_i \sigma_i^2}$, so it has Gaussian-quality concentration at every scale $t$.
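
A quick sanity check of this bound, using random signs as the example: a variable taking the values $\pm 1$ with equal probability is sub-Gaussian with $\sigma = 1$, and the tail of a sum of $n$ of them can be computed exactly from the binomial distribution. The function names below are hypothetical and the choice of random signs is an assumption of this sketch.

```python
import math

def rademacher_sum_tail(n, t):
    """Exact Pr[|S_n| >= t], where S_n is a sum of n independent +/-1 signs."""
    # S_n = 2*B - n with B ~ Binomial(n, 1/2), so add up the binomial weights in the tail.
    count = sum(math.comb(n, k) for k in range(n + 1) if abs(2 * k - n) >= t)
    return count / 2**n

def hoeffding_bound(n, t, sigma=1.0):
    """2 * exp(-t^2 / (2 * n * sigma^2)), capped at 1; every sigma_i equals sigma."""
    return min(1.0, 2.0 * math.exp(-t**2 / (2.0 * n * sigma**2)))

n = 100
for t in (10, 20, 30, 40):
    print(t, rademacher_sum_tail(n, t), hoeffding_bound(n, t))
```

The exact tail sits below the bound at every $t$, and both shrink at a Gaussian rate in $t/\sqrt{n}$.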

Proposition

Bernstein-type Bound for Sub-Exponential Sums

Statement

If $X_1, \ldots, X_n$ are independent, centered, sub-exponential with parameters $(\nu_i, b)$ , then:

$\Pr\!\left[\left|\sum_{i=1}^n X_i\right| \geq t\right] \leq 2\exp\!\left(-c\min\!\left(\frac{t^2}{\sum_i \nu_i^2},\; \frac{t}{b}\right)\right)$

Two regimes: Gaussian-type decay ($e^{-ct^2}$) for small $t$, exponential-type decay ($e^{-ct}$) for large $t$. The transition occurs at $t \approx \sum_i \nu_i^2 / b$, where the two terms inside the minimum are equal.
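
The two regimes are easy to see numerically. The sketch below evaluates the bound with $c = 1/2$, which is the constant obtained from the MGF condition by a Chernoff argument; treat that value, the function name, and the numbers $V = 100$, $b = 1$ as assumptions of this example rather than part of the statement.

```python
import math

def bernstein_bound(t, nu_sq_sum, b):
    """Two-sided bound 2 * exp(-min(t^2 / (2V), t / (2b))), V = sum of nu_i^2.

    Uses c = 1/2 (an assumption of this sketch). Returns the bound, capped at 1,
    together with the regime that is active at t.
    """
    gaussian_exponent = t**2 / (2.0 * nu_sq_sum)
    exponential_exponent = t / (2.0 * b)
    regime = "gaussian" if gaussian_exponent <= exponential_exponent else "exponential"
    bound = min(1.0, 2.0 * math.exp(-min(gaussian_exponent, exponential_exponent)))
    return bound, regime

V, b = 100.0, 1.0      # hypothetical values, e.g. n = 100 terms with nu_i = 1
t_star = V / b         # the two exponents coincide here
for t in (10.0, 50.0, t_star, 200.0, 1000.0):
    bound, regime = bernstein_bound(t, V, b)
    print(f"t={t:7.1f}  bound={bound:.3e}  regime={regime}")
```

Below `t_star` the quadratic term is the smaller exponent and the bound behaves like a Gaussian tail. Above it the linear term takes over.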

This is exactly the Bernstein phenomenon: sums of sub-exponential variables have Gaussian concentration near the mean and exponential concentration in the tails.

The Fundamental Relationship: Products and Squares

Proposition

Products of Sub-Gaussians Are Sub-Exponential

Statement

If $X$ and $Y$ are sub-Gaussian, then $XY$ is sub-exponential. Specifically:

$\|XY\|_{\psi_1} \leq \|X\|_{\psi_2} \cdot \|Y\|_{\psi_2}$

In particular, if $X$ is sub-Gaussian, then $X^2$ is sub-exponential.

Intuition

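A sub-Gaussian variable $X$ has $\mathbb{E}[e^{X^2/t^2}] < \infty$ for every $t$ above a constant multiple of $\sigma$. Writing $Y = X^2$, this says that $\mathbb{E}[e^{|Y|/c}] = \mathbb{E}[e^{X^2/c}]$ is finite, but only for sufficiently large $c$. That is the sub-exponential ($\psi_1$) condition on $Y$, not the sub-Gaussian one. In tail terms, $\Pr[X^2 \geq t] = \Pr[|X| \geq \sqrt{t}] \leq 2\exp(-t/(2\sigma^2))$. Squaring moves the variable from the $\psi_2$ class to the $\psi_1$ class: the exponent that was quadratic in $t$ for $X$ becomes linear in $t$ for $X^2$.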
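
A worked instance makes this concrete. Take $Z \sim \mathcal{N}(0, 1)$, a choice made here purely for illustration. The Gaussian integral gives

$\mathbb{E}[e^{Z^2/c}] = \left(1 - \frac{2}{c}\right)^{-1/2} \text{ for } c > 2, \qquad \mathbb{E}[e^{Z^2/c}] = \infty \text{ for } c \leq 2$

so $Z^2$ meets the $\psi_1$ condition, and solving $(1 - 2/c)^{-1/2} = 2$ gives $\|Z^2\|_{\psi_1} = 8/3 = \|Z\|_{\psi_2}^2$. The $\psi_2$ condition for $Z^2$ would instead require a finite value of

$\mathbb{E}[e^{(Z^2)^2/c^2}] = \mathbb{E}[e^{Z^4/c^2}] = \infty \quad \text{for every } c > 0$

which fails because $e^{z^4/c^2}$ eventually outgrows the Gaussian density factor $e^{-z^2/2}$. Hence $Z^2$ is sub-exponential but not sub-Gaussian.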

Canonical Examples

Example

Chi-squared: sub-exponential but not sub-Gaussian

Let $Z_1, \ldots, Z_d$ be i.i.d. $\mathcal{N}(0, 1)$. Each $Z_i$ is sub-Gaussian. The chi-squared statistic $\chi^2_d = \sum_{i=1}^d Z_i^2$ is a sum of the $Z_i^2$, each of which is sub-exponential (as the square of a sub-Gaussian). So $\chi^2_d$, centered at its mean $d$, is sub-exponential.

But $\chi^2_d$ is not sub-Gaussian. Its tail satisfies $\Pr[\chi^2_d - d \geq t] \approx e^{-ct}$ for large $t$, not $e^{-ct^2}$. The MGF $\mathbb{E}[e^{\lambda \chi^2_d}] = (1 - 2\lambda)^{-d/2}$ is finite only for $\lambda < 1/2$, not for all $\lambda$.

This is the prototypical example: chi-squared statistics arise everywhere in statistics and ML (e.g., quadratic forms, variance estimates, kernel evaluations), and they require sub-exponential theory.
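
The chi-squared claims can be checked against exact tail probabilities. The sketch below rests on two assumptions that are not stated in the text: it uses the closed-form survival function that exists for even $d$, and it takes each $Z_i^2 - 1$ to be sub-exponential with $(\nu, b) = (2, 4)$, a standard textbook choice. All function names are hypothetical.

```python
import math

def chi2_upper_tail(d, x):
    """Exact Pr[chi^2_d >= x] for even d: exp(-x/2) * sum_{k < d/2} (x/2)^k / k!."""
    if d % 2 != 0:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = x / 2.0
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(d // 2))

def bernstein_upper(d, t):
    """One-sided bound exp(-min(t^2 / (8d), t / 8)) on Pr[chi^2_d - d >= t].

    Follows from (nu, b) = (2, 4) per squared term, an assumption of this sketch.
    """
    return math.exp(-min(t**2 / (8.0 * d), t / 8.0))

def gaussian_style(d, t):
    """What a Gaussian tail with the same variance 2d would suggest: exp(-t^2 / (4d))."""
    return math.exp(-t**2 / (4.0 * d))

d = 10
for t in (5.0, 10.0, 25.0, 50.0):
    exact = chi2_upper_tail(d, d + t)
    print(f"t={t:5.1f}  exact={exact:.3e}  bernstein={bernstein_upper(d, t):.3e}  "
          f"gaussian-style={gaussian_style(d, t):.3e}")
```

For large $t$ the exact tail stays under the Bernstein-type bound but exceeds the Gaussian-style value by many orders of magnitude, which is why the heavier sub-exponential class is the right one here.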

Example

Bounded variables: sub-Gaussian

If $X \in [a, b]$ almost surely, then the centered variable $X - \mathbb{E}[X]$ is sub-Gaussian with parameter $\sigma = (b - a)/2$. This is Hoeffding's lemma. (Here $b$ is the upper endpoint of the range, not the sub-exponential parameter.) Bounded variables are the best-behaved class: they are sub-Gaussian, and Hoeffding's inequality is a special case of sub-Gaussian concentration.
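
Hoeffding's lemma can be checked directly on a small example. The sketch below uses a Bernoulli variable (an illustrative choice, with values in $[0, 1]$ so that $\sigma = 1/2$) and compares its centered MGF with $\exp(\sigma^2 \lambda^2 / 2)$ over a grid; the function names are hypothetical.

```python
import math

def centered_bernoulli_mgf(p, lam):
    """E[exp(lam * (X - p))] for X ~ Bernoulli(p), which takes values in [0, 1]."""
    return (1.0 - p) * math.exp(-lam * p) + p * math.exp(lam * (1.0 - p))

def hoeffding_lemma_bound(lam, a=0.0, b=1.0):
    """exp(sigma^2 * lam^2 / 2) with sigma = (b - a) / 2."""
    sigma = (b - a) / 2.0
    return math.exp(sigma**2 * lam**2 / 2.0)

# Grid: p in {0.05, ..., 0.95} and lam in {-10.0, -9.9, ..., 10.0}
worst_ratio = max(
    centered_bernoulli_mgf(p / 20.0, lam / 10.0) / hoeffding_lemma_bound(lam / 10.0)
    for p in range(1, 20)
    for lam in range(-100, 101)
)
print(f"largest MGF / bound ratio on the grid: {worst_ratio:.6f}")
```

The ratio never exceeds 1 (up to floating-point rounding), and it is attained at $\lambda = 0$, where both sides equal 1.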

Example

Exponential distribution: sub-exponential but not sub-Gaussian

If $X \sim \text{Exp}(\lambda)$, then the centered variable $X - \mathbb{E}[X]$ is sub-exponential with $b$ of order $1/\lambda$. Its tail is $\Pr[X \geq t] = e^{-\lambda t}$, whose exponent is linear in $t$, not quadratic. The MGF $\mathbb{E}[e^{sX}] = \lambda/(\lambda - s)$ is finite only for $s < \lambda$.

The Bernstein Condition

The Bernstein condition provides a clean characterization of when a variable is sub-exponential via its moments:

Definition

Bernstein Condition

A centered random variable $X$ satisfies the Bernstein condition with parameter $b$ if and only if for all integers $k \geq 2$ :

$\mathbb{E}[|X|^k] \leq \frac{k!}{2} \cdot b^{k-2} \cdot \mathbb{E}[X^2]$

Up to constants in the parameters, this is equivalent to being sub-exponential. The condition says that higher moments grow at most factorially (like those of an exponential distribution), not faster.

The contrast: sub-Gaussian variables have moments growing like $\mathbb{E}[|X|^k] \leq (C\sigma)^k \cdot k^{k/2}$ , which is slower than factorial. Sub-exponential variables have factorial moment growth, which is faster but still controlled.
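
The two growth rates can be compared using exact moments. The sketch below uses a standard Gaussian and a Laplace variable with density $e^{-|x|}/2$; both distributions and the helper names are choices made for this illustration. The Laplace variable has $\mathbb{E}[|X|^k] = k!$ and $\mathbb{E}[X^2] = 2$, so it meets the Bernstein condition with $b = 1$, with equality for every $k$.

```python
import math

def gaussian_abs_moment(k):
    """E|Z|^k for Z ~ N(0, 1): 2^(k/2) * Gamma((k + 1) / 2) / sqrt(pi)."""
    return 2.0 ** (k / 2.0) * math.gamma((k + 1) / 2.0) / math.sqrt(math.pi)

def laplace_abs_moment(k):
    """E|X|^k for X with density exp(-|x|) / 2 (Laplace, scale 1): equals k!."""
    return float(math.factorial(k))

for k in (2, 4, 8, 16, 32):
    gauss = gaussian_abs_moment(k) ** (1.0 / k)   # L_k norm of Z
    lap = laplace_abs_moment(k) ** (1.0 / k)      # L_k norm of X
    print(f"k={k:2d}  gaussian/sqrt(k)={gauss / math.sqrt(k):.3f}  "
          f"laplace/k={lap / k:.3f}  laplace/sqrt(k)={lap / math.sqrt(k):.3f}")

# Bernstein condition for the Laplace variable with b = 1 and E[X^2] = 2:
# k! <= (k!/2) * b^(k-2) * 2, which holds with equality for every k >= 2.
bernstein_ok = all(
    laplace_abs_moment(k) <= math.factorial(k) / 2.0 * 1.0 ** (k - 2) * 2.0
    for k in range(2, 30)
)
print(bernstein_ok)
```

The Gaussian column stays roughly constant after dividing by $\sqrt{k}$, while the Laplace norm needs division by $k$ to stay bounded and keeps growing when divided by $\sqrt{k}$.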

Where Each Fails

Sub-Gaussian fails for heavy-tailed data

Many real-world distributions have heavier tails than sub-Gaussian: log-normal returns in finance, power-law degree distributions in networks, and noise in robust statistics. For these, even sub-exponential can be too restrictive, and you may need polynomial tail bounds (finite moments only) or heavy-tailed concentration tools like the median-of-means estimator.

Sub-Exponential fails for the heaviest tails

If $\mathbb{E}[e^{|X|/b}] = \infty$ for all $b > 0$, the variable is not sub-exponential. Pareto distributions, Cauchy distributions, and other power-law-tailed variables fall outside both classes. For these you need truncation or robust estimation techniques.
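
For reference, here is a minimal sketch of the median-of-means estimator mentioned above. It assumes the samples are already in random order, drops any leftover samples that do not fill a complete block, and leaves the number of blocks to the caller. The function name and the toy data are illustrative only.

```python
import statistics

def median_of_means(samples, num_blocks):
    """Split samples into num_blocks consecutive blocks, average each block,
    and return the median of the block means."""
    block_size = len(samples) // num_blocks
    if block_size == 0:
        raise ValueError("need at least one sample per block")
    block_means = [
        statistics.fmean(samples[i * block_size:(i + 1) * block_size])
        for i in range(num_blocks)
    ]
    return statistics.median(block_means)

# One wild value drags the plain mean far away but only contaminates one block.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 1000.0]
print(statistics.fmean(data), median_of_means(data, num_blocks=3))
```

The estimator needs only a finite variance to concentrate, which is why it is a common tool once the sub-exponential assumption is unavailable.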

What to Memorize

	Sub-Gaussian	Sub-Exponential
Tail	$e^{-ct^2}$	$e^{-ct}$ (large $t$ )
MGF domain	All $\lambda$	$\lvert\lambda\rvert \leq 1/b$
Norm	$\psi_2$	$\psi_1$
Moment growth	$\sim \sigma\sqrt{p}$	$\sim bp$
Inclusion	Sub-Gaussian $\subset$ Sub-Exponential	(strictly larger)
Closed under	Addition, linear combinations	Addition, linear combinations
Products	$X \cdot Y$ is sub-exponential	No clean closure

Key facts to internalize:

Every sub-Gaussian variable is sub-exponential (but not vice versa)
$X$ sub-Gaussian $\Rightarrow$ $X^2$ sub-exponential
$\chi^2$ is the canonical sub-exponential-but-not-sub-Gaussian example
Bernstein's inequality is the concentration result for sub-exponential sums

When a Researcher Would Use Each

Example

Bounding the sample mean of bounded losses

The loss $\ell(h(x), y) \in [0, 1]$ is bounded, so after centering it is sub-Gaussian with $\sigma = 1/2$. Use sub-Gaussian concentration (Hoeffding). This is the standard setting in learning theory and gives the cleanest bounds.
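
With $\sigma = 1/2$ per term, the sub-Gaussian sum bound applied to $n$ i.i.d. losses at deviation $t = n\varepsilon$ gives $\Pr[|\text{empirical risk} - \text{true risk}| \geq \varepsilon] \leq 2\exp(-2n\varepsilon^2)$. The sketch below inverts this for the confidence width and for the sample size; the function names and the symbols $\varepsilon$, $\delta$ are introduced here for illustration.

```python
import math

def hoeffding_deviation(n, delta):
    """Width eps such that |empirical risk - true risk| < eps with prob. >= 1 - delta."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def hoeffding_sample_size(eps, delta):
    """Smallest n with 2 * exp(-2 * n * eps^2) <= delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps**2))

n = hoeffding_sample_size(eps=0.05, delta=0.05)
print(n, hoeffding_deviation(n, delta=0.05))
```

The sample size grows like $1/\varepsilon^2$ but only logarithmically in $1/\delta$, which is the practical payoff of a sub-Gaussian tail.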

Example

Bounding quadratic forms or variance estimates

The sample variance $\hat{\sigma}^2 = \frac{1}{n}\sum_i (X_i - \bar{X})^2$ involves squared terms. Each $(X_i - \bar{X})^2$ is sub-exponential when $X_i$ is sub-Gaussian. Use sub-exponential concentration (Bernstein-type bounds). This arises in covariance estimation, kernel methods, and random matrix theory.

Example

Analyzing inner products of random vectors

If $x, y \in \mathbb{R}^d$ have sub-Gaussian entries, their inner product $\langle x, y \rangle = \sum_i x_i y_i$ is a sum of products of sub-Gaussians, hence a sum of sub-exponential variables. Use sub-exponential concentration for the sum. This is central to compressed sensing and random projection arguments.
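
As a worked instance, under the extra assumption (not made in the text) that all $x_i$ and $y_i$ are independent $\mathcal{N}(0, 1)$, the MGF of one product can be computed by conditioning on $y_i$:

$\mathbb{E}[e^{\lambda x_i y_i}] = \mathbb{E}[e^{\lambda^2 y_i^2 / 2}] = (1 - \lambda^2)^{-1/2} \leq e^{\lambda^2} \quad \text{for } |\lambda| \leq 1/\sqrt{2}$

The last step uses $-\ln(1 - u) \leq 2u$ for $0 \leq u \leq 1/2$. So each product is sub-exponential with $(\nu, b) = (\sqrt{2}, \sqrt{2})$, the sum has $\sum_i \nu_i^2 = 2d$, and the Bernstein-type bound with $c = 1/2$ reads

$\Pr[|\langle x, y \rangle| \geq t] \leq 2\exp\!\left(-\min\!\left(\frac{t^2}{4d},\; \frac{t}{2\sqrt{2}}\right)\right)$

The Gaussian regime covers $t \leq \sqrt{2}\, d$, so deviations of order $\sqrt{d}$, the typical size of $\langle x, y \rangle$, fall well inside it for large $d$.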

Common Confusions

Watch Out

Sub-exponential does not mean exponential distribution

The term "sub-exponential" means "tails no heavier than exponential." An exponential random variable is sub-exponential, but so are many others (chi-squared, squared Gaussians, products of Gaussians). The name describes a tail class, not a specific distribution.

Watch Out

The two-regime behavior of Bernstein is not a weakness

Bernstein's bound has a sub-Gaussian regime (small $t$) and a sub-exponential regime (large $t$). This is not a flaw. It is an accurate reflection of how sub-exponential sums actually behave, and the bound is sharp up to constants in both regimes. The transition point $t^* \approx \sum_i \nu_i^2 / b$ is where the tail character changes from Gaussian to exponential.