Foundations
Skewness, Kurtosis, and Higher Moments
Distribution shape beyond mean and variance: skewness measures tail asymmetry, kurtosis measures tail extremeness, cumulants are the cleaner language, and heavy-tailed distributions break all of these.
Why This Matters
Mean and variance tell you the center and spread of a distribution. They say nothing about shape. Two distributions can have identical mean and variance but completely different tail behavior, asymmetry, and outlier frequency. Skewness and kurtosis capture this shape information.
Most textbooks get the interpretation wrong. Kurtosis is not peakedness. It measures how much extreme values dominate the distribution. This page gives the correct interpretations and shows exactly when these statistics fail.
Correct Interpretations
Skewness
The third standardized moment:

$$\gamma_1 = \mathbb{E}\!\left[\left(\frac{X - \mu}{\sigma}\right)^{3}\right] = \frac{\mu_3}{\sigma^3}$$
Correct interpretation: which tail has more influence on the distribution. Positive skew means the right tail is longer and dominates. Negative skew means the left tail dominates. Zero skew means the tails are symmetric (but does not mean the distribution is symmetric in general).
Skewness is not about which way the distribution leans
People say "right-skewed means the distribution leans right." Wrong. Right-skewed (positive skew) means the RIGHT TAIL is longer. The bulk of the mass is actually on the LEFT. Income distributions are right-skewed: most people earn moderate amounts, but a long right tail extends to very high incomes. The mean is pulled right of the median.
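A minimal simulation makes this concrete. The "incomes" below are exponential draws (the rate $1/50{,}000$ and sample size are arbitrary illustrative choices): the sample skewness comes out near the exponential's theoretical value of 2, and the mean lands to the right of the median.

```python
import random
import statistics

random.seed(0)

# Hypothetical "incomes": exponential draws, a classic right-skewed shape.
# The rate 1/50_000 (mean 50,000) is an arbitrary illustrative choice.
incomes = [random.expovariate(1 / 50_000) for _ in range(100_000)]

mean = statistics.fmean(incomes)
median = statistics.median(incomes)
sd = statistics.pstdev(incomes)

# Sample skewness: average cubed z-score (third standardized moment)
skew = sum(((x - mean) / sd) ** 3 for x in incomes) / len(incomes)

print(f"mean   = {mean:,.0f}")   # pulled to the right of the median
print(f"median = {median:,.0f}")
print(f"skew   = {skew:.2f}")    # near 2, the exponential's skewness
```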
Kurtosis
The fourth standardized moment:

$$\mathrm{Kurt}[X] = \mathbb{E}\!\left[\left(\frac{X - \mu}{\sigma}\right)^{4}\right] = \frac{\mu_4}{\sigma^4}$$
The normal distribution has kurtosis exactly 3.
Correct interpretation: how much extreme values (outliers) dominate the distribution, compared to moderate deviations. High kurtosis means rare but extreme values contribute disproportionately. It is about TAIL WEIGHT, not peak shape.
Why Kurtosis Measures Tails, Not Peaks
The $z^4$ weighting makes extreme values dominate the kurtosis integral: for $|z| < 1$ the fourth power shrinks a point's contribution, while for $|z| > 1$ it amplifies it, so $\mathbb{E}[z^4]$ is driven almost entirely by the tails.
Kurtosis is NOT peakedness
This is one of the most persistent wrong claims in statistics education. Kurtosis does not measure how peaked or flat the distribution is. A distribution can be flat-topped with high kurtosis (if it has heavy tails) or peaked with low kurtosis (if it has light tails). The fourth power amplifies extreme z-scores. Kurtosis measures the contribution of the tails, period.
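A small simulation shows the tail dominance directly: in a standard normal sample, the few points with $|z| > 2$ contribute over half of the fourth-moment sum. (A rough sketch; the sample size is arbitrary.)

```python
import random

random.seed(0)
n = 200_000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]

mean = sum(xs) / n
sd = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5

zs = [(x - mean) / sd for x in xs]
z4 = [z ** 4 for z in zs]
kurt = sum(z4) / n  # ~3 for the normal

# How much of the fourth-moment sum comes from the |z| > 2 points?
tail_mass = sum(v for z, v in zip(zs, z4) if abs(z) > 2)
tail_share = tail_mass / sum(z4)
tail_frac = sum(1 for z in zs if abs(z) > 2) / n

print(f"kurtosis ≈ {kurt:.2f}")
print(f"|z|>2 points: {tail_frac:.1%} of the sample")
print(f"their share of the z^4 sum: {tail_share:.1%}")
```

Roughly 4.6% of the sample carries more than half of the kurtosis: that is why kurtosis says nothing reliable about the peak.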
Excess Kurtosis
$$\text{excess kurtosis} = \text{kurtosis} - 3$$

Subtracts the normal distribution's kurtosis as a baseline. The normal has excess kurtosis 0. Positive excess kurtosis means heavier tails than normal. Negative excess kurtosis (possible; the minimum is $-2$, attained by a symmetric two-point distribution) means lighter tails than normal.
This is not a different concept. It is just kurtosis recentered so the Gaussian baseline is zero.
Coefficient of Variation
$$\mathrm{CV} = \frac{\sigma}{\mu}$$

Relative variability compared to the mean. Dimensionless. Useful for comparing spread across different scales (e.g., heights measured in centimeters and in inches give the same CV).
When CV is useful: positive-scale data where zero has meaning (waiting times, concentrations, demand).
When CV is garbage: mean near zero (CV explodes), data crosses zero (interpretation breaks), or variance is infinite.
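Both regimes in a minimal sketch (the data values are made up for illustration):

```python
import statistics

def cv(xs):
    """Coefficient of variation: population sd divided by the mean."""
    return statistics.pstdev(xs) / statistics.fmean(xs)

# Positive-scale data (hypothetical waiting times): CV is meaningful.
waits = [2.0, 3.5, 1.2, 4.1, 2.7]
print(f"CV of waits: {cv(waits):.2f}")

# Data straddling zero (hypothetical temperature anomalies): the mean
# is near zero, so CV explodes and carries no information.
anomalies = [-0.3, 0.1, -0.2, 0.4, 0.05]
print(f"CV of anomalies: {cv(anomalies):.1f}")
```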
When Moments Exist and When They Do Not
Higher moments require stronger integrability conditions. They fail in a strict order as tails get heavier: the fourth moment fails before the third, the third before the second.
| Distribution | Mean? | Variance? | Skewness? | Kurtosis? |
|---|---|---|---|---|
| Normal | Yes | Yes | Yes | Yes |
| Uniform | Yes | Yes | Yes | Yes |
| Laplace | Yes | Yes | Yes | Yes |
| Student $t$, $\nu = 4$ | Yes | Yes | Yes | No |
| Student $t$, $\nu = 3$ | Yes | Yes | No | No |
| Student $t$, $\nu = 2$ | Yes | No | No | No |
| Student $t$, $\nu = 1$ (Cauchy) | No | No | No | No |
| Pareto, $\alpha > 4$ | Yes | Yes | Yes | Yes |
| Pareto, $3 < \alpha \le 4$ | Yes | Yes | Yes | No |
| Pareto, $2 < \alpha \le 3$ | Yes | Yes | No | No |
| Pareto, $1 < \alpha \le 2$ | Yes | No | No | No |
| Pareto, $\alpha \le 1$ | No | No | No | No |
The rule for Student $t$: the $k$-th moment exists only if $k < \nu$.
The rule for Pareto: the $k$-th raw moment exists only if $k < \alpha$.
These two rules alone cover most cases you will encounter.
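The boundary cases can be seen empirically: the sample kurtosis of a $\nu = 3$ Student $t$ (whose fourth moment is infinite) keeps drifting upward with sample size, while the normal's stabilizes near 3. A rough sketch, generating $t_3$ draws as $Z / \sqrt{\chi^2_3 / 3}$:

```python
import random

random.seed(0)

def t3():
    # Student t with nu = 3: Z / sqrt(chi2_3 / 3)
    z = random.gauss(0, 1)
    chi2 = sum(random.gauss(0, 1) ** 2 for _ in range(3))
    return z / (chi2 / 3) ** 0.5

def sample_kurtosis(xs):
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / len(xs)
    return sum((x - m) ** 4 for x in xs) / len(xs) / var ** 2

results = {}
for n in (1_000, 10_000, 100_000):
    normal = [random.gauss(0, 1) for _ in range(n)]
    heavy = [t3() for _ in range(n)]
    results[n] = (sample_kurtosis(normal), sample_kurtosis(heavy))
    print(f"n={n:>7}: normal {results[n][0]:5.2f}, t(3) {results[n][1]:8.2f}")
```

The normal column converges; the $t_3$ column is dominated by whichever extreme draws happen to land in the sample, because the statistic it estimates does not exist.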
Cumulants: The Better Language
Moments mix information from lower orders into higher orders. Cumulants isolate the genuinely new information at each order.
Cumulants
Cumulants are defined through the cumulant generating function (log of the moment generating function):

$$K_X(t) = \log \mathbb{E}\left[e^{tX}\right], \qquad \kappa_n = K_X^{(n)}(0)$$
The first four cumulants are:
| Cumulant | Value | Meaning |
|---|---|---|
| $\kappa_1$ | Mean | Center |
| $\kappa_2$ | Variance | Spread |
| $\kappa_3$ | Third central moment $\mu_3$ | Asymmetry |
| $\kappa_4$ | $\mu_4 - 3\sigma^4$ | Tail departure from Gaussian |
Gaussian Characterization via Cumulants
Statement
A random variable is Gaussian if and only if all cumulants of order $n \ge 3$ are zero: $\kappa_n = 0$ for all $n \ge 3$.
Intuition
The Gaussian is the only distribution that is "pure location and scale." Every other distribution carries additional shape information in its higher cumulants. If you measure any departure from Gaussianity, it shows up as a nonzero cumulant somewhere.
Why It Matters
This theorem is why cumulants are the natural language for measuring non-Gaussianity. The third cumulant measures asymmetry departure from Gaussian. The fourth cumulant measures tail departure. Each higher cumulant captures a new independent direction of non-Gaussianity. This is the theoretical basis for tests of normality and for independent component analysis (ICA).
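A quick empirical illustration (sample sizes and distributions are arbitrary choices): the standardized third and fourth cumulants are both near zero for Gaussian data, while uniform data reveals its non-Gaussianity at order 4.

```python
import random

random.seed(0)
n = 300_000

def standardized_moment(xs, k):
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs) / var ** (k / 2)

gauss = [random.gauss(0, 1) for _ in range(n)]
unif = [random.uniform(-1, 1) for _ in range(n)]

# Standardized third and fourth cumulants (skewness, excess kurtosis):
# both vanish for the Gaussian; the uniform shows up only at order 4.
skew_g = standardized_moment(gauss, 3)
exkurt_g = standardized_moment(gauss, 4) - 3
skew_u = standardized_moment(unif, 3)
exkurt_u = standardized_moment(unif, 4) - 3

print(f"gauss:   skew={skew_g:+.3f}, excess kurtosis={exkurt_g:+.3f}")
print(f"uniform: skew={skew_u:+.3f}, excess kurtosis={exkurt_u:+.3f}")
```

The uniform's excess kurtosis sits near $-1.2$ (its kurtosis is $9/5$): symmetric, so invisible at order 3, but unmistakably non-Gaussian at order 4.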
Failure Mode
The theorem requires existence of the MGF in a neighborhood of zero, which excludes heavy-tailed distributions. For distributions where the MGF does not exist (e.g., Student $t$ with low degrees of freedom), cumulants beyond a certain order do not exist, and the characterization cannot be applied.
| Property | Moments | Cumulants |
|---|---|---|
| Easy to define | Yes | Slightly less |
| Easy to interpret at low order | Yes | Yes |
| Clean under sums of independent variables | No | Yes ($\kappa_n$ is additive) |
| Redundant across orders | More | Less |
| Better for serious theory | Not really | Yes |
The additivity property is the main reason cumulants matter: if $X$ and $Y$ are independent, then $\kappa_n(X + Y) = \kappa_n(X) + \kappa_n(Y)$ for all $n$. Moments do not have this property in general: central moments coincide with cumulants only up to order 3, and fail to be additive from order 4 on.
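The additivity of $\kappa_3$ (which equals the third central moment) can be checked numerically; the exponential rates below are arbitrary choices, using $\kappa_3 = 2/\lambda^3$ for an exponential with rate $\lambda$:

```python
import random

random.seed(0)
n = 400_000

def third_central_moment(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 3 for x in xs) / len(xs)

# Two independent skewed variables; the rates 1.0 and 0.5 are arbitrary.
x = [random.expovariate(1.0) for _ in range(n)]
y = [random.expovariate(0.5) for _ in range(n)]
s = [a + b for a, b in zip(x, y)]

k3_x = third_central_moment(x)  # exponential(1):   kappa_3 = 2/1^3   = 2
k3_y = third_central_moment(y)  # exponential(0.5): kappa_3 = 2/0.5^3 = 16
k3_s = third_central_moment(s)

print(f"kappa3(X) + kappa3(Y) = {k3_x + k3_y:.2f}")
print(f"kappa3(X + Y)         = {k3_s:.2f}")
```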
Tail Probability: What Practitioners Actually Care About
Rather than memorizing kurtosis values, look at tail probabilities directly. In the table below, $Z$ denotes each distribution standardized to mean 0 and variance 1.
| Distribution | $P(\lvert Z\rvert > 2)$ | $P(\lvert Z\rvert > 3)$ | Interpretation |
|---|---|---|---|
| Normal | ~4.6% | ~0.27% | Baseline |
| Student $t$ | ~6.5% | ~1.0% | Heavier tails |
| Laplace | ~5.9% | ~1.4% | Heavier tails |
| Uniform | 0% | 0% | Bounded, no tail events |
This table is more useful than raw kurtosis values because it shows what actually happens in practice: how often do extreme events occur?
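The normal and Laplace entries can be reproduced in closed form (assuming the unit-variance standardization, under which the Laplace has scale $b = 1/\sqrt{2}$):

```python
import math

def normal_two_sided(t):
    # P(|Z| > t) for a standard normal, via the complementary error function
    return math.erfc(t / math.sqrt(2))

def laplace_two_sided(t):
    # Unit-variance Laplace has scale b = 1/sqrt(2): P(|X| > t) = exp(-t*sqrt(2))
    return math.exp(-t * math.sqrt(2))

for t in (2, 3):
    print(f"P(|Z| > {t}): normal {normal_two_sided(t):.4f}, "
          f"Laplace {laplace_two_sided(t):.4f}")
```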
Exercises
Problem
Compute the skewness and kurtosis of the exponential distribution with rate $\lambda$. Is it right-skewed or left-skewed?
Problem
The Cauchy distribution has no finite mean. What happens if you compute the sample mean of 1000 Cauchy observations and repeat this experiment 100 times? What do you observe about the sample means?
Problem
Prove that for independent random variables $X$ and $Y$, the $n$-th cumulant of the sum equals the sum of the cumulants: $\kappa_n(X + Y) = \kappa_n(X) + \kappa_n(Y)$.
References
Canonical:
- Casella & Berger, Statistical Inference (2002), Chapter 2
- DeCarlo, "On the Meaning and Use of Kurtosis" (Psychological Methods, 1997). The definitive correction to the "peakedness" myth.
Current:
- Westfall, "Kurtosis as Peakedness, 1905–2014. R.I.P." (The American Statistician, 2014). Comprehensive debunking.
- Munkres, Topology (2000), Chapter 1 (set theory review)
Next Topics
- Sub-Gaussian random variables: the tail class defined by MGF bounds, where kurtosis-like behavior is controlled globally
- Concentration inequalities: when moments exist, they give tail bounds
- Robust statistics: what to use when moments do not exist or are unreliable
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.