Foundations
Skewness, Kurtosis, and Higher Moments
Distribution shape beyond mean and variance: skewness measures tail asymmetry, kurtosis measures tail extremeness, cumulants are the cleaner language, and heavy-tailed distributions break all of these.
Why This Matters
Mean and variance tell you the center and spread of a distribution. They say nothing about shape. Two distributions can have identical mean and variance but completely different tail behavior, asymmetry, and outlier frequency. Skewness and kurtosis capture this shape information.
Most textbooks get the interpretation wrong. Kurtosis is not peakedness. It measures how much extreme values dominate the distribution. This page gives the correct interpretations and shows exactly when these statistics fail.
Correct Interpretations
Skewness
The third standardized moment:

$$\gamma_1 = \mathbb{E}\!\left[\left(\frac{X - \mu}{\sigma}\right)^{3}\right] = \frac{\mu_3}{\sigma^3}$$
Correct interpretation: which tail has more influence on the distribution. Positive skew means the right tail is longer and dominates. Negative skew means the left tail dominates. Zero skew means the tails are symmetric (but does not mean the distribution is symmetric in general).
Skewness is not about which way the distribution leans
People say "right-skewed means the distribution leans right." Wrong. Right-skewed (positive skew) means the RIGHT TAIL is longer. The bulk of the mass is actually on the LEFT. Income distributions are right-skewed: most people earn moderate amounts, but a long right tail extends to very high incomes. The mean is pulled right of the median.
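A minimal simulation makes this concrete. The "incomes" below are exponential draws (the rate $1/50{,}000$ and sample size are arbitrary illustrative choices): the sample skewness comes out near the exponential's theoretical value of 2, and the mean lands to the right of the median.

```python
import random
import statistics

random.seed(0)

# Hypothetical "incomes": exponential draws, a classic right-skewed shape.
# The rate 1/50_000 (mean 50,000) is an arbitrary illustrative choice.
incomes = [random.expovariate(1 / 50_000) for _ in range(100_000)]

mean = statistics.fmean(incomes)
median = statistics.median(incomes)
sd = statistics.pstdev(incomes)

# Sample skewness: average cubed z-score (third standardized moment)
skew = sum(((x - mean) / sd) ** 3 for x in incomes) / len(incomes)

print(f"mean   = {mean:,.0f}")   # pulled to the right of the median
print(f"median = {median:,.0f}")
print(f"skew   = {skew:.2f}")    # near 2, the exponential's skewness
```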
Kurtosis
The fourth standardized moment:

$$\mathrm{Kurt}[X] = \mathbb{E}\!\left[\left(\frac{X - \mu}{\sigma}\right)^{4}\right] = \frac{\mu_4}{\sigma^4}$$
The normal distribution has kurtosis exactly 3.
Correct interpretation: how much extreme values (outliers) dominate the distribution, compared to moderate deviations. High kurtosis means rare but extreme values contribute disproportionately. It is about TAIL WEIGHT, not peak shape.
Why Kurtosis Measures Tails, Not Peaks
The $z^4$ weighting makes extreme values dominate the kurtosis integral: for $|z| < 1$ the fourth power shrinks a point's contribution, while for $|z| > 1$ it amplifies it, so $\mathbb{E}[z^4]$ is driven almost entirely by the tails.
Kurtosis is NOT peakedness
This is one of the most persistent wrong claims in statistics education. Kurtosis does not measure how peaked or flat the distribution is. A distribution can be flat-topped with high kurtosis (if it has heavy tails) or peaked with low kurtosis (if it has light tails). The fourth power amplifies extreme z-scores. Kurtosis measures the contribution of the tails, period.
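A small simulation shows the tail dominance directly: in a standard normal sample, the few points with $|z| > 2$ contribute over half of the fourth-moment sum. (A rough sketch; the sample size is arbitrary.)

```python
import random

random.seed(0)
n = 200_000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]

mean = sum(xs) / n
sd = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5

zs = [(x - mean) / sd for x in xs]
z4 = [z ** 4 for z in zs]
kurt = sum(z4) / n  # ~3 for the normal

# How much of the fourth-moment sum comes from the |z| > 2 points?
tail_mass = sum(v for z, v in zip(zs, z4) if abs(z) > 2)
tail_share = tail_mass / sum(z4)
tail_frac = sum(1 for z in zs if abs(z) > 2) / n

print(f"kurtosis ≈ {kurt:.2f}")
print(f"|z|>2 points: {tail_frac:.1%} of the sample")
print(f"their share of the z^4 sum: {tail_share:.1%}")
```

Roughly 4.6% of the sample carries more than half of the kurtosis: that is why kurtosis says nothing reliable about the peak.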
Excess Kurtosis
$$\text{excess kurtosis} = \text{kurtosis} - 3$$

Subtracts the normal distribution's kurtosis as a baseline. The normal has excess kurtosis 0. Positive excess kurtosis means heavier tails than normal. Negative excess kurtosis (possible; the minimum is $-2$, attained by a symmetric two-point distribution) means lighter tails than normal.
This is not a different concept. It is just kurtosis recentered so the Gaussian baseline is zero.
Coefficient of Variation
$$\mathrm{CV} = \frac{\sigma}{\mu}$$

Relative variability compared to the mean. Dimensionless. Useful for comparing spread across different scales (e.g., heights measured in centimeters and in inches give the same CV).
When CV is useful: positive-scale data where zero has meaning (waiting times, concentrations, demand).
When CV is garbage: mean near zero (CV explodes), data crosses zero (interpretation breaks), or variance is infinite.
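Both regimes in a minimal sketch (the data values are made up for illustration):

```python
import statistics

def cv(xs):
    """Coefficient of variation: population sd divided by the mean."""
    return statistics.pstdev(xs) / statistics.fmean(xs)

# Positive-scale data (hypothetical waiting times): CV is meaningful.
waits = [2.0, 3.5, 1.2, 4.1, 2.7]
print(f"CV of waits: {cv(waits):.2f}")

# Data straddling zero (hypothetical temperature anomalies): the mean
# is near zero, so CV explodes and carries no information.
anomalies = [-0.3, 0.1, -0.2, 0.4, 0.05]
print(f"CV of anomalies: {cv(anomalies):.1f}")
```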
When Moments Exist and When They Do Not
Higher moments require stronger integrability conditions. They fail in a strict order as tails get heavier: the fourth moment fails before the third, the third before the second.
| Distribution | Mean? | Variance? | Skewness? | Kurtosis? |
|---|---|---|---|---|
| Normal | Yes | Yes | Yes | Yes |
| Uniform | Yes | Yes | Yes | Yes |
| Laplace | Yes | Yes | Yes | Yes |
| Student $t$, $\nu = 4$ | Yes | Yes | Yes | No |
| Student $t$, $\nu = 3$ | Yes | Yes | No | No |
| Student $t$, $\nu = 2$ | Yes | No | No | No |
| Student $t$, $\nu = 1$ (Cauchy) | No | No | No | No |
| Pareto, $\alpha > 4$ | Yes | Yes | Yes | Yes |
| Pareto, $3 < \alpha \le 4$ | Yes | Yes | Yes | No |
| Pareto, $2 < \alpha \le 3$ | Yes | Yes | No | No |
| Pareto, $1 < \alpha \le 2$ | Yes | No | No | No |
| Pareto, $\alpha \le 1$ | No | No | No | No |
The rule for Student $t$: the $k$-th moment exists only if $k < \nu$.
The rule for Pareto: the $k$-th raw moment exists only if $k < \alpha$.
These two rules alone cover most cases you will encounter.
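The boundary cases can be seen empirically: the sample kurtosis of a $\nu = 3$ Student $t$ (whose fourth moment is infinite) keeps drifting upward with sample size, while the normal's stabilizes near 3. A rough sketch, generating $t_3$ draws as $Z / \sqrt{\chi^2_3 / 3}$:

```python
import random

random.seed(0)

def t3():
    # Student t with nu = 3: Z / sqrt(chi2_3 / 3)
    z = random.gauss(0, 1)
    chi2 = sum(random.gauss(0, 1) ** 2 for _ in range(3))
    return z / (chi2 / 3) ** 0.5

def sample_kurtosis(xs):
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / len(xs)
    return sum((x - m) ** 4 for x in xs) / len(xs) / var ** 2

results = {}
for n in (1_000, 10_000, 100_000):
    normal = [random.gauss(0, 1) for _ in range(n)]
    heavy = [t3() for _ in range(n)]
    results[n] = (sample_kurtosis(normal), sample_kurtosis(heavy))
    print(f"n={n:>7}: normal {results[n][0]:5.2f}, t(3) {results[n][1]:8.2f}")
```

The normal column converges; the $t_3$ column is dominated by whichever extreme draws happen to land in the sample, because the statistic it estimates does not exist.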
Cumulants: The Better Language
Moments mix information from lower orders into higher orders. Cumulants isolate the genuinely new information at each order.
Cumulants
Cumulants are defined through the cumulant generating function (log of the moment generating function):

$$K_X(t) = \log \mathbb{E}\left[e^{tX}\right], \qquad \kappa_n = K_X^{(n)}(0)$$
The first four cumulants are:
| Cumulant | Value | Meaning |
|---|---|---|
| $\kappa_1$ | Mean | Center |
| $\kappa_2$ | Variance | Spread |
| $\kappa_3$ | Third central moment $\mu_3$ | Asymmetry |
| $\kappa_4$ | $\mu_4 - 3\sigma^4$ | Tail departure from Gaussian |
Gaussian Characterization via Cumulants
Statement
A random variable is Gaussian if and only if all cumulants of order $n \ge 3$ are zero: $\kappa_n = 0$ for all $n \ge 3$.
Intuition
The Gaussian is the only distribution that is "pure location and scale." Every other distribution carries additional shape information in its higher cumulants. If you measure any departure from Gaussianity, it shows up as a nonzero cumulant somewhere.
Why It Matters
This theorem is why cumulants are the natural language for measuring non-Gaussianity. The third cumulant measures asymmetry departure from Gaussian. The fourth cumulant measures tail departure. Each higher cumulant captures a new independent direction of non-Gaussianity. This is the theoretical basis for tests of normality and for independent component analysis (ICA).
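A quick empirical illustration (sample sizes and distributions are arbitrary choices): the standardized third and fourth cumulants are both near zero for Gaussian data, while uniform data reveals its non-Gaussianity at order 4.

```python
import random

random.seed(0)
n = 300_000

def standardized_moment(xs, k):
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs) / var ** (k / 2)

gauss = [random.gauss(0, 1) for _ in range(n)]
unif = [random.uniform(-1, 1) for _ in range(n)]

# Standardized third and fourth cumulants (skewness, excess kurtosis):
# both vanish for the Gaussian; the uniform shows up only at order 4.
skew_g = standardized_moment(gauss, 3)
exkurt_g = standardized_moment(gauss, 4) - 3
skew_u = standardized_moment(unif, 3)
exkurt_u = standardized_moment(unif, 4) - 3

print(f"gauss:   skew={skew_g:+.3f}, excess kurtosis={exkurt_g:+.3f}")
print(f"uniform: skew={skew_u:+.3f}, excess kurtosis={exkurt_u:+.3f}")
```

The uniform's excess kurtosis sits near $-1.2$ (its kurtosis is $9/5$): symmetric, so invisible at order 3, but unmistakably non-Gaussian at order 4.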
Failure Mode
The theorem requires existence of the MGF in a neighborhood of zero, which excludes heavy-tailed distributions. For distributions where the MGF does not exist (e.g., Student $t$ with low degrees of freedom), cumulants beyond a certain order do not exist, and the characterization cannot be applied.
| Property | Moments | Cumulants |
|---|---|---|
| Easy to define | Yes | Slightly less |
| Easy to interpret at low order | Yes | Yes |
| Clean under sums of independent variables | No | Yes ($\kappa_n$ is additive) |
| Redundant across orders | More | Less |
| Better for serious theory | Not really | Yes |
The additivity property is the main reason cumulants matter: if $X$ and $Y$ are independent, then $\kappa_n(X + Y) = \kappa_n(X) + \kappa_n(Y)$ for all $n$. Moments do not have this property in general: central moments coincide with cumulants only up to order 3, and fail to be additive from order 4 on.
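The additivity of $\kappa_3$ (which equals the third central moment) can be checked numerically; the exponential rates below are arbitrary choices, using $\kappa_3 = 2/\lambda^3$ for an exponential with rate $\lambda$:

```python
import random

random.seed(0)
n = 400_000

def third_central_moment(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 3 for x in xs) / len(xs)

# Two independent skewed variables; the rates 1.0 and 0.5 are arbitrary.
x = [random.expovariate(1.0) for _ in range(n)]
y = [random.expovariate(0.5) for _ in range(n)]
s = [a + b for a, b in zip(x, y)]

k3_x = third_central_moment(x)  # exponential(1):   kappa_3 = 2/1^3   = 2
k3_y = third_central_moment(y)  # exponential(0.5): kappa_3 = 2/0.5^3 = 16
k3_s = third_central_moment(s)

print(f"kappa3(X) + kappa3(Y) = {k3_x + k3_y:.2f}")
print(f"kappa3(X + Y)         = {k3_s:.2f}")
```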
Tail Probability: What Practitioners Actually Care About
Rather than memorizing kurtosis values, look at tail probabilities directly. In the table below, $Z$ denotes each distribution standardized to mean 0 and variance 1.
| Distribution | $P(\lvert Z\rvert > 2)$ | $P(\lvert Z\rvert > 3)$ | Interpretation |
|---|---|---|---|
| Normal | ~4.6% | ~0.27% | Baseline |
| Student $t$ | ~6.5% | ~1.0% | Heavier tails |
| Laplace | ~5.9% | ~1.4% | Heavier tails |
| Uniform | 0% | 0% | Bounded, no tail events |
This table is more useful than raw kurtosis values because it shows what actually happens in practice: how often do extreme events occur?
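The normal and Laplace entries can be reproduced in closed form (assuming the unit-variance standardization, under which the Laplace has scale $b = 1/\sqrt{2}$):

```python
import math

def normal_two_sided(t):
    # P(|Z| > t) for a standard normal, via the complementary error function
    return math.erfc(t / math.sqrt(2))

def laplace_two_sided(t):
    # Unit-variance Laplace has scale b = 1/sqrt(2): P(|X| > t) = exp(-t*sqrt(2))
    return math.exp(-t * math.sqrt(2))

for t in (2, 3):
    print(f"P(|Z| > {t}): normal {normal_two_sided(t):.4f}, "
          f"Laplace {laplace_two_sided(t):.4f}")
```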
Exercises
Problem
Compute the skewness and kurtosis of the exponential distribution with rate $\lambda$. Is it right-skewed or left-skewed?
Problem
The Cauchy distribution has no finite mean. What happens if you compute the sample mean of 1000 Cauchy observations and repeat this experiment 100 times? What do you observe about the sample means?
Problem
Prove that for independent random variables $X$ and $Y$, the $n$-th cumulant of the sum equals the sum of the cumulants: $\kappa_n(X + Y) = \kappa_n(X) + \kappa_n(Y)$.
References
Canonical:
- Casella & Berger, Statistical Inference (2002), Chapter 2
- DeCarlo, "On the Meaning and Use of Kurtosis" (Psychological Methods, 1997). The definitive correction to the "peakedness" myth.
Current:
- Westfall, "Kurtosis as Peakedness, 1905–2014. R.I.P." (The American Statistician, 2014). Comprehensive debunking.
- Munkres, Topology (2000), Chapter 1 (set theory review)
Next Topics
- Sub-Gaussian random variables: the tail class defined by MGF bounds, where kurtosis-like behavior is controlled globally
- Concentration inequalities: when moments exist, they give tail bounds
- Robust statistics: what to use when moments do not exist or are unreliable
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.