
Foundations

Skewness, Kurtosis, and Higher Moments

Distribution shape beyond mean and variance: skewness measures tail asymmetry, kurtosis measures tail extremeness, cumulants are the cleaner language, and heavy-tailed distributions break all of these.


Why This Matters

Mean and variance tell you the center and spread of a distribution. They say nothing about shape. Two distributions can have identical mean and variance but completely different tail behavior, asymmetry, and outlier frequency. Skewness and kurtosis capture this shape information.

Most textbooks get the interpretation wrong. Kurtosis is not peakedness. It measures how much extreme values dominate the distribution. This page gives the correct interpretations and shows exactly when these statistics fail.

Correct Interpretations

Definition

Skewness

The third standardized moment:

$$\gamma_1 = \mathbb{E}\!\left[\left(\frac{X - \mu}{\sigma}\right)^3\right]$$

Correct interpretation: which tail has more influence on the distribution. Positive skew means the right tail is longer and dominates. Negative skew means the left tail dominates. Zero skewness means the two tails' third-moment contributions cancel; it does not imply the distribution is symmetric.

Watch Out

Skewness is not about which way the distribution leans

People say "right-skewed means the distribution leans right." Wrong. Right-skewed (positive skew) means the RIGHT TAIL is longer. The bulk of the mass is actually on the LEFT. Income distributions are right-skewed: most people earn moderate amounts, but a long right tail extends to very high incomes. The mean is pulled right of the median.
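A quick numeric check of this reading (a sketch using NumPy and SciPy; the Exponential(1) example is illustrative):

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)

# Exponential(1) is the textbook right-skewed case: bulk on the left,
# long right tail. Its theoretical skewness is exactly 2.
x = rng.exponential(scale=1.0, size=200_000)

print(skew(x))                    # close to 2
print(np.mean(x), np.median(x))   # mean (≈ 1) sits right of the median (≈ ln 2)
```

The sample mean exceeds the sample median, exactly as the "mean pulled right of the median" description predicts.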

Definition

Kurtosis

The fourth standardized moment:

$$\mathrm{Kurt}[X] = \mathbb{E}\!\left[\left(\frac{X - \mu}{\sigma}\right)^4\right]$$

(Written $\mathrm{Kurt}[X]$ here to avoid a clash with the fourth cumulant $\kappa_4$, which is a different quantity, introduced below.)

The normal distribution has kurtosis exactly 3.

Correct interpretation: how much extreme values (outliers) dominate the distribution, compared to moderate deviations. High kurtosis means rare but extreme values contribute disproportionately. It is about TAIL WEIGHT, not peak shape.

Why Kurtosis Measures Tails, Not Peaks

The $z^4$ weighting makes extreme values dominate the kurtosis integral

[Figure: the $z^4$ weight as a function of z-score (standard deviations from mean); contribution is near zero inside ±1σ and dominated by the tails beyond ±2σ.]
Watch Out

Kurtosis is NOT peakedness

This is one of the most persistent wrong claims in statistics education. Kurtosis does not measure how peaked or flat the distribution is. A distribution can be flat-topped with high kurtosis (if it has heavy tails) or peaked with low kurtosis (if it has light tails). The fourth power $z^4$ amplifies extreme z-scores. Kurtosis measures the contribution of the tails, period.
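This is easy to verify numerically (a sketch; the decomposition is illustrative): split the fourth-moment sum of a standard normal sample by region and see where the kurtosis actually comes from.

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.standard_normal(1_000_000)
z4 = z**4

# Sample kurtosis of the standard normal: close to 3
print(z4.mean())

# Share of the kurtosis sum contributed by each region
center = np.abs(z) < 1   # holds ~68% of the probability mass
tail = np.abs(z) > 2     # holds ~4.6% of the probability mass
print(z4[center].sum() / z4.sum())   # tiny: the peak barely matters
print(z4[tail].sum() / z4.sum())     # large: the tails dominate
```

Roughly half of the normal's kurtosis comes from the ~4.6% of mass beyond 2σ, while the central 68% contributes only a few percent.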

Definition

Excess Kurtosis

$$\text{Excess kurtosis} = \mathrm{Kurt}[X] - 3$$

Subtracts the normal distribution's kurtosis as a baseline. The normal has excess kurtosis 0. Positive excess kurtosis means heavier tails than normal. Negative excess kurtosis (possible; the minimum is $-2$) means lighter tails than normal.

This is not a different concept. It is just kurtosis recentered so the Gaussian baseline is zero.

Definition

Coefficient of Variation

$$\mathrm{CV} = \frac{\sigma}{\mu}$$

Relative variability compared to the mean. Dimensionless. Useful for comparing spread across different scales (e.g., comparing variability of heights in centimeters vs. inches gives the same CV).

When CV is useful: positive-scale data where zero has meaning (waiting times, concentrations, demand).

When CV is garbage: mean near zero (CV explodes), data crosses zero (interpretation breaks), or variance is infinite.
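Both regimes are easy to demonstrate (a sketch; the waiting-time example is illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Useful: positive-scale data. Exponential waiting times have CV = 1
# at any scale, so the ratio is comparable across units.
wait_min = rng.exponential(scale=10.0, size=100_000)   # minutes
wait_sec = wait_min * 60.0                             # same data in seconds
print(wait_min.std() / wait_min.mean())   # ≈ 1
print(wait_sec.std() / wait_sec.mean())   # same value: CV is scale-free

# Garbage: mean near zero. The denominator ~0 makes CV huge and unstable,
# and its sign flips from sample to sample.
noise = rng.normal(loc=0.001, scale=1.0, size=1_000)
print(noise.std() / noise.mean())
```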

When Moments Exist and When They Do Not

Higher moments require stronger integrability conditions, and they fail in a strict order: the fourth moment fails before the third, the third before the second.

| Distribution | Mean? | Variance? | Skewness? | Kurtosis? |
| --- | --- | --- | --- | --- |
| Normal | Yes | Yes | Yes | Yes |
| Uniform | Yes | Yes | Yes | Yes |
| Laplace | Yes | Yes | Yes | Yes |
| Student $t$, $\nu = 4$ | Yes | Yes | Yes | No |
| Student $t$, $\nu = 3$ | Yes | Yes | No | No |
| Student $t$, $\nu = 2$ | Yes | No | No | No |
| Student $t$, $\nu = 1$ (Cauchy) | No | No | No | No |
| Pareto, $\alpha > 4$ | Yes | Yes | Yes | Yes |
| Pareto, $3 < \alpha \leq 4$ | Yes | Yes | Yes | No |
| Pareto, $2 < \alpha \leq 3$ | Yes | Yes | No | No |
| Pareto, $1 < \alpha \leq 2$ | Yes | No | No | No |
| Pareto, $\alpha \leq 1$ | No | No | No | No |

The rule for Student $t$: the $k$-th moment exists only if $\nu > k$.

The rule for Pareto: the $k$-th raw moment exists only if $\alpha > k$.

These two rules alone cover most cases you will encounter.
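These rules show up directly in simulation (a sketch): when the fourth moment exists, the sample excess kurtosis stabilizes across repeated draws; when it does not, it never settles.

```python
import numpy as np
from scipy.stats import kurtosis   # returns EXCESS kurtosis (normal -> 0)

rng = np.random.default_rng(3)

# Normal: the 4th moment exists, so estimates cluster tightly near 0.
norm_ests = [kurtosis(rng.standard_normal(50_000)) for _ in range(5)]

# Student t with nu = 3: the 4th moment is infinite (needs nu > 4),
# so the sample kurtosis does not converge to anything.
t3_ests = [kurtosis(rng.standard_t(df=3, size=50_000)) for _ in range(5)]

print(np.round(norm_ests, 2))   # small and stable
print(np.round(t3_ests, 2))     # large and wildly different on each draw
```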

Cumulants: The Better Language

Moments mix information from lower orders into higher orders. Cumulants isolate the genuinely new information at each order.

Definition

Cumulants

Cumulants $\kappa_1, \kappa_2, \kappa_3, \ldots$ are defined through the cumulant generating function (the log of the moment generating function):

$$K(t) = \log M(t) = \log \mathbb{E}[e^{tX}] = \sum_{k=1}^{\infty} \kappa_k \frac{t^k}{k!}$$

The first four cumulants are:

| Cumulant | Value | Meaning |
| --- | --- | --- |
| $\kappa_1$ | Mean $\mu$ | Center |
| $\kappa_2$ | Variance $\sigma^2$ | Spread |
| $\kappa_3$ | Third central moment | Asymmetry |
| $\kappa_4$ | Fourth central moment $- 3\sigma^4$ | Tail departure from Gaussian |
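The definition can be checked symbolically (a sketch using SymPy; the Exponential(1) choice, with MGF $M(t) = 1/(1-t)$, is illustrative):

```python
import sympy as sp

t = sp.symbols('t')

# MGF of the Exponential(1) distribution, valid for t < 1
M = 1 / (1 - t)
K = sp.log(M)   # cumulant generating function

# kappa_k is the k-th derivative of K at t = 0
kappas = [K.diff(t, k).subs(t, 0) for k in range(1, 5)]
print(kappas)   # [1, 1, 2, 6]: mean 1, variance 1, kappa_3 = 2, kappa_4 = 6
```

Note that $\kappa_4 / \kappa_2^2 = 6$ recovers the exponential's excess kurtosis of 6.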
Theorem

Gaussian Characterization via Cumulants

Statement

A random variable $X$ is Gaussian if and only if all cumulants of order $k \geq 3$ are zero: $\kappa_k = 0$ for all $k \geq 3$.

Intuition

The Gaussian is the only distribution that is "pure location and scale." Every other distribution carries additional shape information in its higher cumulants. If you measure any departure from Gaussianity, it shows up as a nonzero cumulant somewhere.

Why It Matters

This theorem is why cumulants are the natural language for measuring non-Gaussianity. The third cumulant measures asymmetry departure from Gaussian. The fourth cumulant measures tail departure. Each higher cumulant captures a new independent direction of non-Gaussianity. This is the theoretical basis for tests of normality and for independent component analysis (ICA).

Failure Mode

The theorem requires existence of the MGF in a neighborhood of zero, which excludes heavy-tailed distributions. For distributions where the MGF does not exist (e.g., Student $t$ with low degrees of freedom), cumulants beyond a certain order do not exist, and the characterization cannot be applied.

| Property | Moments | Cumulants |
| --- | --- | --- |
| Easy to define | Yes | Slightly less |
| Easy to interpret at low order | Yes | Yes |
| Clean under sums of independent variables | No | Yes ($\kappa_k$ is additive) |
| Redundant across orders | More | Less |
| Better for serious theory | Not really | Yes |

The additivity property is the main reason cumulants matter: if $X$ and $Y$ are independent, then $\kappa_k(X + Y) = \kappa_k(X) + \kappa_k(Y)$ for all $k$. Raw moments are additive only at $k = 1$; central moments happen to be additive at orders 2 and 3 (where they coincide with cumulants), but fail at order 4 and beyond.
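Additivity is easy to check numerically with SciPy's `kstat`, which computes unbiased cumulant estimates (k-statistics) up to order 4 (the Exp/Gamma choice below is illustrative):

```python
import numpy as np
from scipy.stats import kstat   # unbiased estimator of the k-th cumulant

rng = np.random.default_rng(4)
n = 500_000
x = rng.exponential(scale=1.0, size=n)        # Exp(1): cumulants 1, 1, 2, 6
y = rng.gamma(shape=3.0, scale=1.0, size=n)   # Gamma(3): cumulants 3, 3, 6, 18

for k in (1, 2, 3, 4):
    lhs = kstat(x + y, k)             # cumulant of the sum
    rhs = kstat(x, k) + kstat(y, k)   # sum of the cumulants
    print(k, round(lhs, 2), round(rhs, 2))
```

The two columns agree at every order (up to sampling noise), and both match Gamma(4), the distribution of the sum.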

Tail Probability: What Practitioners Actually Care About

Rather than memorizing kurtosis values, look at tail probabilities directly.

| Distribution | $P(\lvert X \rvert > 2\sigma)$ | $P(\lvert X \rvert > 3\sigma)$ | Interpretation |
| --- | --- | --- | --- |
| Normal | ~4.6% | ~0.27% | Baseline |
| Student $t$, $\nu = 5$ | ~5.0% | ~1.2% | Heavier tails |
| Laplace | ~5.9% | ~1.4% | Heavier tails |
| Uniform | 0% | 0% | Bounded, no tail events |

This table is more useful than raw kurtosis values because it shows what actually happens in practice: how often do extreme events occur?
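Tail probabilities like these can be computed directly from each distribution's survival function (a sketch using `scipy.stats`; each variable is standardized by its own $\sigma$):

```python
from scipy import stats

dists = {
    "normal":       stats.norm(),
    "Student t(5)": stats.t(df=5),
    "Laplace":      stats.laplace(),
    "uniform":      stats.uniform(loc=-1, scale=2),
}

for name, d in dists.items():
    sigma = d.std()
    # All four are symmetric about 0, so P(|X| > k·sigma) = 2 * sf(k·sigma)
    p2 = 2 * d.sf(2 * sigma)
    p3 = 2 * d.sf(3 * sigma)
    print(f"{name:12s}  P(|X|>2σ) = {p2:.4f}   P(|X|>3σ) = {p3:.4f}")
```

For the uniform, $2\sigma$ already lies outside the support, so both probabilities are exactly zero.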

Exercises

ExerciseCore

Problem

Compute the skewness and kurtosis of the exponential distribution with rate $\lambda = 1$. Is it right-skewed or left-skewed?

ExerciseCore

Problem

The Cauchy distribution has no finite mean. What happens if you compute the sample mean of 1000 Cauchy observations and repeat this experiment 100 times? What do you observe about the sample means?

ExerciseAdvanced

Problem

Prove that for independent random variables $X$ and $Y$, the cumulant of the sum equals the sum of the cumulants: $\kappa_k(X + Y) = \kappa_k(X) + \kappa_k(Y)$.

References

Canonical:

  • Casella & Berger, Statistical Inference (2002), Chapter 2
  • DeCarlo, "On the Meaning and Use of Kurtosis" (Psychological Methods, 1997). The definitive correction to the "peakedness" myth.

Current:

  • Westfall, "Kurtosis as Peakedness, 1905-2014. R.I.P." (The American Statistician, 2014). Comprehensive debunking.


Next Topics

Last reviewed: April 2026

Prerequisites

Foundations this topic depends on.
