Foundations
Pareto Distribution
The Pareto distribution is the canonical power-law on a half-line. The Type I parameterization has survival function (x_m/x)^alpha for x at least x_m. The shape parameter alpha is the tail index. Three regimes of alpha matter for the law of large numbers and the central limit theorem: alpha at most 1 has no finite mean and breaks the LLN; 1 < alpha at most 2 has finite mean but infinite variance so the standard CLT fails (generalized CLT to a stable law); alpha greater than 2 admits both LLN and CLT in the usual form. Applications: wealth, city sizes, file sizes, network degree, insurance severity. The 80/20 'Pareto principle' is a specific case requiring alpha approximately 1.16.
Plain-Language Definition
The Pareto distribution is the simplest model of a power-law tail. A positive random variable $X$ is Pareto Type I with minimum value $x_m > 0$ and shape parameter $\alpha > 0$ if the probability of exceeding $x$ falls like a power of $x$:
$$P(X > x) = \left(\frac{x_m}{x}\right)^{\alpha}, \qquad x \ge x_m.$$
The shape parameter $\alpha$ is called the tail index. A smaller $\alpha$ means a heavier tail, a slower decay of exceedance probabilities, and more weight in the upper extreme. The 80/20 rule, the long tail of file sizes on the internet, and the size distribution of cities and earthquakes all sit in the Pareto family with different tail indices.
The shape of the tail is what makes the Pareto interesting: depending on how heavy the tail is, the sample mean may not converge, or it may converge but to a non-Normal limit. The distinctions are sharp, controlled entirely by $\alpha$.
Definition
Pareto Type I Distribution
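A random variable $X$ has a Pareto Type I distribution with scale $x_m > 0$ and shape $\alpha > 0$ when its survival function is
$$P(X > x) = \left(\frac{x_m}{x}\right)^{\alpha}, \qquad x \ge x_m,$$
and $P(X > x) = 1$ for $x < x_m$. The density is
$$f(x) = \frac{\alpha x_m^{\alpha}}{x^{\alpha + 1}}, \qquad x \ge x_m.$$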
The support starts at $x_m$, not at 0; the distribution is left-bounded. The Type II (Lomax) parameterization shifts the support to start at 0 by replacing $X$ with $Y = X - x_m$, where $Y \ge 0$; the survival function becomes $P(Y > y) = (1 + y/x_m)^{-\alpha}$ for $y \ge 0$. The two share the same tail behavior but differ near the origin.
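The definitions above translate directly into a few helper functions. The following is a minimal sketch using only the Python standard library; the function names (`pareto_sf`, `pareto_pdf`, `pareto_quantile`, `lomax_sf`, `sample_pareto`) are illustrative choices, not part of any particular package.

```python
import random

def pareto_sf(x, x_m, alpha):
    """Survival function P(X > x) of Pareto Type I."""
    return 1.0 if x < x_m else (x_m / x) ** alpha

def pareto_pdf(x, x_m, alpha):
    """Density alpha * x_m**alpha / x**(alpha + 1) on x >= x_m, zero below."""
    return 0.0 if x < x_m else alpha * x_m ** alpha / x ** (alpha + 1)

def pareto_quantile(p, x_m, alpha):
    """Inverse CDF: the x with P(X <= x) = p, for 0 <= p < 1."""
    return x_m * (1.0 - p) ** (-1.0 / alpha)

def lomax_sf(y, x_m, alpha):
    """Survival function of the shifted (Type II / Lomax) variable Y = X - x_m."""
    return 1.0 if y < 0 else (1.0 + y / x_m) ** (-alpha)

def sample_pareto(n, x_m, alpha, rng):
    """Inverse-CDF sampling; 1 - rng.random() lies in (0, 1], so no division by zero."""
    return [x_m * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]

rng = random.Random(0)
draws = sample_pareto(5, x_m=1.0, alpha=1.5, rng=rng)
print(min(draws) >= 1.0)  # every draw sits at or above x_m
```

The check `lomax_sf(y, x_m, alpha) == pareto_sf(y + x_m, x_m, alpha)` (up to rounding) is a quick way to confirm that the two parameterizations describe the same tail.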
Why This Matters
The Pareto is the canonical heavy-tailed distribution in applied work for three reasons.
- It is the limiting tail. A consequence of the Pickands-Balkema-de Haan theorem in extreme-value theory is that exceedances of a high threshold from any distribution in the Fréchet domain of attraction (i.e. with a regularly varying tail) converge to a Generalized Pareto. The Pareto Type II is the natural parametric model for threshold exceedances when the tail is power-law.
- It separates the three asymptotic regimes. The sample mean of iid Pareto samples follows three distinct asymptotic laws depending on $\alpha$. Small $\alpha$ breaks the law of large numbers; intermediate $\alpha$ admits the law of large numbers but breaks the classical central limit theorem; large $\alpha$ admits both in the usual form. The Pareto is the cleanest distribution to use as a stress test for any sample-mean-based estimator.
- It is a useful baseline for tail-aware decisions. Wealth, city sizes, file sizes, network degree, insurance severity above a threshold, and earthquake magnitudes are all power-law-shaped over significant ranges. Reporting a sample mean for such data is misleading; the right summary is the tail index and a quantile, both of which the Pareto parameterizes directly.
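The 80/20 principle ("80 percent of the wealth is held by 20 percent of the people") is a specific case of the Pareto distribution with shape $\alpha$ satisfying $0.2^{\,1 - 1/\alpha} = 0.8$, because for $\alpha > 1$ the top fraction $p$ of the population holds the share $p^{\,1 - 1/\alpha}$ of the total. Solving for $\alpha$ gives $\alpha = \log 5 / \log 4 \approx 1.16$. Other splits (90/10, 70/30) correspond to other values of $\alpha$. The "rule" is a shorthand for a single point on a continuum, not a universal law.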
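The correspondence between a split and a tail index can be checked numerically. The sketch below assumes the Pareto Type I top-share formula, under which the top fraction $p$ holds the share $p^{1 - 1/\alpha}$ of the total when $\alpha > 1$; the function names `top_share` and `alpha_for_split` are illustrative.

```python
import math

def top_share(p, alpha):
    """Share of the total held by the top fraction p of a Pareto Type I population."""
    if alpha <= 1:
        raise ValueError("the mean is infinite for alpha <= 1, so shares are undefined")
    return p ** (1.0 - 1.0 / alpha)

def alpha_for_split(top_fraction, share):
    """Solve top_fraction ** (1 - 1/alpha) = share for alpha."""
    exponent = math.log(share) / math.log(top_fraction)  # equals 1 - 1/alpha
    return 1.0 / (1.0 - exponent)

alpha_80_20 = alpha_for_split(0.2, 0.8)
print(round(alpha_80_20, 2))  # about 1.16, i.e. log(5) / log(4)
for top, share in ((0.1, 0.9), (0.3, 0.7)):
    print(top, share, alpha_for_split(top, share))
```

More extreme splits map to tail indices closer to 1, which is the sense in which a split is a one-parameter shorthand for $\alpha$.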
Survival, Mean, Variance
Pareto Survival, Mean, and Variance
Statement
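The survival function is $P(X > x) = (x_m/x)^{\alpha}$ for $x \ge x_m$. The $k$-th moment exists if and only if $k < \alpha$, in which case
$$E[X^k] = \frac{\alpha x_m^k}{\alpha - k}.$$
Specializing to $k = 1$ and $k = 2$:
$$E[X] = \frac{\alpha x_m}{\alpha - 1} \quad (\alpha > 1), \qquad \mathrm{Var}(X) = \frac{\alpha x_m^2}{(\alpha - 1)^2 (\alpha - 2)} \quad (\alpha > 2).$$
For $\alpha \le 1$ the mean is infinite; for $1 < \alpha \le 2$ the mean is finite but the variance is infinite.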
Intuition
The integral defining $E[X^k]$ converges at infinity if and only if $x^k f(x) \propto x^{k - \alpha - 1}$ has an integrable tail, i.e. $k - \alpha - 1 < -1$, equivalently $k < \alpha$. Below the threshold, the integral diverges, and the moment is infinite. Above the threshold, the integral is elementary.
Proof Sketch
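For $k < \alpha$, $E[X^k] = \int_{x_m}^{\infty} x^k \, \alpha x_m^{\alpha} x^{-\alpha - 1} \, dx = \alpha x_m^{\alpha} \int_{x_m}^{\infty} x^{k - \alpha - 1} \, dx$. The integral evaluates to $x_m^{k - \alpha} / (\alpha - k)$, giving $E[X^k] = \alpha x_m^k / (\alpha - k)$. For $k \ge \alpha$ the integrand has a non-integrable tail and the moment is infinite.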
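The moment formula and its existence condition are easy to encode. This is a small sketch under the Type I parameterization used here; the names `pareto_moment`, `pareto_mean`, and `pareto_variance` are illustrative, and returning `math.inf` for a non-existent moment is a convention chosen for this example.

```python
import math

def pareto_moment(k, x_m, alpha):
    """E[X**k] = alpha * x_m**k / (alpha - k) when k < alpha, else infinite."""
    if k >= alpha:
        return math.inf
    return alpha * x_m ** k / (alpha - k)

def pareto_mean(x_m, alpha):
    return pareto_moment(1, x_m, alpha)

def pareto_variance(x_m, alpha):
    """Finite only for alpha > 2; reported as infinite otherwise."""
    if alpha <= 2:
        return math.inf
    return pareto_moment(2, x_m, alpha) - pareto_mean(x_m, alpha) ** 2

print(pareto_mean(1.0, 1.5))      # finite mean for alpha = 1.5
print(pareto_variance(1.0, 1.5))  # inf: the second moment does not exist
print(pareto_variance(1.0, 3.0))  # finite variance for alpha = 3
```

Computing the variance as $E[X^2] - E[X]^2$ reproduces the closed form $\alpha x_m^2 / ((\alpha - 1)^2 (\alpha - 2))$.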
Why It Matters
The thresholds for moment existence are the central organizing principle for working with the Pareto. A statement of the form "estimate the mean of $X$" requires $\alpha > 1$; otherwise the sample mean does not estimate any well-defined population quantity. A statement involving the standard error of the sample mean requires $\alpha > 2$; otherwise the classical CLT-based standard error is infinite and a different asymptotic framework is needed.
Failure Mode
Software libraries differ on which quantity they call the "shape": some use the survival exponent (our $\alpha$), others use $\alpha + 1$ (the density exponent), others use $1/\alpha$. Convert before plugging in. The same warning applies to academic papers: empirical-finance papers sometimes report tail exponents that differ by 1 from the parameter used by classical statistics texts.
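One way to guard against mixed conventions is to normalize every reported exponent to the survival exponent before using it. The helper below is a hypothetical sketch: the function name and the convention labels `"survival"`, `"density"`, and `"reciprocal"` are assumptions made for this example, not names used by any specific library.

```python
def to_survival_exponent(value, convention):
    """Convert a reported 'shape' to the survival exponent alpha.

    convention: "survival"   -> value is alpha itself
                "density"    -> value is alpha + 1
                "reciprocal" -> value is 1 / alpha
    """
    if convention == "survival":
        alpha = value
    elif convention == "density":
        alpha = value - 1.0
    elif convention == "reciprocal":
        alpha = 1.0 / value
    else:
        raise ValueError(f"unknown convention: {convention!r}")
    if alpha <= 0:
        raise ValueError("converted alpha must be positive")
    return alpha

# The same tail reported three ways all map back to alpha = 1.5.
print(to_survival_exponent(1.5, "survival"))
print(to_survival_exponent(2.5, "density"))
print(to_survival_exponent(2.0 / 3.0, "reciprocal"))
```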
Three Regimes for LLN and CLT
LLN and CLT Regimes for Iid Pareto Samples
Statement
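Let $X_1, \dots, X_n$ be iid $\mathrm{Pareto}(x_m, \alpha)$ and $\bar X_n = \frac{1}{n} \sum_{i=1}^{n} X_i$. Write $\mu = E[X_1]$ and $\sigma^2 = \mathrm{Var}(X_1)$ when these are finite.
- Regime A ($\alpha \le 1$). The mean is infinite. $\bar X_n \to \infty$ almost surely. Neither the law of large numbers nor the standard central limit theorem applies. Under suitable centering and scaling, the partial sum $S_n = X_1 + \dots + X_n$ has a stable-law limit with index $\alpha$.
- Regime B ($1 < \alpha \le 2$). The mean is finite. The variance is infinite. The law of large numbers holds: $\bar X_n \to \mu$ almost surely (by Kolmogorov's strong law, which needs only a finite mean). The classical central limit theorem fails; instead, for $\alpha < 2$, $n^{1 - 1/\alpha} (\bar X_n - \mu)$ converges in distribution to a stable law with index $\alpha$. At the boundary $\alpha = 2$ the limit is Normal, but the scaling is $\sqrt{n / \log n}$ rather than $\sqrt{n}$.
- Regime C ($\alpha > 2$). Both the mean and the variance are finite. Standard law of large numbers and classical central limit theorem apply: $\bar X_n \to \mu$ almost surely and $\sqrt{n} (\bar X_n - \mu) \to N(0, \sigma^2)$ in distribution.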
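The three regimes amount to a lookup on $\alpha$. The sketch below encodes the regime label and the exponent $r$ for which the sample-mean error is of order $n^{-r}$; the function names are illustrative, and the value returned at exactly $\alpha = 2$ ignores a logarithmic factor.

```python
def pareto_regime(alpha):
    """Return 'A', 'B', or 'C' following the regime boundaries at alpha = 1 and alpha = 2."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if alpha <= 1:
        return "A"  # infinite mean: no law of large numbers
    if alpha <= 2:
        return "B"  # finite mean, infinite variance
    return "C"      # finite mean and variance

def mean_rate_exponent(alpha):
    """Exponent r with sample-mean error of order n**(-r); None when the mean is infinite."""
    if alpha <= 1:
        return None
    if alpha < 2:
        return 1.0 - 1.0 / alpha
    return 0.5  # at alpha == 2 this holds only up to a logarithmic factor

for a in (0.8, 1.5, 3.0):  # illustrative values, one per regime
    print(a, pareto_regime(a), mean_rate_exponent(a))
```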
Intuition
The classical CLT requires finite variance; the law of large numbers requires only finite mean. The Pareto $\alpha$ controls both thresholds simultaneously. The boundary $\alpha = 2$ separates Normal limits from stable limits; the boundary $\alpha = 1$ separates law-of-large-numbers behavior from no-law-of-large-numbers behavior.
Proof Sketch
The mean condition requires $\alpha > 1$. The variance condition requires $\alpha > 2$. With finite mean and variance, the standard Kolmogorov SLLN and the Lindeberg-Lévy CLT apply. With finite mean only, Kolmogorov's SLLN for iid variables still gives convergence of the sample mean to the population mean almost surely (Khintchine's theorem gives the corresponding weak law). Generalized CLT theory (Gnedenko-Kolmogorov; see Feller volume 2, chapter 17) gives stable-law limits for centered partial sums whenever the tail is regularly varying with index $\alpha \in (0, 2)$, which is the Pareto case.
Why It Matters
The regime boundary at $\alpha = 2$ is the most consequential. Confidence intervals for the sample mean, $t$-tests, $z$-tests, and every standard-error calculation rely on the finite-variance CLT. When data is Pareto with $\alpha < 2$, these procedures produce intervals that shrink at the wrong rate ($n^{-1/2}$ instead of $n^{-(1 - 1/\alpha)}$) and the coverage probabilities are uncontrolled in finite samples.
Failure Mode
The "median is more reliable than the mean for heavy-tailed data" advice is correct for $\alpha \le 1$ (no finite mean exists) but the median has its own bias-variance properties that are different from the mean. For $1 < \alpha \le 2$, the mean is well-defined and the sample mean converges; the slow rate is the problem, not the existence.
See also lln-failures-heavy-tails for the diagnostic plots that detect each regime from data.
Worked Example: Three Tail Indices
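Consider Pareto Type I samples with $x_m = 1$ and three values of the shape $\alpha$, one in each regime.
For $\alpha \le 1$ (Regime A), $E[X] = \infty$. A simulation of $n$ iid samples produces a sample mean that drifts upward with $n$ and depends sensitively on the largest observation. The median is well-defined: $x_m 2^{1/\alpha}$.
For $\alpha = 1.5$ (Regime B), $E[X] = \alpha x_m / (\alpha - 1) = 3$. The sample mean converges to 3 in probability, but the rate is $n^{-(1 - 1/\alpha)} = n^{-1/3}$, slower than $n^{-1/2}$. Standard errors computed from the sample variance are meaningless; the variance is infinite.
For $\alpha > 2$ (Regime C), $E[X] = \alpha / (\alpha - 1)$ and $\mathrm{Var}(X) = \alpha / ((\alpha - 1)^2 (\alpha - 2))$. The sample mean converges at the standard $n^{-1/2}$ rate, and $\sqrt{n} (\bar X_n - E[X]) \to N(0, \mathrm{Var}(X))$ in distribution. Confidence intervals are conventional.
Across the three regimes, the population median is always finite: $x_m 2^{1/\alpha}$, which with $x_m = 1$ equals $2^{1/\alpha}$ (about 1.59 for $\alpha = 1.5$). The median is a stable summary even when the mean is not.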
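The contrast between the regimes shows up quickly in simulation. The following sketch draws Pareto Type I samples by inverse-CDF sampling and tracks the running sample mean at powers of ten, alongside the sample median. It assumes $x_m = 1$ and uses the shape values 0.8, 1.5, and 3.0 purely as illustrative picks, one per regime; the helper names and the seed are likewise arbitrary choices for this example.

```python
import random
import statistics

def sample_pareto(n, x_m, alpha, rng):
    """Inverse-CDF draws: x_m * U**(-1/alpha) with U uniform on (0, 1]."""
    return [x_m * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]

def simulate(alpha, n, seed, x_m=1.0):
    """Return (running means at n = 10, 100, ..., sample median)."""
    rng = random.Random(seed)
    xs = sample_pareto(n, x_m, alpha, rng)
    means, total, checkpoint = {}, 0.0, 10
    for i, x in enumerate(xs, start=1):
        total += x
        if i == checkpoint:
            means[i] = total / i
            checkpoint *= 10
    return means, statistics.median(xs)

# Illustrative shapes (assumed, one per regime): A, B, C.
for alpha in (0.8, 1.5, 3.0):
    means, med = simulate(alpha, n=100_000, seed=0)
    print(f"alpha={alpha}: sample median {med:.3f}, population median {2 ** (1 / alpha):.3f}")
    print("  running means:", {k: round(v, 3) for k, v in means.items()})
```

Rerunning with different seeds is instructive: the sample median barely moves in any regime, while the running means for the smaller shape values change noticeably from run to run.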
Common Misconceptions
Pareto with alpha at most 1 has no finite mean
The sample mean of Pareto data with $\alpha \le 1$ diverges to infinity almost surely. Reporting a sample mean from such data is meaningless; the population quantity does not exist. Use the median or a quantile-based summary instead.
The 80/20 rule is a single point, not a universal property
The "80/20 rule" corresponds to a Pareto with $\alpha$ near $1.16$. Other splits (90/10, 70/30) correspond to other values of $\alpha$. The split is a one-parameter shorthand, not a separate empirical regularity. Quoting "the 80/20 rule applies" to a data set without estimating $\alpha$ is a common error.
A power-law tail and a power-law density are not the same statement
The Pareto Type I has density $f(x) \propto x^{-(\alpha + 1)}$, an exponent of $\alpha + 1$ in the density. The survival function has exponent $\alpha$. Papers sometimes report the density exponent and label it $\alpha$; others report the survival exponent and use the same symbol. The two differ by 1. Always check which is meant.
Estimating alpha from a log-log plot is biased
Plotting the log of the empirical survival function, $\log \hat S(x)$, against $\log x$ and reading off the slope is a quick visual check, not a valid estimator. The slope estimator has systematic bias, and the empirical survival function for the largest order statistics has substantial sampling variability. Use Hill's estimator or a maximum-likelihood fit above a chosen threshold; quantify the threshold sensitivity.
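A minimal sketch of Hill's estimator follows, in its standard form: with the sample sorted in decreasing order, the estimate from the $k$ largest observations is the reciprocal of the average of $\log(X_{(i)} / X_{(k+1)})$ over $i = 1, \dots, k$. The function names `hill_estimator` and `hill_path` are illustrative, and the simulated data (shape 2, fixed seed) are an assumption made only to exercise the code.

```python
import math
import random

def hill_estimator(xs, k):
    """Hill estimate of alpha from the k largest observations (data must be positive)."""
    if not 1 <= k < len(xs):
        raise ValueError("k must satisfy 1 <= k < len(xs)")
    desc = sorted(xs, reverse=True)
    threshold = desc[k]  # the (k+1)-th largest value acts as the threshold
    mean_log_excess = sum(math.log(x / threshold) for x in desc[:k]) / k
    return 1.0 / mean_log_excess

def hill_path(xs, ks):
    """Estimates over several k, to inspect sensitivity to the threshold choice."""
    return {k: hill_estimator(xs, k) for k in ks}

# Simulated Pareto Type I data with x_m = 1 and alpha = 2 (assumed for illustration).
rng = random.Random(1)
data = [(1.0 - rng.random()) ** (-1.0 / 2.0) for _ in range(20_000)]
for k, est in hill_path(data, (200, 1_000, 5_000)).items():
    print(k, round(est, 3))
```

Scanning the estimate across several values of $k$ is the usual way to quantify threshold sensitivity: on real data the path typically is stable only over a limited range of $k$.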
Comparison: Pareto vs Exponential vs Lognormal
The three nonnegative right-skewed distributions form a useful tail-weight ladder.
- Exponential. Tail decays as $e^{-\lambda x}$. Light-tailed; all moments exist; standard LLN and CLT.
- Lognormal. Tail decays sub-exponentially but super-polynomially. All moments exist; LLN and CLT hold; but tails are heavier than Exponential and conditional excess grows almost linearly with the threshold $u$ (like $u / \log u$).
- Pareto. Tail decays polynomially as $x^{-\alpha}$. The $k$-th moment exists only for $\alpha$ above $k$; LLN and CLT hold only for sufficiently large $\alpha$.
Discriminating between these on data is the work of the mean-excess plot and the log-log survival plot. Pareto data shows a roughly linear, upward-sloping mean-excess plot above some threshold, with slope $1/(\alpha - 1)$ when $\alpha > 1$; Exponential data shows a horizontal mean-excess plot at every level; Lognormal data shows a curved, slower-than-linear mean-excess plot.
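The mean-excess function is $e(u) = E[X - u \mid X > u]$. For Pareto Type I with $\alpha > 1$ it equals $u / (\alpha - 1)$ for $u \ge x_m$, so it grows linearly in $u$, whereas for an Exponential with rate $\lambda$ it is the constant $1/\lambda$. The sketch below computes the empirical version; the function names and the simulated data (shape 3, fixed seed) are assumptions for illustration.

```python
import random

def mean_excess(xs, u):
    """Empirical mean excess: average of (x - u) over observations with x > u."""
    excesses = [x - u for x in xs if x > u]
    if not excesses:
        raise ValueError("no observations exceed the threshold")
    return sum(excesses) / len(excesses)

def pareto_mean_excess(u, x_m, alpha):
    """Theoretical mean excess u / (alpha - 1) for Pareto Type I, u >= x_m, alpha > 1."""
    if alpha <= 1 or u < x_m:
        raise ValueError("requires alpha > 1 and u >= x_m")
    return u / (alpha - 1.0)

# Simulated Pareto Type I data with x_m = 1 and alpha = 3 (assumed for illustration).
rng = random.Random(2)
data = [(1.0 - rng.random()) ** (-1.0 / 3.0) for _ in range(200_000)]
for u in (1.0, 2.0, 4.0):
    print(u, round(mean_excess(data, u), 3), pareto_mean_excess(u, 1.0, 3.0))
```

At high thresholds few observations remain, so the empirical curve becomes noisy; that noise is a feature of the plot to keep in mind when judging linearity.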
For the severity-modeling perspective on the Pareto, including peaks-over-threshold fitting and connections to the Generalized Pareto, see ActuaryPath's Pareto page at https://www.actuarypath.com/concepts/pareto-distribution/.
Maximum-Likelihood Estimator
For an iid Pareto Type I sample $x_1, \dots, x_n$ with known $x_m$, the MLE of $\alpha$ is
$$\hat\alpha = \frac{n}{\sum_{i=1}^{n} \log(x_i / x_m)}.$$
This is the inverse of the average log-excess and is a special case of Hill's estimator. The MLE is consistent and asymptotically Normal with variance $\alpha^2 / n$ when $x_m$ is known. When $x_m$ is unknown, $\hat x_m = \min_i x_i$ is the MLE and the MLE for $\alpha$ uses the same formula with the sample minimum in place of $x_m$.
Both MLEs are biased in finite samples for small $n$; the Hill estimator has known finite-sample bias documented in classical extreme-value theory references.
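The estimator is a one-liner once the log-excesses are summed. The sketch below covers both the known-scale and unknown-scale cases and reports the plug-in asymptotic standard error $\hat\alpha / \sqrt{n}$ implied by the variance $\alpha^2 / n$; the function names and the simulated data (scale 2, shape 2.5, fixed seed) are assumptions for illustration.

```python
import math
import random

def pareto_mle(xs, x_m=None):
    """Return (x_m_hat, alpha_hat); if x_m is None, the sample minimum is used."""
    if x_m is None:
        x_m = min(xs)
    log_sum = sum(math.log(x / x_m) for x in xs)
    if log_sum <= 0:
        raise ValueError("need at least one observation strictly above x_m")
    return x_m, len(xs) / log_sum

def alpha_standard_error(alpha_hat, n):
    """Plug-in asymptotic standard error alpha_hat / sqrt(n) (known-x_m case)."""
    return alpha_hat / math.sqrt(n)

# Simulated Pareto Type I data with x_m = 2 and alpha = 2.5 (assumed for illustration).
rng = random.Random(3)
data = [2.0 * (1.0 - rng.random()) ** (-1.0 / 2.5) for _ in range(50_000)]
_, alpha_known = pareto_mle(data, x_m=2.0)
x_m_hat, alpha_unknown = pareto_mle(data)
print(round(alpha_known, 3), round(alpha_standard_error(alpha_known, len(data)), 4))
print(round(x_m_hat, 5), round(alpha_unknown, 3))
```

Because the sample minimum is never below the true scale, the unknown-scale fit uses slightly smaller log-excesses and therefore returns a slightly larger shape estimate on the same data.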
Exercises
Problem
A power-law model for the size distribution of files on a server has minimum size $x_m$ (in KB) and tail index $\alpha$. Compute the median file size, the mean file size, and the probability that a file exceeds 100 KB.
Problem
A Pareto Type I has scale $x_m$ and shape $\alpha$. Find the 95th and 99th percentiles, and the conditional expectation $E[X \mid X > 1000]$ given exceedance of 1000.
Problem
Suppose $X \sim \mathrm{Pareto}(x_m, \alpha)$. Compute $E[X]$ and $\mathrm{Var}(X)$, and explain why the sample variance from any iid sample is uninformative.
Problem
Derive the maximum-likelihood estimator of $\alpha$ from an iid Pareto Type I sample with known $x_m$.
Problem
Show that if $X \sim \mathrm{Pareto}(x_m, \alpha)$, then $\log(X / x_m)$ is $\mathrm{Exponential}(\alpha)$.
Problem
Find the value of $\alpha$ for which the Pareto Type I satisfies the "80/20" property: the top 20 percent of the population holds 80 percent of the total wealth.
References
- Casella, G., and Berger, R. L. (2002). Statistical Inference, 2nd ed., Duxbury. Section 3.3 includes the Pareto in the catalog of continuous distributions; chapter 5 covers asymptotic theory and the conditions under which the CLT applies.
- Blitzstein, J. K., and Hwang, J. (2019). Introduction to Probability, 2nd ed., Chapman and Hall / CRC. Chapter 6 has worked examples on Pareto wealth distributions and the LLN failure.
- For peaks-over-threshold fitting, Generalized Pareto modeling, and the actuarial-severity perspective, see ActuaryPath's Pareto page at https://www.actuarypath.com/concepts/pareto-distribution/ and Klugman, Panjer, Willmot (2019), Loss Models, 5th ed., Wiley, Chapter 5.
- For the stable-law limit theorems referenced in Regime B, Feller, W. (1971). An Introduction to Probability Theory and Its Applications, Volume 2, 2nd ed., Wiley. Chapter 17 covers stable laws and generalized central limit theorems.
Last reviewed: May 12, 2026
Required prerequisites
- Common Probability Distributions (layer 0A · tier 1)
- Distributions Atlas (layer 0A · tier 1)
- Central Limit Theorem (layer 0B · tier 1)
- Law of Large Numbers (layer 0B · tier 1)