Foundations

Pareto Distribution

The Pareto distribution is the canonical power-law on a half-line. The Type I parameterization has survival function (x_m/x)^alpha for x at least x_m. The shape parameter alpha is the tail index. Three regimes of alpha matter for the law of large numbers and the central limit theorem: alpha at most 1 has no finite mean and breaks the LLN; 1 < alpha at most 2 has finite mean but infinite variance so the standard CLT fails (generalized CLT to a stable law); alpha greater than 2 admits both LLN and CLT in the usual form. Applications: wealth, city sizes, file sizes, network degree, insurance severity. The 80/20 'Pareto principle' is a specific case requiring alpha approximately 1.16.


Plain-Language Definition

The Pareto distribution is the simplest model of a power-law tail. A positive random variable $X$ is Pareto Type I with minimum value $x_m > 0$ and shape parameter $\alpha > 0$ if the probability of exceeding $x$ falls like a power of $x$:

$$\mathbb{P}(X > x) = \left(\frac{x_m}{x}\right)^\alpha, \quad x \geq x_m.$$

The shape parameter $\alpha$ is called the tail index. A smaller $\alpha$ means a heavier tail, a slower decay of exceedance probabilities, and more weight in the upper extreme. The 80/20 rule, the long tail of file sizes on the internet, and the size distribution of cities and earthquakes all sit in the Pareto family with different tail indices.

The shape of the tail is what makes the Pareto interesting: depending on how heavy the tail is, the sample mean may not converge, or it may converge while its fluctuations follow a non-Normal limit law. The distinctions are sharp, controlled entirely by $\alpha$.

Definition

Definition

Pareto Type I Distribution

A random variable $X$ has a Pareto Type I distribution with scale $x_m > 0$ and shape $\alpha > 0$ when its survival function is

$$S_X(x) = \mathbb{P}(X > x) = \left(\frac{x_m}{x}\right)^\alpha, \quad x \geq x_m,$$

and $S_X(x) = 1$ for $x < x_m$. The density is

$$f_X(x) = \frac{\alpha\, x_m^\alpha}{x^{\alpha + 1}}, \quad x \geq x_m.$$

The support starts at $x_m$, not at 0; the distribution is left-bounded. The Type II (Lomax) parameterization shifts the support to start at 0 by replacing $x$ with $x_m + y$ where $y \geq 0$; the survival function becomes $(1 + y/x_m)^{-\alpha}$. The two share the same tail behavior but differ near the origin.
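The survival function above inverts in closed form, which gives a simple way to draw samples. The sketch below is illustrative only: the function names, the seed, and the parameter values are assumptions made for the example, not part of the definition.

```python
import random


def pareto_survival(x: float, xm: float, alpha: float) -> float:
    """P(X > x) for Pareto Type I; equals 1 below the minimum xm."""
    return 1.0 if x < xm else (xm / x) ** alpha


def pareto_quantile(p: float, xm: float, alpha: float) -> float:
    """Inverse CDF: the x with P(X <= x) = p, for 0 <= p < 1."""
    return xm * (1.0 - p) ** (-1.0 / alpha)


def sample_pareto(n: int, xm: float, alpha: float, rng: random.Random) -> list:
    """Inverse-transform sampling: push uniforms through the quantile function."""
    return [pareto_quantile(rng.random(), xm, alpha) for _ in range(n)]


rng = random.Random(0)  # arbitrary fixed seed, for reproducibility only
draws = sample_pareto(100_000, xm=2.0, alpha=1.5, rng=rng)

# Compare the empirical exceedance frequency with the closed-form survival.
empirical = sum(d > 10.0 for d in draws) / len(draws)
print(empirical, pareto_survival(10.0, 2.0, 1.5))
```

The two printed numbers should agree to roughly two decimal places; every draw sits at or above $x_m$, reflecting the left-bounded support.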

Why This Matters

The Pareto is the canonical heavy-tailed distribution in applied work for three reasons.

  1. It is the limiting tail. A consequence of the Pickands-Balkema-de Haan theorem in extreme-value theory is that the distribution of exceedances over a high threshold, for any distribution in the Fréchet domain of attraction (i.e. with a regularly varying tail), converges to a Generalized Pareto distribution as the threshold rises. The Pareto Type II is the natural parametric model for threshold exceedances when the tail is power-law.

  2. It separates the three asymptotic regimes. The sample mean of iid Pareto samples follows three distinct asymptotic laws depending on $\alpha$. Small $\alpha$ breaks the law of large numbers; intermediate $\alpha$ admits the law of large numbers but breaks the classical central limit theorem; large $\alpha$ admits both in the usual form. The Pareto is the cleanest distribution to use as a stress test for any sample-mean-based estimator.

  3. It is a useful baseline for tail-aware decisions. Wealth, city sizes, file sizes, network degree, insurance severity above a threshold, and earthquake energies are all power-law-shaped over significant ranges. Reporting only a sample mean for such data is misleading; a better summary is the tail index together with a quantile, both of which the Pareto parameterizes directly.

The 80/20 principle ("80 percent of the wealth is held by 20 percent of the people") is a specific case of the Pareto distribution. For $\alpha > 1$, the share of the total held by the top fraction $p$ of the population is $p^{1 - 1/\alpha}$, so the 80/20 split requires $0.2^{1 - 1/\alpha} = 0.8$. Solving for $\alpha$ gives $\alpha = \ln 5 / \ln 4 \approx 1.16$. Other splits (90/10, 70/30) correspond to other values of $\alpha$. The "rule" is a shorthand for a single point on a continuum, not a universal law.
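To make the link between a split and a tail index concrete, the sketch below uses the Pareto Type I top-share formula: for $\alpha > 1$ the top fraction $p$ of the population holds the share $p^{1 - 1/\alpha}$ of the total. The function names are hypothetical, chosen for this illustration.

```python
import math


def top_share(p: float, alpha: float) -> float:
    """Fraction of the total held by the top fraction p, for alpha > 1."""
    if alpha <= 1:
        raise ValueError("the total is infinite for alpha <= 1; shares are undefined")
    return p ** (1.0 - 1.0 / alpha)


def alpha_for_split(share: float, p: float) -> float:
    """Tail index at which the top fraction p holds the given share (share > p)."""
    # Solve p**(1 - 1/alpha) = share for alpha.
    return 1.0 / (1.0 - math.log(share) / math.log(p))


for share, p in [(0.8, 0.2), (0.9, 0.1), (0.7, 0.3)]:
    a = alpha_for_split(share, p)
    print(f"{round(share * 100)}/{round(p * 100)} split -> alpha = {a:.3f}")
```

More extreme splits map to tail indices closer to 1, where the mean is on the verge of becoming infinite.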

Survival, Mean, Variance

Theorem

Pareto Survival, Mean, and Variance

Statement

The survival function is $\mathbb{P}(X > x) = (x_m / x)^\alpha$ for $x \geq x_m$. The $k$-th moment exists if and only if $\alpha > k$, in which case

$$\mathbb{E}[X^k] = \frac{\alpha\, x_m^k}{\alpha - k}.$$

Specializing to $k = 1$ and $k = 2$:

$$\mathbb{E}[X] = \frac{\alpha\, x_m}{\alpha - 1} \text{ for } \alpha > 1, \quad \operatorname{Var}(X) = \frac{\alpha\, x_m^2}{(\alpha - 1)^2 (\alpha - 2)} \text{ for } \alpha > 2.$$

For $\alpha \leq 1$ the mean is infinite; for $1 < \alpha \leq 2$ the mean is finite but the variance is infinite.

Intuition

The integral defining $\mathbb{E}[X^k]$ converges at infinity if and only if $x^k \cdot x^{-(\alpha + 1)} = x^{k - \alpha - 1}$ has an integrable tail, i.e. $k - \alpha - 1 < -1$, equivalently $\alpha > k$. Below the threshold, the integral diverges, and the moment is infinite. Above the threshold, the integral is elementary.

Proof Sketch

For $\alpha > k$,

$$\mathbb{E}[X^k] = \int_{x_m}^\infty x^k \cdot \frac{\alpha\, x_m^\alpha}{x^{\alpha + 1}}\, dx = \alpha\, x_m^\alpha \int_{x_m}^\infty x^{k - \alpha - 1}\, dx.$$

The integral evaluates to $x_m^{k - \alpha}/(\alpha - k)$, giving $\mathbb{E}[X^k] = \alpha\, x_m^k / (\alpha - k)$. For $\alpha \leq k$ the integrand has a non-integrable tail and the moment is infinite.

Why It Matters

The thresholds for moment existence are the central organizing principle for working with the Pareto. A statement of the form "estimate the mean of $X$" requires $\alpha > 1$; otherwise the sample mean does not estimate any well-defined population quantity. A statement involving the standard error of the sample mean requires $\alpha > 2$; otherwise the classical CLT-based standard error is infinite and a different asymptotic framework is needed.
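The moment thresholds are easy to encode so that downstream code cannot silently use a moment that does not exist. This is a minimal sketch with hypothetical function names; it returns `math.inf` whenever the tail index is at or below the required order.

```python
import math


def pareto_moment(k: float, xm: float, alpha: float) -> float:
    """E[X^k] for Pareto Type I; infinite when alpha <= k."""
    if alpha <= k:
        return math.inf
    return alpha * xm**k / (alpha - k)


def pareto_mean(xm: float, alpha: float) -> float:
    """Finite only for alpha > 1."""
    return pareto_moment(1, xm, alpha)


def pareto_variance(xm: float, alpha: float) -> float:
    """Finite only for alpha > 2; computed as E[X^2] - E[X]^2."""
    if alpha <= 2:
        return math.inf
    return pareto_moment(2, xm, alpha) - pareto_mean(xm, alpha) ** 2


for alpha in (0.8, 1.5, 3.0):
    print(alpha, pareto_mean(1.0, alpha), pareto_variance(1.0, alpha))
```

Checking `math.isinf` on the result before reporting a mean or a standard error is one way to enforce the $\alpha > 1$ and $\alpha > 2$ requirements in practice.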

Failure Mode

Software libraries differ on which $\alpha$ they call the "shape": some use the survival exponent (our $\alpha$), others use $\alpha + 1$ (the density exponent), others use $1/\alpha$. Convert before plugging in. The same warning applies to academic papers: empirical-finance papers sometimes report tail exponents that differ by 1 from the parameter used by classical statistics texts.
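A small conversion layer makes the convention explicit at the point of use. The helper names below are hypothetical. The one library fact relied on is that Python's standard-library `random.paretovariate(alpha)` draws a Type I variate with minimum 1 and survival exponent `alpha`, so multiplying by $x_m$ rescales it.

```python
import random


def from_density_exponent(gamma: float) -> float:
    """Density decays like x**(-gamma); the survival exponent is gamma - 1."""
    return gamma - 1.0


def from_inverse_shape(xi: float) -> float:
    """Shape reported as xi = 1/alpha; invert to get the survival exponent."""
    return 1.0 / xi


# Example: a source reports a density exponent of 3.5, so alpha = 2.5.
alpha = from_density_exponent(3.5)
xm = 5.0

rng = random.Random(1)  # arbitrary fixed seed
draws = [xm * rng.paretovariate(alpha) for _ in range(50_000)]

# Sanity check of the convention: the empirical exceedance frequency at
# x = 10 should be close to (xm / 10) ** alpha.
empirical = sum(d > 10.0 for d in draws) / len(draws)
print(min(draws), empirical, (xm / 10.0) ** alpha)
```

Running a check like this against a known survival probability is a cheap way to confirm which exponent a given library or paper means.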

Three Regimes for LLN and CLT

Theorem

LLN and CLT Regimes for Iid Pareto Samples

Statement

Let $\bar X_n = (1/n)\sum_{i=1}^n X_i$ and $S_n = \sum_{i=1}^n X_i$.

  • Regime A ($\alpha \leq 1$). The mean is infinite. $\bar X_n \to \infty$ almost surely. Neither the law of large numbers nor the standard central limit theorem applies. Under suitable centering and scaling, $S_n$ has a stable-law limit with index $\alpha$.
  • Regime B ($1 < \alpha \leq 2$). The mean $\mu = \alpha\, x_m / (\alpha - 1)$ is finite. The variance is infinite. The law of large numbers holds: $\bar X_n \to \mu$ almost surely (Kolmogorov's strong law, which needs only a finite mean). The classical central limit theorem fails; for $1 < \alpha < 2$, $(S_n - n\mu) / n^{1/\alpha}$ converges in distribution to a stable law with index $\alpha$. At the boundary $\alpha = 2$ the limit is Normal, but only under the non-standard scaling $\sqrt{n \ln n}$.
  • Regime C ($\alpha > 2$). Both the mean and the variance are finite. Standard law of large numbers and classical central limit theorem apply: $\bar X_n \to \mu$ almost surely and $\sqrt{n}(\bar X_n - \mu) \xrightarrow{d} N(0, \sigma^2)$.
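The three regimes can be summarized as a lookup keyed on the tail index. The sketch below is one possible encoding; the function name and the dictionary fields are assumptions made for illustration. The field `mean_error_exponent` is the power $r$ such that the error of the sample mean scales like $n^{r}$, obtained by dividing the partial-sum scaling by $n$.

```python
def classify_regime(alpha: float) -> dict:
    """Map a Pareto tail index to its LLN/CLT regime."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if alpha <= 1:
        # Infinite mean: the sample mean has no finite target.
        return {"regime": "A", "lln": False, "classical_clt": False,
                "mean_error_exponent": None}
    if alpha <= 2:
        # Finite mean, infinite variance: S_n fluctuates on the n**(1/alpha)
        # scale, so the sample-mean error scales like n**(1/alpha - 1).
        # At alpha == 2 an extra logarithmic factor appears on top of n**-0.5.
        return {"regime": "B", "lln": True, "classical_clt": False,
                "mean_error_exponent": 1.0 / alpha - 1.0}
    # Finite mean and variance: the usual root-n behavior.
    return {"regime": "C", "lln": True, "classical_clt": True,
            "mean_error_exponent": -0.5}


for a in (0.8, 1.5, 3.0):
    print(a, classify_regime(a))
```

For example, $\alpha = 1.5$ gives an exponent of $-1/3$: halving the error of the sample mean takes about eight times as much data, against four times under the classical rate.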

Intuition

The classical CLT requires finite variance; the law of large numbers requires only finite mean. Pareto $\alpha$ controls both thresholds simultaneously. The boundary $\alpha = 2$ separates Normal limits from stable limits; the boundary $\alpha = 1$ separates law-of-large-numbers behavior from no-law-of-large-numbers behavior.

Proof Sketch

The mean condition $\mathbb{E}[X] < \infty$ requires $\alpha > 1$. The variance condition $\operatorname{Var}(X) < \infty$ requires $\alpha > 2$. With finite mean and variance, Kolmogorov's SLLN and the Lindeberg-Lévy CLT apply. With finite mean only, Kolmogorov's SLLN still gives almost-sure convergence of the sample mean to the population mean (Khintchine's weak law gives convergence in probability under the same condition). Generalized CLT theory (Gnedenko-Kolmogorov; see Feller volume 2, chapter 17) gives stable-law limits for centered partial sums whenever the tail is regularly varying with index $\alpha < 2$, which covers the Pareto case.

Why It Matters

The regime boundary at $\alpha = 2$ is the most consequential. Confidence intervals for the sample mean, $t$-tests, $z$-tests, and every standard-error calculation rely on the finite-variance CLT. When data is Pareto with $\alpha \leq 2$, these procedures assume an error that shrinks like $n^{-1/2}$, whereas the actual error of the sample mean shrinks only like $n^{-(1 - 1/\alpha)}$ for $1 < \alpha < 2$ (and does not shrink at all for $\alpha \leq 1$), so the coverage probabilities are uncontrolled in finite samples.

Failure Mode

The "median is more reliable than the mean for heavy-tailed data" advice is correct for $\alpha \leq 1$ (no finite mean exists), but the median estimates a different population quantity and has its own sampling properties. For $1 < \alpha \leq 2$, the mean is well-defined and the sample mean converges; the problem is the slow $n^{-(1 - 1/\alpha)}$ rate, not existence.

See also lln-failures-heavy-tails for the diagnostic plots that detect each regime from data.

Worked Example: Three Tail Indices

Consider Pareto Type I samples with $x_m = 1$ and three shape values $\alpha = 0.8, 1.5, 3.0$.

For $\alpha = 0.8$ (Regime A), $\mathbb{E}[X] = \infty$. A simulation of $n = 10^6$ iid samples produces a sample mean that drifts upward with $n$ and depends sensitively on the largest observation. The median is well-defined: $q_{0.5} = 1 \cdot 0.5^{-1/0.8} = 2^{1/0.8} \approx 2.378$.

For $\alpha = 1.5$ (Regime B), $\mathbb{E}[X] = 1.5 / 0.5 = 3$. The sample mean converges to 3 almost surely, but its error shrinks only like $n^{-(1 - 1/1.5)} = n^{-1/3}$, slower than $n^{-1/2}$. Standard errors computed from the sample variance are meaningless; the variance is infinite.

For $\alpha = 3.0$ (Regime C), $\mathbb{E}[X] = 3 \cdot 1 / 2 = 1.5$ and $\operatorname{Var}(X) = 3 / (4 \cdot 1) = 0.75$. The sample mean converges at the standard $n^{-1/2}$ rate, and $\sqrt{n}(\bar X_n - 1.5) \xrightarrow{d} N(0, 0.75)$. Confidence intervals are conventional.

Across the three regimes, the population median is always finite: $q_{0.5}(\alpha) = x_m \cdot 2^{1/\alpha}$, equal to $1.587$ for $\alpha = 1.5$ and $1.260$ for $\alpha = 3$. The median is a stable summary even when the mean is not.
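The worked example can be reproduced with a short simulation. The sketch below uses the standard-library `random.paretovariate`, which draws Type I variates with $x_m = 1$. The sample size, the seed, and the function name are arbitrary choices for illustration, and the output depends on them.

```python
import random
import statistics


def summarize(alpha: float, n: int, seed: int) -> dict:
    """Sample mean, median, and maximum of n Pareto(x_m = 1, alpha) draws."""
    rng = random.Random(seed)
    draws = [rng.paretovariate(alpha) for _ in range(n)]
    return {
        "mean": statistics.fmean(draws),
        "median": statistics.median(draws),
        "max": max(draws),
    }


for alpha in (0.8, 1.5, 3.0):
    s = summarize(alpha, n=200_000, seed=42)
    theory_median = 2 ** (1 / alpha)
    print(f"alpha={alpha}: mean={s['mean']:.3f} median={s['median']:.3f} "
          f"(theory {theory_median:.3f}) max={s['max']:.1f}")
```

Rerunning with different seeds is instructive: the sample median stays close to $2^{1/\alpha}$ in all three regimes, while the sample mean for $\alpha = 0.8$ moves substantially from run to run because it is dominated by the largest draw.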

Common Misconceptions

Watch Out

Pareto with alpha at most 1 has no finite mean

The sample mean of Pareto data with $\alpha \leq 1$ diverges to infinity almost surely. Reporting a sample mean from such data is meaningless; the population quantity does not exist. Use the median or a quantile-based summary instead.

Watch Out

The 80/20 rule is a single point, not a universal property

The "80/20 rule" corresponds to a Pareto with $\alpha$ near $1.16$. Other splits (90/10, 70/30) correspond to other values of $\alpha$. The split is a one-parameter shorthand, not a separate empirical regularity. Quoting "the 80/20 rule applies" to a data set without computing $\alpha$ is a common error.

Watch Out

A power-law tail and a power-law density are not the same statement

The Pareto Type I has density $f(x) = \alpha\, x_m^\alpha / x^{\alpha + 1}$, an exponent of $\alpha + 1$ in the density. The survival function has exponent $\alpha$. Papers sometimes report the density exponent and label it $\alpha$; others report the survival exponent and use the same symbol. The two differ by 1. Always check which is meant.

Watch Out

Estimating alpha from a log-log plot is biased

Plotting $\log \mathbb{P}(X > x)$ against $\log x$ and reading off the slope is a quick visual check, not a valid estimator. The slope estimator has systematic bias, and the empirical survival function for the largest order statistics has substantial sampling variability. Use Hill's estimator or a maximum-likelihood fit above a chosen threshold; quantify the threshold sensitivity.
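For reference, a bare-bones version of Hill's estimator is sketched below: it averages the log-excesses of the $k$ largest observations over the next order statistic and inverts the result. The function name, the seed, and the choices of $k$ are assumptions for illustration; a real analysis would also examine how the estimate moves with $k$.

```python
import math
import random


def hill_estimator(data: list, k: int) -> float:
    """Hill estimate of the tail index from the k largest observations."""
    if not 1 <= k < len(data):
        raise ValueError("k must satisfy 1 <= k < len(data)")
    ordered = sorted(data, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest value acts as the threshold
    mean_log_excess = sum(math.log(x / threshold) for x in ordered[:k]) / k
    return 1.0 / mean_log_excess


rng = random.Random(7)  # arbitrary fixed seed
sample = [rng.paretovariate(2.5) for _ in range(50_000)]  # true alpha = 2.5

# Threshold sensitivity: report the estimate for several values of k.
for k in (200, 1_000, 5_000):
    print(k, round(hill_estimator(sample, k), 3))
```

On exact Pareto data the estimate is stable across $k$; on real data, where only the far tail is power-law, the estimate typically drifts as $k$ grows, which is why threshold sensitivity has to be reported.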

Comparison: Pareto vs Exponential vs Lognormal

The three nonnegative right-skewed distributions form a useful tail-weight ladder.

  • Exponential. Tail decays as $e^{-\lambda x}$. Light-tailed; all moments exist; standard LLN and CLT.
  • Lognormal. Tail decays sub-exponentially but super-polynomially. All moments exist; LLN and CLT hold; but tails are heavier than Exponential and the conditional mean excess grows with the threshold, slightly slower than linearly.
  • Pareto. Tail decays polynomially as $x^{-\alpha}$. Moments of order $k$ exist only for $k < \alpha$; LLN and CLT hold only for sufficiently large $\alpha$.

Discriminating between these on data is the work of the mean-excess plot and the log-log survival plot. Pareto data with $\alpha > 1$ shows a roughly linear, upward-sloping mean-excess plot with slope $1/(\alpha - 1)$ above some threshold; Exponential data shows a horizontal mean-excess plot at every level; Lognormal data shows a curved, increasing mean-excess plot.
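The empirical mean-excess function is simple to compute: for a threshold $u$, average $X - u$ over the observations that exceed $u$. The sketch below compares simulated Pareto and Exponential samples; the function name, the seed, and the thresholds are illustrative assumptions. For a Pareto Type I with $\alpha > 1$ the theoretical value is $u/(\alpha - 1)$, and for an Exponential with rate $\lambda$ it is the constant $1/\lambda$.

```python
import random


def mean_excess(data: list, u: float) -> float:
    """Average of (x - u) over the observations strictly above u."""
    excesses = [x - u for x in data if x > u]
    if not excesses:
        raise ValueError("no observations exceed the threshold")
    return sum(excesses) / len(excesses)


rng = random.Random(3)  # arbitrary fixed seed
n = 200_000
pareto_sample = [rng.paretovariate(3.0) for _ in range(n)]  # x_m = 1, alpha = 3
expo_sample = [rng.expovariate(1.0) for _ in range(n)]      # rate 1

# Pareto theory: u / (alpha - 1) = u / 2.  Exponential theory: 1 at every u.
for u in (1.0, 2.0, 4.0):
    print(u, round(mean_excess(pareto_sample, u), 3),
          round(mean_excess(expo_sample, u), 3))
```

Plotting these values against $u$ gives the mean-excess plot; estimates at high thresholds rest on few observations and become noisy.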

For the severity-modeling perspective on the Pareto, including peaks-over-threshold fitting and connections to the Generalized Pareto, see ActuaryPath's Pareto page at https://www.actuarypath.com/concepts/pareto-distribution/.

Maximum-Likelihood Estimator

For an iid Pareto Type I sample with known $x_m$, the MLE of $\alpha$ is

$$\widehat\alpha = \frac{n}{\sum_{i=1}^{n} \ln(X_i / x_m)}.$$

This is the inverse of the average log-excess and is a special case of Hill's estimator. The MLE is consistent and asymptotically Normal with variance $\alpha^2 / n$ when $x_m$ is known. When $x_m$ is unknown, $\widehat x_m = \min_i X_i$ is the MLE and the MLE for $\alpha$ uses the same formula with the sample minimum.

Both MLEs are biased in finite samples: with known $x_m$, $\mathbb{E}[\widehat\alpha] = n\alpha/(n - 1)$, so $\widehat\alpha$ overshoots noticeably when $n$ is small, and $\widehat x_m$ always sits at or above the true $x_m$. The Hill estimator has known finite-sample bias documented in classical extreme-value theory references.
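The closed-form MLE takes only a few lines to compute. The sketch below covers both the known-$x_m$ and unknown-$x_m$ cases and reports the asymptotic standard error $\widehat\alpha/\sqrt{n}$ implied by the variance $\alpha^2/n$. The function names, the seed, and the simulated parameters are assumptions for illustration.

```python
import math
import random


def pareto_mle(data: list, xm: float = None) -> tuple:
    """Return (xm_hat, alpha_hat); the sample minimum is used when xm is None."""
    xm_hat = min(data) if xm is None else xm
    log_sum = sum(math.log(x / xm_hat) for x in data)
    return xm_hat, len(data) / log_sum


def asymptotic_se(alpha_hat: float, n: int) -> float:
    """Plug-in standard error from the asymptotic variance alpha**2 / n."""
    return alpha_hat / math.sqrt(n)


rng = random.Random(11)  # arbitrary fixed seed
true_xm, true_alpha, n = 4.0, 2.5, 50_000
sample = [true_xm * rng.paretovariate(true_alpha) for _ in range(n)]

_, alpha_known = pareto_mle(sample, xm=true_xm)
xm_hat, alpha_unknown = pareto_mle(sample)
print(alpha_known, asymptotic_se(alpha_known, n))
print(xm_hat, alpha_unknown)
```

With the scale estimated by the sample minimum, `xm_hat` can never fall below the true scale, which is the one-sided bias noted above.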

Exercises

Exercise (Core)

Problem

A power-law model for the size distribution of files on a server has $x_m = 1$ KB and tail index $\alpha = 2$. Compute the median file size, the mean file size, and the probability that a file exceeds 100 KB.

Exercise (Core)

Problem

A Pareto Type I has $x_m = 100$ and $\alpha = 3$. Find the 95th and 99th percentiles, and the conditional expectation given exceedance of 1000.

Exercise (Core)

Problem

Suppose $X \sim \operatorname{Pareto}(x_m = 1, \alpha = 1.5)$. Compute $\mathbb{P}(X > 10)$ and $\mathbb{E}[X]$, and explain why the sample variance from any iid sample is uninformative.

Exercise (Advanced)

Problem

Derive the maximum-likelihood estimator of $\alpha$ from an iid Pareto Type I sample with known $x_m$.

Exercise (Advanced)

Problem

Show that if $X \sim \operatorname{Pareto}(x_m, \alpha)$, then $Y = \ln(X / x_m)$ is $\operatorname{Exponential}(\alpha)$.

Exercise (Advanced)

Problem

Find the value of $\alpha$ for which the Pareto Type I satisfies the "80/20" property: the top 20 percent of the population holds 80 percent of the total wealth.

References

  • Casella, G., and Berger, R. L. (2002). Statistical Inference, 2nd ed., Duxbury. Section 3.3 includes the Pareto in the catalog of continuous distributions; chapter 5 covers asymptotic theory and the conditions under which the CLT applies.
  • Blitzstein, J. K., and Hwang, J. (2019). Introduction to Probability, 2nd ed., Chapman and Hall / CRC. Chapter 6 has worked examples on Pareto wealth distributions and the LLN failure.
  • For peaks-over-threshold fitting, Generalized Pareto modeling, and the actuarial-severity perspective, see ActuaryPath's Pareto page at https://www.actuarypath.com/concepts/pareto-distribution/ and Klugman, Panjer, Willmot (2019), Loss Models, 5th ed., Wiley, Chapter 5.
  • For the stable-law limit theorems referenced in Regime B, Feller, W. (1971). An Introduction to Probability Theory and Its Applications, Volume 2, 2nd ed., Wiley. Chapter 17 covers stable laws and generalized central limit theorems.

Last reviewed: May 12, 2026
