Statistical Foundations
Order Statistics
Order statistics are the sorted values of a random sample. Their distributions govern quantile estimation, confidence intervals for medians, and the behavior of extremes.
Why This Matters
Order statistics appear whenever you sort data. The sample median, percentiles, confidence intervals for quantiles, and extreme values are all functions of order statistics. In ML, the maximum of subgaussian random variables controls uniform convergence bounds. Understanding order statistic distributions is required for nonparametric inference and bootstrap theory.
Mental Model
Given i.i.d. random variables $X_1, \dots, X_n$, sort them to get $X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}$. The value $X_{(k)}$ is the $k$-th order statistic. The minimum is $X_{(1)}$, the maximum is $X_{(n)}$, and (for odd $n$) the sample median is $X_{((n+1)/2)}$. Each order statistic has its own distribution, derived from the parent distribution.
Core Definitions
Order Statistic
Given a random sample $X_1, \dots, X_n$ from a continuous distribution $F$, the $k$-th order statistic $X_{(k)}$ is the $k$-th smallest value in the sample. Formally, $X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}$ is the sorted rearrangement of $X_1, \dots, X_n$.
Sample Quantile
The sample $p$-quantile is $X_{(\lceil np \rceil)}$, the order statistic at position $\lceil np \rceil$. The sample median is $X_{(\lceil n/2 \rceil)}$. Sample quartiles are $X_{(\lceil n/4 \rceil)}$ and $X_{(\lceil 3n/4 \rceil)}$. The interquartile range (IQR) is $X_{(\lceil 3n/4 \rceil)} - X_{(\lceil n/4 \rceil)}$.
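These definitions translate directly into code. The sketch below uses the $\lceil np \rceil$ index convention stated above; note that statistical libraries differ in how they index and interpolate quantiles, so the helper names and convention here are illustrative:

```python
import math

def order_statistic(sample, k):
    """Return X_(k), the k-th smallest value (1-indexed)."""
    return sorted(sample)[k - 1]

def sample_quantile(sample, p):
    """Sample p-quantile: the order statistic at position ceil(n*p)."""
    n = len(sample)
    k = max(1, math.ceil(n * p))
    return order_statistic(sample, k)

data = [7, 1, 4, 9, 2, 6, 3, 8, 5, 10]
q1 = sample_quantile(data, 0.25)   # X_(3) = 3
med = sample_quantile(data, 0.50)  # X_(5) = 5
q3 = sample_quantile(data, 0.75)   # X_(8) = 8
iqr = q3 - q1                      # interquartile range = 5
```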
Main Theorems
Density of the k-th Order Statistic
Statement
The probability density function of the $k$-th order statistic $X_{(k)}$ from a sample of size $n$ drawn i.i.d. from a continuous distribution with pdf $f$ and CDF $F$ is:

$$f_{X_{(k)}}(x) = \frac{n!}{(k-1)!\,(n-k)!}\, F(x)^{k-1}\, (1 - F(x))^{n-k}\, f(x)$$
Intuition
For $X_{(k)}$ to have a value near $x$: exactly $k-1$ of the samples must fall below $x$ (probability $F(x)$ each), exactly $n-k$ must fall above $x$ (probability $1-F(x)$ each), and one sample must be at $x$ (density $f(x)$). The multinomial coefficient $\frac{n!}{(k-1)!\,1!\,(n-k)!}$ counts the number of ways to assign these roles.
Proof Sketch
The event $\{X_{(k)} \le x\}$ is equivalent to "at least $k$ of the $n$ samples fall at or below $x$." So $F_{X_{(k)}}(x) = \sum_{j=k}^{n} \binom{n}{j} F(x)^j (1-F(x))^{n-j}$. Differentiating this CDF with respect to $x$ (using the telescoping property of adjacent binomial terms) yields the density formula.
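The binomial-sum CDF is easy to sanity-check numerically. A sketch using only the standard library, with Uniform(0,1) as the parent distribution and arbitrary illustrative values of $n$, $k$, $x$:

```python
import math
import random

def order_stat_cdf(x, n, k, F):
    """P(X_(k) <= x) = sum_{j=k}^{n} C(n,j) F(x)^j (1-F(x))^(n-j)."""
    p = F(x)
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(k, n + 1))

# Monte Carlo check for Uniform(0,1), where F(t) = t on [0,1]
random.seed(0)
n, k, x = 5, 3, 0.4
trials = 100_000
hits = sum(sorted(random.random() for _ in range(n))[k - 1] <= x
           for _ in range(trials))
empirical = hits / trials
exact = order_stat_cdf(x, n, k, lambda t: t)  # = 0.31744
```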
Why It Matters
This formula is the starting point for deriving confidence intervals for population quantiles, the distribution of the sample range $R = X_{(n)} - X_{(1)}$, and the asymptotic distribution of sample quantiles. For the uniform distribution on $[0,1]$, it simplifies to a Beta distribution: $U_{(k)} \sim \mathrm{Beta}(k,\, n-k+1)$.
Failure Mode
The formula requires a continuous parent distribution. For discrete distributions, ties occur with positive probability and the density formula does not apply directly. A separate treatment using probability mass functions is needed.
Special Cases
Minimum and Maximum
The CDF of the maximum is $F_{X_{(n)}}(x) = F(x)^n$: all $n$ samples must fall below $x$. The CDF of the minimum is $F_{X_{(1)}}(x) = 1 - (1 - F(x))^n$: at least one sample must fall below $x$.
For $X_i \sim \mathrm{Uniform}(0,1)$: $F_{X_{(n)}}(x) = x^n$ and $F_{X_{(1)}}(x) = 1 - (1-x)^n$.
Uniform Order Statistics
If $U_1, \dots, U_n$ are i.i.d. $\mathrm{Uniform}(0,1)$, then $U_{(k)} \sim \mathrm{Beta}(k,\, n-k+1)$. This gives:

$$E[U_{(k)}] = \frac{k}{n+1}, \qquad \mathrm{Var}(U_{(k)}) = \frac{k(n-k+1)}{(n+1)^2(n+2)}$$
The joint density of all $n$ uniform order statistics is $n!$ on the simplex $0 < u_1 < u_2 < \dots < u_n < 1$.
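The Beta moments above are easy to verify by simulation. A sketch with a fixed seed; the choices $n = 10$, $k = 3$ are arbitrary:

```python
import random

random.seed(42)
n, k, trials = 10, 3, 100_000

# Empirical mean and variance of the k-th uniform order statistic
vals = [sorted(random.random() for _ in range(n))[k - 1]
        for _ in range(trials)]
mean = sum(vals) / trials
var = sum((v - mean) ** 2 for v in vals) / trials

# Beta(k, n-k+1) moments: mean k/(n+1), var k(n-k+1)/((n+1)^2 (n+2))
exact_mean = k / (n + 1)                                 # 3/11 ≈ 0.2727
exact_var = k * (n - k + 1) / ((n + 1) ** 2 * (n + 2))   # 24/1452 ≈ 0.0165
```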
Connection to Confidence Intervals for Quantiles
To construct a distribution-free confidence interval for the population $p$-quantile $\xi_p$: the interval $(X_{(j)}, X_{(k)})$ contains $\xi_p$ with probability:

$$P(X_{(j)} < \xi_p < X_{(k)}) = \sum_{i=j}^{k-1} \binom{n}{i} p^i (1-p)^{n-i}$$
This is exact and distribution-free: it holds for any continuous $F$. No parametric assumptions are needed.
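The binomial sum can be evaluated directly to choose interval endpoints. A sketch; `median_ci_indices` is an illustrative helper that searches for the smallest symmetric interval meeting a target coverage:

```python
import math

def quantile_ci_coverage(n, p, j, k):
    """P(X_(j) < xi_p < X_(k)) = sum_{i=j}^{k-1} C(n,i) p^i (1-p)^(n-i)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(j, k))

def median_ci_indices(n, level=0.95):
    """Smallest symmetric interval (X_(j), X_(n+1-j)) with coverage >= level."""
    for j in range(n // 2, 0, -1):
        if quantile_ci_coverage(n, 0.5, j, n + 1 - j) >= level:
            return j, n + 1 - j
    return 1, n  # fallback: widest symmetric interval

j, k = median_ci_indices(20)                 # (6, 15) for n = 20
cov = quantile_ci_coverage(20, 0.5, j, k)    # ≈ 0.959
```

For $n = 20$ this yields $j = 6$, $k = 15$ with coverage about 0.959.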
Connection to Concentration: Maximum of Subgaussian RVs
Maximum of Subgaussian Random Variables
Statement
If $X_1, \dots, X_n$ are independent sub-Gaussian random variables with parameter $\sigma^2$ (meaning $E[e^{\lambda X_i}] \le e^{\lambda^2 \sigma^2 / 2}$ for all $\lambda \in \mathbb{R}$), then:

$$E\left[\max_{1 \le i \le n} X_i\right] \le \sigma \sqrt{2 \ln n}$$
Intuition
The maximum grows as $\sqrt{\ln n}$, not $n$. Light-tailed variables keep the maximum controlled. This is why union bounds over $N$ hypotheses cost only $\ln N$ in the exponent.
Proof Sketch
For any $\lambda > 0$: $e^{\lambda E[\max_i X_i]} \le E[e^{\lambda \max_i X_i}]$ by Jensen. Then $E[e^{\lambda \max_i X_i}] \le \sum_{i=1}^{n} E[e^{\lambda X_i}] \le n e^{\lambda^2 \sigma^2 / 2}$. So $E[\max_i X_i] \le \frac{\ln n}{\lambda} + \frac{\lambda \sigma^2}{2}$. Optimize over $\lambda$: set $\lambda = \sqrt{2 \ln n}/\sigma$ to obtain the bound $\sigma \sqrt{2 \ln n}$.
Why It Matters
This bound controls the supremum of empirical processes. In learning theory, the hypothesis index plays the role of $i$, and the empirical risk deviation plays the role of $X_i$. The $\sqrt{\ln N}$ rate explains why finite hypothesis classes need only $O(\ln N)$ samples for uniform convergence.
Failure Mode
For heavy-tailed distributions, the maximum can grow much faster than $\sqrt{\ln n}$. For Pareto-distributed variables with tail index $\alpha$, the maximum grows like $n^{1/\alpha}$: polynomially in $n$, not logarithmically. The sub-Gaussian assumption is critical.
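A small simulation illustrates the contrast. A sketch; `pareto_max_median` is an illustrative helper built on the standard library's `random.paretovariate`:

```python
import random

random.seed(2)

def pareto_max_median(n, alpha=1.0, trials=500):
    """Median over trials of the maximum of n Pareto(alpha) draws."""
    maxima = sorted(max(random.paretovariate(alpha) for _ in range(n))
                    for _ in range(trials))
    return maxima[trials // 2]

# For tail index alpha = 1 the maximum grows roughly linearly in n
# (the median of the max is about n / ln 2), nothing like sqrt(ln n).
m_100 = pareto_max_median(100)     # roughly 1.4 * 100
m_1000 = pareto_max_median(1000)   # roughly 1.4 * 1000
```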
Connection to Bootstrap
The bootstrap resamples with replacement from the empirical distribution. The order statistics of the bootstrap sample induce a distribution on sample quantiles. The bootstrap estimate of the sampling distribution of the sample quantile is consistent under mild regularity conditions (the population quantile function must be differentiable at $p$). This provides confidence intervals for quantiles without assuming a parametric model.
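A percentile-bootstrap interval for the median can be sketched as follows. Illustrative helper only; a real analysis would typically use more resamples and a vetted library implementation:

```python
import random
import statistics

def bootstrap_median_ci(sample, level=0.95, n_boot=2000, seed=0):
    """Percentile bootstrap CI for the median (a common textbook variant)."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement, record the median of each bootstrap sample
    medians = sorted(
        statistics.median(rng.choices(sample, k=n)) for _ in range(n_boot)
    )
    lo = medians[int(n_boot * (1 - level) / 2)]
    hi = medians[int(n_boot * (1 + level) / 2) - 1]
    return lo, hi

data = [2.1, 3.4, 1.8, 5.2, 4.4, 3.9, 2.7, 4.1, 3.3, 2.9]
lo, hi = bootstrap_median_ci(data)
```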
Canonical Examples
Confidence interval for the median with n = 20
With $n = 20$ observations, we want a 95% confidence interval for the population median $\xi_{0.5}$. The interval $(X_{(6)}, X_{(15)})$ has coverage probability $\sum_{i=6}^{14} \binom{20}{i} (1/2)^{20} \approx 0.959$. This is valid for any continuous distribution. No normality assumption is needed.
Distribution of the sample maximum from Uniform(0,1)
For $n$ i.i.d. Uniform$(0,1)$ variables: $P(X_{(n)} \le x) = x^n$ for $x \in [0,1]$. The density is $f_{X_{(n)}}(x) = n x^{n-1}$. The expected maximum is $E[X_{(n)}] = \frac{n}{n+1}$, which approaches 1 as $n \to \infty$. The variance is $\frac{n}{(n+1)^2(n+2)}$, which decreases as $n$ grows.
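A quick Monte Carlo check of these moments, as a sketch with a fixed seed and an arbitrary choice of $n = 10$:

```python
import random

random.seed(3)
n, trials = 10, 100_000

# Empirical mean of the maximum of n Uniform(0,1) draws
maxima = [max(random.random() for _ in range(n)) for _ in range(trials)]
emp_mean = sum(maxima) / trials

exact_mean = n / (n + 1)                      # 10/11 ≈ 0.909
exact_var = n / ((n + 1) ** 2 * (n + 2))      # 10/1452 ≈ 0.0069
```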
Common Confusions
Order statistics are not independent
Even though $X_1, \dots, X_n$ are independent, the order statistics $X_{(1)}, \dots, X_{(n)}$ are not. Knowing $X_{(1)} = x$ tells you $X_{(k)} \ge x$ for all $k$. The joint distribution has a constrained support: $x_1 \le x_2 \le \dots \le x_n$.
Sample quantiles are random variables, not parameters
The sample $p$-quantile is a statistic computed from data. The population $p$-quantile $\xi_p$ is a fixed parameter. The sample quantile estimates the population quantile and has its own sampling distribution.
The median is not always better than the mean
The sample median has breakdown point $1/2$ (robust to outliers) while the mean has breakdown point $0$. But for Gaussian data, the mean has smaller asymptotic variance than the median by a factor of $\pi/2 \approx 1.57$. Robustness comes at a cost in efficiency when the data is actually Gaussian.
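The efficiency gap is visible in simulation. A sketch: across repeated Gaussian samples, the variance of the sample median exceeds that of the sample mean by a ratio approaching $\pi/2 \approx 1.57$:

```python
import random
import statistics

random.seed(4)
n, trials = 100, 4000
mean_ests, med_ests = [], []
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(n)]
    mean_ests.append(statistics.fmean(xs))
    med_ests.append(statistics.median(xs))

var_mean = statistics.pvariance(mean_ests)   # ≈ 1/n
var_med = statistics.pvariance(med_ests)     # ≈ (pi/2) / n
ratio = var_med / var_mean                   # approaches pi/2 ≈ 1.57
```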
Summary
- $X_{(k)}$ has density $\frac{n!}{(k-1)!(n-k)!} F(x)^{k-1} (1-F(x))^{n-k} f(x)$
- Uniform order statistics follow Beta distributions: $U_{(k)} \sim \mathrm{Beta}(k, n-k+1)$
- Confidence intervals for quantiles are distribution-free
- Maximum of $n$ sub-Gaussian variables grows as $\sigma \sqrt{2 \ln n}$
- Order statistics are dependent even when the original sample is i.i.d.
Exercises
Problem
Let $U_1, \dots, U_n$ be i.i.d. Uniform$(0,1)$. Compute $E[U_{(k)}]$ and $\mathrm{Var}(U_{(k)})$.
Problem
Derive the asymptotic distribution of the sample median for a sample from a continuous distribution with density $f$ that is positive at the population median $m$.
References
Canonical:
- David & Nagaraja, Order Statistics (2003), Chapters 2-4
- Casella & Berger, Statistical Inference (2002), Chapter 5.4
Current:
- Boucheron, Lugosi & Massart, Concentration Inequalities (2013), Chapter 3
- van der Vaart, Asymptotic Statistics (1998), Chapter 21
- Lehmann & Casella, Theory of Point Estimation (1998), Chapters 1-6
- van der Vaart, Asymptotic Statistics (1998), Chapters 2-8
Next Topics
- Extreme value theory: limiting distributions of $X_{(n)}$ as $n \to \infty$
- Bootstrap: resampling-based inference using order statistics
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Common Probability Distributions (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
Builds on This
- Extreme Value Theory (Layer 3)