
Statistical Foundations

Order Statistics

Order statistics are the sorted values of a random sample. Their distributions govern quantile estimation, confidence intervals for medians, and the behavior of extremes.


Why This Matters

Order statistics appear whenever you sort data. The sample median, percentiles, confidence intervals for quantiles, and extreme values are all functions of order statistics. In ML, the maximum of sub-Gaussian random variables controls uniform convergence bounds. Understanding order statistic distributions is required for nonparametric inference and bootstrap theory.

Mental Model

Given $n$ i.i.d. random variables $X_1, \ldots, X_n$, sort them to get $X_{(1)} \leq X_{(2)} \leq \cdots \leq X_{(n)}$. The value $X_{(k)}$ is the $k$-th order statistic. The minimum is $X_{(1)}$, the maximum is $X_{(n)}$, and the sample median is $X_{(\lceil n/2 \rceil)}$. Each order statistic has its own distribution, derived from the parent distribution.

Core Definitions

Definition

Order Statistic

Given a random sample $X_1, \ldots, X_n$ from a continuous distribution $F$, the $k$-th order statistic $X_{(k)}$ is the $k$-th smallest value in the sample. Formally, $X_{(1)} \leq X_{(2)} \leq \cdots \leq X_{(n)}$ is the sorted rearrangement of $X_1, \ldots, X_n$.

Definition

Sample Quantile

The sample $p$-quantile is $\hat{Q}(p) = X_{(\lceil np \rceil)}$, the order statistic at position $\lceil np \rceil$. The sample median is $\hat{Q}(0.5)$. Sample quartiles are $\hat{Q}(0.25)$ and $\hat{Q}(0.75)$. The interquartile range (IQR) is $\hat{Q}(0.75) - \hat{Q}(0.25)$.
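This definition translates directly into code. A minimal sketch (the helper name `sample_quantile` is ours, not a standard API):

```python
import math

def sample_quantile(data, p):
    """Sample p-quantile: the order statistic at 1-based rank ceil(n*p)."""
    xs = sorted(data)
    k = math.ceil(len(xs) * p)   # 1-based rank ceil(np)
    return xs[k - 1]             # convert to 0-based index

data = [7, 1, 5, 3, 9, 2, 8, 4, 6, 10]         # n = 10
median = sample_quantile(data, 0.5)             # X_(5) = 5
iqr = sample_quantile(data, 0.75) - sample_quantile(data, 0.25)  # 8 - 3 = 5
```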

Main Theorems

Theorem

Density of the k-th Order Statistic

Statement

The probability density function of the $k$-th order statistic $X_{(k)}$ from a sample of size $n$ drawn i.i.d. from a continuous distribution with pdf $f$ and CDF $F$ is:

$$f_{X_{(k)}}(x) = \frac{n!}{(k-1)!\,(n-k)!}\,[F(x)]^{k-1}\,[1 - F(x)]^{n-k}\,f(x)$$

Intuition

For $X_{(k)}$ to have a value near $x$: exactly $k-1$ of the $n$ samples must fall below $x$ (probability $F(x)$ each), exactly $n-k$ must fall above $x$ (probability $1-F(x)$ each), and one sample must be at $x$ (density $f(x)$). The multinomial coefficient counts the number of ways to assign these roles.

Proof Sketch

The event $\{X_{(k)} \leq x\}$ is equivalent to "at least $k$ of the $n$ samples fall at or below $x$." So $F_{X_{(k)}}(x) = \sum_{j=k}^{n} \binom{n}{j} [F(x)]^j [1-F(x)]^{n-j}$. Differentiating this CDF with respect to $x$ (using the telescoping property of adjacent binomial terms) yields the density formula.
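The CDF identity can be checked against simulation in the uniform case. A stdlib-only sketch, with $n$, $k$, $x$, and the repetition count chosen arbitrarily by us:

```python
import random
from math import comb

random.seed(0)
n, k, x = 10, 3, 0.25
reps = 100_000

# Empirical P(X_(k) <= x) for samples of n Uniform(0,1) draws
emp = sum(
    sorted(random.random() for _ in range(n))[k - 1] <= x
    for _ in range(reps)
) / reps

# Binomial tail from the proof sketch: P(at least k of n fall at or below x)
theory = sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))
```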

Why It Matters

This formula is the starting point for deriving confidence intervals for population quantiles, the distribution of the sample range $X_{(n)} - X_{(1)}$, and the asymptotic distribution of sample quantiles. For the uniform distribution on $[0,1]$, it simplifies to a Beta distribution: $X_{(k)} \sim \text{Beta}(k, n-k+1)$.

Failure Mode

The formula requires a continuous parent distribution. For discrete distributions, ties occur with positive probability and the density formula does not apply directly. A separate treatment using probability mass functions is needed.

Special Cases

Minimum and Maximum

The CDF of the maximum $X_{(n)}$ is $F_{X_{(n)}}(x) = [F(x)]^n$: all $n$ samples must fall below $x$. The CDF of the minimum $X_{(1)}$ is $F_{X_{(1)}}(x) = 1 - [1 - F(x)]^n$: at least one sample must fall below $x$.

For $X_1, \ldots, X_n \sim \text{Uniform}(0,1)$: $\mathbb{E}[X_{(n)}] = n/(n+1)$ and $\mathbb{E}[X_{(1)}] = 1/(n+1)$.
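A quick stdlib-only simulation to check these expectations (the sample size and repetition count are our choices):

```python
import random

random.seed(1)
n, reps = 5, 100_000
maxima, minima = [], []
for _ in range(reps):
    xs = [random.random() for _ in range(n)]
    maxima.append(max(xs))
    minima.append(min(xs))

mean_max = sum(maxima) / reps   # theory: n/(n+1) = 5/6
mean_min = sum(minima) / reps   # theory: 1/(n+1) = 1/6
```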

Uniform Order Statistics

If $X_1, \ldots, X_n \sim \text{Uniform}(0,1)$, then $X_{(k)} \sim \text{Beta}(k, n-k+1)$. This gives:

$$\mathbb{E}[X_{(k)}] = \frac{k}{n+1}, \qquad \mathrm{Var}(X_{(k)}) = \frac{k(n-k+1)}{(n+1)^2(n+2)}$$

The joint density of all $n$ uniform order statistics is $f_{X_{(1)}, \ldots, X_{(n)}}(x_1, \ldots, x_n) = n!$ on the simplex $0 < x_1 < \cdots < x_n < 1$.
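The moment formulas are easy to sanity-check by simulation. A stdlib-only sketch; $n$, $k$, and the repetition count are arbitrary choices of ours:

```python
import random

random.seed(2)
n, k, reps = 10, 3, 100_000

# k-th order statistic of n Uniform(0,1) draws, repeated many times
draws = [sorted(random.random() for _ in range(n))[k - 1] for _ in range(reps)]

mean = sum(draws) / reps
var = sum((d - mean) ** 2 for d in draws) / reps
# Theory: mean = k/(n+1) = 3/11, var = k(n-k+1)/((n+1)^2 (n+2)) = 24/1452
```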

Connection to Confidence Intervals for Quantiles

To construct a distribution-free confidence interval for the population $p$-quantile $Q(p) = F^{-1}(p)$: the interval $[X_{(r)}, X_{(s)}]$ contains $Q(p)$ with probability:

$$P(X_{(r)} \leq Q(p) \leq X_{(s)}) = \sum_{j=r}^{s-1} \binom{n}{j} p^j (1-p)^{n-j}$$

This is exact and distribution-free: it holds for any continuous FF. No parametric assumptions needed.
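The coverage sum can be evaluated exactly with `math.comb`. A small sketch (the function name `coverage` is ours):

```python
from math import comb

def coverage(n, r, s, p):
    """Exact P(X_(r) <= Q(p) <= X_(s)) for a sample of size n."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(r, s))

c = coverage(20, 6, 15, 0.5)   # ≈ 0.9586 for the median with n = 20
```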

Connection to Concentration: Maximum of Subgaussian RVs

Proposition

Maximum of Subgaussian Random Variables

Statement

If $X_1, \ldots, X_n$ are independent sub-Gaussian random variables with parameter $\sigma$ (meaning $\mathbb{E}[e^{\lambda X_i}] \leq e^{\lambda^2 \sigma^2 / 2}$ for all $\lambda$), then:

$$\mathbb{E}\left[\max_{1 \leq i \leq n} X_i\right] \leq \sigma \sqrt{2 \log n}$$

Intuition

The maximum grows as $\sqrt{\log n}$, not $\sqrt{n}$. Light-tailed variables keep the maximum controlled. This is why union bounds over $n$ hypotheses cost only $\log n$ in the exponent.

Proof Sketch

For any $\lambda > 0$: $\mathbb{E}[\max_i X_i] = \frac{1}{\lambda}\mathbb{E}[\log \exp(\lambda \max_i X_i)] \leq \frac{1}{\lambda} \log \mathbb{E}[\exp(\lambda \max_i X_i)]$ by Jensen's inequality. Then $\mathbb{E}[\exp(\lambda \max_i X_i)] \leq \sum_i \mathbb{E}[\exp(\lambda X_i)] \leq n \exp(\lambda^2 \sigma^2/2)$. So $\mathbb{E}[\max_i X_i] \leq \frac{\log n}{\lambda} + \frac{\lambda \sigma^2}{2}$. Optimize over $\lambda$: set $\lambda = \sqrt{2 \log n}/\sigma$.
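The bound is easy to probe numerically for standard Gaussians (sigma = 1; the values of n and the repetition count are our choices):

```python
import math
import random

random.seed(3)
n, reps = 1000, 1000

# Average observed maximum of n standard normals, over many repetitions
emp = sum(max(random.gauss(0, 1) for _ in range(n)) for _ in range(reps)) / reps
bound = math.sqrt(2 * math.log(n))   # sigma * sqrt(2 log n) ≈ 3.72 for n = 1000
# emp comes out near 3.2, comfortably below the bound
```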

Why It Matters

This bound controls the supremum of empirical processes. In learning theory, the hypothesis index plays the role of $i$, and the empirical risk deviation plays the role of $X_i$. The $\sqrt{\log n}$ rate explains why finite hypothesis classes need only $O(\log |\mathcal{H}|/n)$ samples for uniform convergence.

Failure Mode

For heavy-tailed distributions, the maximum can grow much faster than $\sqrt{\log n}$. For Pareto-distributed variables with tail index $\alpha < 2$, the maximum grows polynomially in $n$, not logarithmically. The sub-Gaussian assumption is critical.

Connection to Bootstrap

The bootstrap resamples $X_1^*, \ldots, X_n^*$ with replacement from the empirical distribution. The order statistics of the bootstrap sample induce a distribution on sample quantiles. The bootstrap estimate of the sampling distribution of $\hat{Q}(p)$ is consistent under mild regularity conditions (the population quantile function must be differentiable at $p$). This provides confidence intervals for quantiles without assuming a parametric model.
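A sketch of the bootstrap percentile interval for the median (stdlib only; the data, resample count, and simple percentile indexing are our choices, not the only conventions):

```python
import random
import statistics

random.seed(4)
data = [random.gauss(0, 1) for _ in range(50)]   # hypothetical sample, n = 50

B = 2000
# Median of each bootstrap resample (drawn with replacement from data)
boot_medians = sorted(
    statistics.median(random.choices(data, k=len(data))) for _ in range(B)
)
# Percentile interval: 2.5th and 97.5th percentiles of the bootstrap medians
lo, hi = boot_medians[int(0.025 * B)], boot_medians[int(0.975 * B) - 1]
```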

Canonical Examples

Example

Confidence interval for the median with n = 20

With $n = 20$ observations, we want a 95% confidence interval for the population median $Q(0.5)$. The interval $[X_{(6)}, X_{(15)}]$ has coverage probability $\sum_{j=6}^{14} \binom{20}{j} (0.5)^{20} \approx 0.9586$. This is valid for any continuous distribution. No normality assumption needed.

Example

Distribution of the sample maximum from Uniform(0,1)

For $n$ i.i.d. Uniform$(0,1)$ variables: $F_{X_{(n)}}(x) = x^n$ for $x \in [0,1]$. The density is $f_{X_{(n)}}(x) = nx^{n-1}$. The expected maximum is $n/(n+1)$, which approaches 1 as $n \to \infty$. The variance is $n/((n+1)^2(n+2))$, which decreases as $O(1/n^2)$.

Common Confusions

Watch Out

Order statistics are not independent

Even though $X_1, \ldots, X_n$ are independent, the order statistics $X_{(1)}, \ldots, X_{(n)}$ are not. Knowing $X_{(1)} = 3$ tells you $X_{(2)} \geq 3$. The joint distribution has a constrained support: $x_{(1)} \leq \cdots \leq x_{(n)}$.
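The dependence is visible in simulation: for Uniform(0,1) with n = 5, the standard covariance formula for uniform order statistics gives a correlation of exactly 1/5 between the minimum and the maximum, not 0. A stdlib-only sketch:

```python
import random

random.seed(5)
n, reps = 5, 50_000
lows, highs = [], []
for _ in range(reps):
    xs = sorted(random.random() for _ in range(n))
    lows.append(xs[0])    # X_(1)
    highs.append(xs[-1])  # X_(5)

m1, m2 = sum(lows) / reps, sum(highs) / reps
cov = sum((a - m1) * (b - m2) for a, b in zip(lows, highs)) / reps
v1 = sum((a - m1) ** 2 for a in lows) / reps
v2 = sum((b - m2) ** 2 for b in highs) / reps
r = cov / (v1 * v2) ** 0.5   # theory: exactly 1/5 for n = 5
```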

Watch Out

Sample quantiles are random variables, not parameters

The sample $p$-quantile $\hat{Q}(p)$ is a statistic computed from data. The population $p$-quantile $Q(p) = F^{-1}(p)$ is a fixed parameter. The sample quantile estimates the population quantile and has its own sampling distribution.

Watch Out

The median is not always better than the mean

The sample median has breakdown point $1/2$ (robust to outliers) while the mean has breakdown point $0$. But for Gaussian data, the mean has smaller variance than the median by a factor of $\pi/2 \approx 1.57$. Robustness comes at a cost in efficiency when the data is actually Gaussian.
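The efficiency gap can be measured directly (stdlib only; the sample size and repetition count are our choices):

```python
import random
import statistics

random.seed(6)
n, reps = 101, 10_000
means, medians = [], []
for _ in range(reps):
    xs = [random.gauss(0, 1) for _ in range(n)]
    means.append(sum(xs) / n)
    medians.append(statistics.median(xs))

# Variance of the median relative to the mean, across repetitions
ratio = statistics.variance(medians) / statistics.variance(means)
# Asymptotic theory: ratio -> pi/2 ≈ 1.57
```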

Summary

  • $X_{(k)}$ has density involving $[F(x)]^{k-1}[1-F(x)]^{n-k}f(x)$
  • Uniform order statistics follow Beta distributions
  • Confidence intervals for quantiles are distribution-free
  • Maximum of $n$ sub-Gaussian variables grows as $\sigma\sqrt{2\log n}$
  • Order statistics are dependent even when the original sample is i.i.d.

Exercises

ExerciseCore

Problem

Let $X_1, \ldots, X_5$ be i.i.d. Uniform$(0,1)$. Compute $\mathbb{E}[X_{(3)}]$ and $\mathrm{Var}(X_{(3)})$.

ExerciseAdvanced

Problem

Derive the asymptotic distribution of the sample median $X_{(\lceil n/2 \rceil)}$ for a sample from a continuous distribution with density $f$ that is positive at the population median $m = F^{-1}(1/2)$.

References

Canonical:

  • David & Nagaraja, Order Statistics (2003), Chapters 2-4
  • Casella & Berger, Statistical Inference (2002), Chapter 5.4

Current:

  • Boucheron, Lugosi, Massart, Concentration Inequalities (2013), Chapter 3

  • van der Vaart, Asymptotic Statistics (1998), Chapter 21

  • Lehmann & Casella, Theory of Point Estimation (1998), Chapters 1-6


Next Topics

  • Extreme value theory: limiting distributions of $X_{(n)}$ as $n \to \infty$
  • Bootstrap: resampling-based inference using order statistics

Last reviewed: April 2026
