
Statistical Foundations

Order Statistics

Order statistics are the sorted values of a random sample. Their distributions govern quantile estimation, confidence intervals for medians, and the behavior of extremes.


Why This Matters

Order statistics appear whenever you sort data. The sample median, percentiles, confidence intervals for quantiles, and extreme values are all functions of order statistics. In ML, the maximum of sub-Gaussian random variables controls uniform convergence bounds. Understanding order statistic distributions is required for nonparametric inference and bootstrap theory.

Mental Model

Given $n$ i.i.d. random variables $X_1, \ldots, X_n$, sort them to get $X_{(1)} \leq X_{(2)} \leq \cdots \leq X_{(n)}$. The value $X_{(k)}$ is the $k$-th order statistic. The minimum is $X_{(1)}$, the maximum is $X_{(n)}$, and the sample median is $X_{(\lceil n/2 \rceil)}$. Each order statistic has its own distribution, derived from the parent distribution.

Core Definitions

Definition

Order Statistic

Given a random sample $X_1, \ldots, X_n$ from a continuous distribution $F$, the $k$-th order statistic $X_{(k)}$ is the $k$-th smallest value in the sample. Formally, $X_{(1)} \leq X_{(2)} \leq \cdots \leq X_{(n)}$ is the sorted rearrangement of $X_1, \ldots, X_n$.

Definition

Sample Quantile

The sample $p$-quantile is $\hat{Q}(p) = X_{(\lceil np \rceil)}$, the order statistic at position $\lceil np \rceil$. The sample median is $\hat{Q}(0.5)$. Sample quartiles are $\hat{Q}(0.25)$ and $\hat{Q}(0.75)$. The interquartile range (IQR) is $\hat{Q}(0.75) - \hat{Q}(0.25)$.
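This definition translates directly into code. A minimal sketch (the helper name `sample_quantile` is ours, not a standard API):

```python
import math

def sample_quantile(data, p):
    """Sample p-quantile: the order statistic at 1-based rank ceil(n*p)."""
    xs = sorted(data)
    k = math.ceil(len(xs) * p)   # 1-based rank ceil(np)
    return xs[k - 1]             # convert to 0-based index

data = [7, 1, 5, 3, 9, 2, 8, 4, 6, 10]         # n = 10
median = sample_quantile(data, 0.5)             # X_(5) = 5
iqr = sample_quantile(data, 0.75) - sample_quantile(data, 0.25)  # 8 - 3 = 5
```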

Main Theorems

Theorem

Density of the k-th Order Statistic

Statement

The probability density function of the $k$-th order statistic $X_{(k)}$ from a sample of size $n$ drawn i.i.d. from a continuous distribution with pdf $f$ and CDF $F$ is:

$$f_{X_{(k)}}(x) = \frac{n!}{(k-1)!\,(n-k)!}\,[F(x)]^{k-1}\,[1 - F(x)]^{n-k}\,f(x)$$

Intuition

For $X_{(k)}$ to have a value near $x$: exactly $k-1$ of the $n$ samples must fall below $x$ (probability $F(x)$ each), exactly $n-k$ must fall above $x$ (probability $1-F(x)$ each), and one sample must be at $x$ (density $f(x)$). The multinomial coefficient counts the number of ways to assign these roles.

Proof Sketch

The event $\{X_{(k)} \leq x\}$ is equivalent to "at least $k$ of the $n$ samples fall at or below $x$." So $F_{X_{(k)}}(x) = \sum_{j=k}^{n} \binom{n}{j} [F(x)]^j [1-F(x)]^{n-j}$. Differentiating this CDF with respect to $x$ (using the telescoping property of adjacent binomial terms) yields the density formula.
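The CDF identity can be checked against simulation in the uniform case. A stdlib-only sketch, with $n$, $k$, $x$, and the repetition count chosen arbitrarily by us:

```python
import random
from math import comb

random.seed(0)
n, k, x = 10, 3, 0.25
reps = 100_000

# Empirical P(X_(k) <= x) for samples of n Uniform(0,1) draws
emp = sum(
    sorted(random.random() for _ in range(n))[k - 1] <= x
    for _ in range(reps)
) / reps

# Binomial tail from the proof sketch: P(at least k of n fall at or below x)
theory = sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))
```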

Why It Matters

This formula is the starting point for deriving confidence intervals for population quantiles, the distribution of the sample range $X_{(n)} - X_{(1)}$, and the asymptotic distribution of sample quantiles. For the uniform distribution on $[0,1]$, it simplifies to a Beta distribution: $X_{(k)} \sim \text{Beta}(k, n-k+1)$.

Failure Mode

The formula requires a continuous parent distribution. For discrete distributions, ties occur with positive probability and the density formula does not apply directly. A separate treatment using probability mass functions is needed.

Special Cases

Minimum and Maximum

The CDF of the maximum $X_{(n)}$ is $F_{X_{(n)}}(x) = [F(x)]^n$: all $n$ samples must fall below $x$. The CDF of the minimum $X_{(1)}$ is $F_{X_{(1)}}(x) = 1 - [1 - F(x)]^n$: at least one sample must fall below $x$.

For $X_1, \ldots, X_n \sim \text{Uniform}(0,1)$: $\mathbb{E}[X_{(n)}] = n/(n+1)$ and $\mathbb{E}[X_{(1)}] = 1/(n+1)$.
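A quick stdlib-only simulation to check these expectations (the sample size and repetition count are our choices):

```python
import random

random.seed(1)
n, reps = 5, 100_000
maxima, minima = [], []
for _ in range(reps):
    xs = [random.random() for _ in range(n)]
    maxima.append(max(xs))
    minima.append(min(xs))

mean_max = sum(maxima) / reps   # theory: n/(n+1) = 5/6
mean_min = sum(minima) / reps   # theory: 1/(n+1) = 1/6
```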

Uniform Order Statistics

If $X_1, \ldots, X_n \sim \text{Uniform}(0,1)$, then $X_{(k)} \sim \text{Beta}(k, n-k+1)$. This gives:

$$\mathbb{E}[X_{(k)}] = \frac{k}{n+1}, \qquad \mathrm{Var}(X_{(k)}) = \frac{k(n-k+1)}{(n+1)^2(n+2)}$$

The joint density of all $n$ uniform order statistics is $f_{X_{(1)}, \ldots, X_{(n)}}(x_1, \ldots, x_n) = n!$ on the simplex $0 < x_1 < \cdots < x_n < 1$.
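The moment formulas are easy to sanity-check by simulation. A stdlib-only sketch; $n$, $k$, and the repetition count are arbitrary choices of ours:

```python
import random

random.seed(2)
n, k, reps = 10, 3, 100_000

# k-th order statistic of n Uniform(0,1) draws, repeated many times
draws = [sorted(random.random() for _ in range(n))[k - 1] for _ in range(reps)]

mean = sum(draws) / reps
var = sum((d - mean) ** 2 for d in draws) / reps
# Theory: mean = k/(n+1) = 3/11, var = k(n-k+1)/((n+1)^2 (n+2)) = 24/1452
```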

Connection to Confidence Intervals for Quantiles

To construct a distribution-free confidence interval for the population $p$-quantile $Q(p) = F^{-1}(p)$: the interval $[X_{(r)}, X_{(s)}]$ contains $Q(p)$ with probability:

$$P(X_{(r)} \leq Q(p) \leq X_{(s)}) = \sum_{j=r}^{s-1} \binom{n}{j} p^j (1-p)^{n-j}$$

This is exact and distribution-free: it holds for any continuous FF. No parametric assumptions needed.
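The coverage sum can be evaluated exactly with `math.comb`. A small sketch (the function name `coverage` is ours):

```python
from math import comb

def coverage(n, r, s, p):
    """Exact P(X_(r) <= Q(p) <= X_(s)) for a sample of size n."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(r, s))

c = coverage(20, 6, 15, 0.5)   # ≈ 0.9586 for the median with n = 20
```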

Connection to Concentration: Maximum of Subgaussian RVs

Proposition

Maximum of Subgaussian Random Variables

Statement

If $X_1, \ldots, X_n$ are independent sub-Gaussian random variables with parameter $\sigma$ (meaning $\mathbb{E}[e^{\lambda X_i}] \leq e^{\lambda^2 \sigma^2 / 2}$ for all $\lambda$), then:

$$\mathbb{E}\left[\max_{1 \leq i \leq n} X_i\right] \leq \sigma \sqrt{2 \log n}$$

Intuition

The maximum grows as $\sqrt{\log n}$, not $\sqrt{n}$. Light-tailed variables keep the maximum controlled. This is why union bounds over $n$ hypotheses cost only $\log n$ in the exponent.

Proof Sketch

For any $\lambda > 0$: $\mathbb{E}[\max_i X_i] = \frac{1}{\lambda}\mathbb{E}[\log \exp(\lambda \max_i X_i)] \leq \frac{1}{\lambda} \log \mathbb{E}[\exp(\lambda \max_i X_i)]$ by Jensen's inequality. Then $\mathbb{E}[\exp(\lambda \max_i X_i)] \leq \sum_i \mathbb{E}[\exp(\lambda X_i)] \leq n \exp(\lambda^2 \sigma^2/2)$. So $\mathbb{E}[\max_i X_i] \leq \frac{\log n}{\lambda} + \frac{\lambda \sigma^2}{2}$. Optimize over $\lambda$: set $\lambda = \sqrt{2 \log n}/\sigma$.
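The bound is easy to probe numerically for standard Gaussians (sigma = 1; the values of n and the repetition count are our choices):

```python
import math
import random

random.seed(3)
n, reps = 1000, 1000

# Average observed maximum of n standard normals, over many repetitions
emp = sum(max(random.gauss(0, 1) for _ in range(n)) for _ in range(reps)) / reps
bound = math.sqrt(2 * math.log(n))   # sigma * sqrt(2 log n) ≈ 3.72 for n = 1000
# emp comes out near 3.2, comfortably below the bound
```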

Why It Matters

This bound controls the supremum of empirical processes. In learning theory, the hypothesis index plays the role of $i$, and the empirical risk deviation plays the role of $X_i$. The $\sqrt{\log n}$ rate explains why finite hypothesis classes need only $O(\log |\mathcal{H}|/n)$ samples for uniform convergence.

Failure Mode

For heavy-tailed distributions, the maximum can grow much faster than $\sqrt{\log n}$. For Pareto-distributed variables with tail index $\alpha < 2$, the maximum grows polynomially in $n$, not logarithmically. The sub-Gaussian assumption is critical.

Connection to Bootstrap

The bootstrap resamples $X_1^*, \ldots, X_n^*$ with replacement from the empirical distribution. The order statistics of the bootstrap sample induce a distribution on sample quantiles. The bootstrap estimate of the sampling distribution of $\hat{Q}(p)$ is consistent under mild regularity conditions (the population quantile function must be differentiable at $p$). This provides confidence intervals for quantiles without assuming a parametric model.
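A sketch of the bootstrap percentile interval for the median (stdlib only; the data, resample count, and simple percentile indexing are our choices, not the only conventions):

```python
import random
import statistics

random.seed(4)
data = [random.gauss(0, 1) for _ in range(50)]   # hypothetical sample, n = 50

B = 2000
# Median of each bootstrap resample (drawn with replacement from data)
boot_medians = sorted(
    statistics.median(random.choices(data, k=len(data))) for _ in range(B)
)
# Percentile interval: 2.5th and 97.5th percentiles of the bootstrap medians
lo, hi = boot_medians[int(0.025 * B)], boot_medians[int(0.975 * B) - 1]
```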

Canonical Examples

Example

Confidence interval for the median with n = 20

With $n = 20$ observations, we want a 95% confidence interval for the population median $Q(0.5)$. The interval $[X_{(6)}, X_{(15)}]$ has coverage probability $\sum_{j=6}^{14} \binom{20}{j} (0.5)^{20} \approx 0.9586$. This is valid for any continuous distribution. No normality assumption needed.

Example

Distribution of the sample maximum from Uniform(0,1)

For $n$ i.i.d. Uniform$(0,1)$ variables: $F_{X_{(n)}}(x) = x^n$ for $x \in [0,1]$. The density is $f_{X_{(n)}}(x) = nx^{n-1}$. The expected maximum is $n/(n+1)$, which approaches 1 as $n \to \infty$. The variance is $n/((n+1)^2(n+2))$, which decreases as $O(1/n^2)$.

Common Confusions

Watch Out

Order statistics are not independent

Even though $X_1, \ldots, X_n$ are independent, the order statistics $X_{(1)}, \ldots, X_{(n)}$ are not. Knowing $X_{(1)} = 3$ tells you $X_{(2)} \geq 3$. The joint distribution has a constrained support: $x_{(1)} \leq \cdots \leq x_{(n)}$.
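The dependence is visible in simulation: for Uniform(0,1) with n = 5, the standard covariance formula for uniform order statistics gives a correlation of exactly 1/5 between the minimum and the maximum, not 0. A stdlib-only sketch:

```python
import random

random.seed(5)
n, reps = 5, 50_000
lows, highs = [], []
for _ in range(reps):
    xs = sorted(random.random() for _ in range(n))
    lows.append(xs[0])    # X_(1)
    highs.append(xs[-1])  # X_(5)

m1, m2 = sum(lows) / reps, sum(highs) / reps
cov = sum((a - m1) * (b - m2) for a, b in zip(lows, highs)) / reps
v1 = sum((a - m1) ** 2 for a in lows) / reps
v2 = sum((b - m2) ** 2 for b in highs) / reps
r = cov / (v1 * v2) ** 0.5   # theory: exactly 1/5 for n = 5
```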

Watch Out

Sample quantiles are random variables, not parameters

The sample $p$-quantile $\hat{Q}(p)$ is a statistic computed from data. The population $p$-quantile $Q(p) = F^{-1}(p)$ is a fixed parameter. The sample quantile estimates the population quantile and has its own sampling distribution.

Watch Out

The median is not always better than the mean

The sample median has breakdown point $1/2$ (robust to outliers) while the mean has breakdown point $0$. But for Gaussian data, the mean has smaller variance than the median by a factor of $\pi/2 \approx 1.57$. Robustness comes at a cost in efficiency when the data is actually Gaussian.
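The efficiency gap can be measured directly (stdlib only; the sample size and repetition count are our choices):

```python
import random
import statistics

random.seed(6)
n, reps = 101, 10_000
means, medians = [], []
for _ in range(reps):
    xs = [random.gauss(0, 1) for _ in range(n)]
    means.append(sum(xs) / n)
    medians.append(statistics.median(xs))

# Variance of the median relative to the mean, across repetitions
ratio = statistics.variance(medians) / statistics.variance(means)
# Asymptotic theory: ratio -> pi/2 ≈ 1.57
```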

Summary

  • $X_{(k)}$ has density involving $[F(x)]^{k-1}[1-F(x)]^{n-k}f(x)$
  • Uniform order statistics follow Beta distributions
  • Confidence intervals for quantiles are distribution-free
  • Maximum of $n$ sub-Gaussian variables grows as $\sigma\sqrt{2\log n}$
  • Order statistics are dependent even when the original sample is i.i.d.

Exercises

ExerciseCore

Problem

Let $X_1, \ldots, X_5$ be i.i.d. Uniform$(0,1)$. Compute $\mathbb{E}[X_{(3)}]$ and $\mathrm{Var}(X_{(3)})$.

ExerciseAdvanced

Problem

Derive the asymptotic distribution of the sample median $X_{(\lceil n/2 \rceil)}$ for a sample from a continuous distribution with density $f$ that is positive at the population median $m = F^{-1}(1/2)$.

References

Canonical:

  • David & Nagaraja, Order Statistics (2003), Chapters 2-4
  • Casella & Berger, Statistical Inference (2002), Chapter 5.4

Current:

  • Boucheron, Lugosi, Massart, Concentration Inequalities (2013), Chapter 3

  • van der Vaart, Asymptotic Statistics (1998), Chapter 21

  • Lehmann & Casella, Theory of Point Estimation (1998), Chapters 1-6


Next Topics

  • Extreme value theory: limiting distributions of $X_{(n)}$ as $n \to \infty$
  • Bootstrap: resampling-based inference using order statistics

Last reviewed: April 2026
