
Bootstrap Resampling

Bootstrap Methods

The nonparametric bootstrap: resample with replacement to approximate sampling distributions, construct confidence intervals, and quantify uncertainty without distributional assumptions.


Why This Matters

Efron's bootstrap (1979) is one of the defining statistical ideas of the late 20th century. Before it, constructing confidence intervals or estimating standard errors required either closed-form formulas (available only for simple estimators) or asymptotic approximations that could be poor in finite samples.

The bootstrap gives you a general-purpose machine. Have any estimator and want a confidence interval? Resample. Want a standard error? Resample. Want a bias correction? Resample. It works for medians, correlations, regression coefficients, eigenvalues: virtually any statistic you can compute.

[Figure: an original sample of ten values with mean $\bar{x} = 5.5$, and three bootstrap resamples $B_1, B_2, B_3$ with means $\bar{x}^* = 5.3, 5.5, 5.4$; some original values are duplicated by resampling and others omitted. Repeating $B$ times gives the distribution of $\bar{x}^*$.]

Mental Model

You have one sample of size $n$ from an unknown distribution $F$. You want to know how your estimator $\hat{\theta}$ would vary if you could draw many samples from $F$. But you only have one sample. The bootstrap says: treat your sample as if it were the population, and resample from it. The variability of the resampled estimates approximates the true sampling variability.

This sounds like cheating. It is not. It works because the empirical distribution $\hat{F}_n$ converges to $F$, so resampling from $\hat{F}_n$ approximates sampling from $F$.

Formal Setup

Definition

Empirical Distribution Function

Given observations $X_1, \ldots, X_n$ drawn i.i.d. from an unknown distribution $F$, the empirical distribution function is:

$$\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}(X_i \leq x)$$

This places mass $1/n$ on each observed data point.

Definition

Bootstrap Sample

A bootstrap sample is a sample of size $n$ drawn i.i.d. from $\hat{F}_n$. Equivalently, draw $n$ observations with replacement from the original sample $\{X_1, \ldots, X_n\}$.

Each bootstrap sample will typically contain some original observations multiple times and omit others entirely. On average, about $1 - (1-1/n)^n \approx 63.2\%$ of the original observations appear at least once.
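A quick simulation (a stdlib-only sketch; the variable names are ours) makes the 63.2% figure concrete:

```python
import random

random.seed(42)
n = 1000
sample = list(range(n))  # n distinct "observations"

# One bootstrap sample: n draws with replacement
boot = [random.choice(sample) for _ in range(n)]

# Fraction of original observations appearing at least once
frac_present = len(set(boot)) / n
print(f"{frac_present:.3f}")  # close to 1 - (1 - 1/n)^n ≈ 0.632
```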

Definition

Bootstrap Distribution

Let $\hat{\theta} = T(X_1, \ldots, X_n)$ be any statistic. The bootstrap distribution of $\hat{\theta}$ is the distribution of $\hat{\theta}^* = T(X_1^*, \ldots, X_n^*)$ induced by resampling. In practice, you approximate it by generating $B$ bootstrap samples and computing $\hat{\theta}_1^*, \ldots, \hat{\theta}_B^*$.
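In code, the bootstrap distribution is a short loop. This is a stdlib-only sketch; `bootstrap_distribution` is a helper name introduced here, not a library function:

```python
import random
import statistics

def bootstrap_distribution(xs, stat, B=2000):
    """Return B bootstrap replicates theta*_1, ..., theta*_B of stat."""
    n = len(xs)
    return [stat([random.choice(xs) for _ in range(n)]) for _ in range(B)]

random.seed(0)
data = [random.gauss(5, 2) for _ in range(100)]
reps = bootstrap_distribution(data, statistics.mean)
print(len(reps))  # 2000 replicates of the sample mean
```

The same helper works unchanged for the median, a correlation, or any other statistic passed as `stat`.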

The Plug-in Principle

The bootstrap is an instance of the plug-in principle: to estimate a functional $\theta(F)$ of the unknown distribution, compute $\theta(\hat{F}_n)$. To estimate the sampling distribution of $\hat{\theta}$, replace $F$ with $\hat{F}_n$ everywhere.

The true sampling variance of $\hat{\theta}$ is:

$$\text{Var}_F(\hat{\theta}) = \text{Var}_F(T(X_1, \ldots, X_n))$$

The bootstrap estimate replaces $F$ with $\hat{F}_n$:

$$\widehat{\text{Var}}_{\text{boot}} = \text{Var}_{\hat{F}_n}(T(X_1^*, \ldots, X_n^*))$$

In practice, approximate this with:

$$\widehat{\text{Var}}_{\text{boot}} \approx \frac{1}{B-1} \sum_{b=1}^{B} \left(\hat{\theta}_b^* - \bar{\theta}^*\right)^2$$
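For the sample mean, the bootstrap standard error can be checked against the analytic $s/\sqrt{n}$; a stdlib-only sketch (the exponential data are illustrative):

```python
import random
import statistics

random.seed(1)
data = [random.expovariate(1.0) for _ in range(80)]
n, B = len(data), 2000

# B bootstrap replicates of the sample mean
reps = [statistics.mean([random.choice(data) for _ in range(n)]) for _ in range(B)]

# (1/(B-1)) * sum (theta*_b - theta-bar*)^2 is exactly the sample variance of the replicates
var_boot = statistics.variance(reps)
se_boot = var_boot ** 0.5

# For the mean, this should land near the analytic s / sqrt(n)
se_analytic = statistics.stdev(data) / n ** 0.5
print(se_boot, se_analytic)
```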

Main Theorems

Theorem

Bootstrap Consistency

Statement

Under regularity conditions, the bootstrap distribution of $\sqrt{n}(\hat{\theta}^* - \hat{\theta})$ converges to the same limit as $\sqrt{n}(\hat{\theta} - \theta)$. More precisely, if $\sqrt{n}(\hat{\theta} - \theta) \xrightarrow{d} N(0, \sigma^2)$, then conditional on the data:

$$\sup_t \left| P^*\bigl(\sqrt{n}(\hat{\theta}^* - \hat{\theta}) \leq t\bigr) - P\bigl(\sqrt{n}(\hat{\theta} - \theta) \leq t\bigr) \right| \xrightarrow{P} 0$$

where $P^*$ denotes probability under bootstrap resampling.

Intuition

The bootstrap works because the empirical distribution converges to the true distribution (Glivenko-Cantelli), so resampling from $\hat{F}_n$ mimics sampling from $F$. The bootstrap distribution of the centered, scaled statistic converges to the same Gaussian limit as the original statistic.

Proof Sketch

The key steps are:

  1. Show that the bootstrap mean $\bar{X}^*$ satisfies a conditional CLT: $\sqrt{n}(\bar{X}^* - \bar{X}) \xrightarrow{d} N(0, \sigma^2)$ in probability.
  2. Extend to smooth functionals via the functional delta method: if $\hat{\theta} = g(\bar{X})$ for differentiable $g$, then the bootstrap distribution of $g(\bar{X}^*)$ inherits consistency.
  3. The Glivenko-Cantelli theorem ensures $\hat{F}_n \to F$ uniformly, which drives the convergence of the bootstrap distribution.

Why It Matters

Bootstrap consistency justifies using the bootstrap distribution for inference. It means bootstrap confidence intervals have asymptotically correct coverage, and bootstrap standard errors are asymptotically correct, all without knowing the form of $F$ or deriving analytic formulas.

Failure Mode

The bootstrap fails when the statistic is not sufficiently "smooth" as a functional of $F$. Classic failure: the bootstrap is inconsistent for the maximum $X_{(n)}$ of a uniform distribution, because the empirical distribution has atoms but the true distribution is continuous. More generally, non-differentiable functionals and heavy-tailed distributions can cause bootstrap failure.
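The failure for the sample maximum is easy to see by simulation (a sketch with made-up parameters): the bootstrap puts a point mass of roughly $1 - (1-1/n)^n \approx 0.632$ on the observed maximum itself, whereas the true sampling distribution of the maximum is continuous:

```python
import random

random.seed(7)
n = 200
data = [random.uniform(0, 1.0) for _ in range(n)]
x_max = max(data)

# Bootstrap distribution of the maximum
B = 2000
boot_maxes = [max(random.choice(data) for _ in range(n)) for _ in range(B)]

# Probability mass the bootstrap puts exactly on the observed maximum
prop_equal = sum(m == x_max for m in boot_maxes) / B
print(prop_equal)  # near 0.632: a point mass the true distribution does not have
```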

Bootstrap Confidence Intervals

There are several ways to turn the bootstrap distribution into a confidence interval. They differ in accuracy and computational cost.

Percentile Method

The simplest approach. For a $100(1-\alpha)\%$ confidence interval:

$$CI_{\text{percentile}} = [\hat{\theta}^*_{\alpha/2}, \hat{\theta}^*_{1-\alpha/2}]$$

where $\hat{\theta}^*_q$ is the $q$-th quantile of the bootstrap distribution.

This is intuitive but can be inaccurate when the bootstrap distribution is skewed or when $\hat{\theta}$ is biased.
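A percentile-interval sketch (standard library only; `percentile_ci` is our own helper name):

```python
import random
import statistics

def percentile_ci(xs, stat, alpha=0.05, B=4000):
    """Percentile bootstrap CI: the alpha/2 and 1 - alpha/2 quantiles of the replicates."""
    n = len(xs)
    reps = sorted(stat([random.choice(xs) for _ in range(n)]) for _ in range(B))
    return reps[int((alpha / 2) * B)], reps[int((1 - alpha / 2) * B) - 1]

random.seed(3)
data = [random.gauss(10, 3) for _ in range(60)]
lo, hi = percentile_ci(data, statistics.mean)
print(lo, hi)
```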

Pivotal (Basic) Bootstrap

Uses the bootstrap to estimate the distribution of $\hat{\theta} - \theta$:

$$CI_{\text{pivotal}} = [2\hat{\theta} - \hat{\theta}^*_{1-\alpha/2}, \; 2\hat{\theta} - \hat{\theta}^*_{\alpha/2}]$$

Note the reversal of quantiles. This corrects for bias and is more accurate than the percentile method for skewed distributions.
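A pivotal-interval sketch on deliberately skewed data, showing the reversed quantiles (stdlib only; the exponential data are illustrative):

```python
import random
import statistics

random.seed(4)
data = [random.expovariate(0.5) for _ in range(60)]  # skewed, population mean 2

theta_hat = statistics.mean(data)
n, B, alpha = len(data), 4000, 0.05
reps = sorted(statistics.mean([random.choice(data) for _ in range(n)]) for _ in range(B))

q_lo = reps[int((alpha / 2) * B)]          # theta*_{alpha/2}
q_hi = reps[int((1 - alpha / 2) * B) - 1]  # theta*_{1-alpha/2}

# Pivotal interval: the UPPER bootstrap quantile sets the lower endpoint, and vice versa
ci = (2 * theta_hat - q_hi, 2 * theta_hat - q_lo)
print(ci)
```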

BCa (Bias-Corrected and Accelerated)

The gold standard for bootstrap confidence intervals. It adjusts for both bias and skewness:

$$CI_{\text{BCa}} = [\hat{\theta}^*_{\alpha_1}, \hat{\theta}^*_{\alpha_2}]$$

where $\alpha_1, \alpha_2$ are adjusted quantile levels that depend on a bias correction factor $\hat{z}_0$ and an acceleration factor $\hat{a}$. The acceleration is typically estimated via the jackknife. BCa intervals have second-order accuracy: coverage error is $O(n^{-1})$ rather than the $O(n^{-1/2})$ of the percentile method.
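A compact BCa sketch using the standard formulas: $\hat{z}_0$ from the fraction of replicates below $\hat{\theta}$, $\hat{a}$ from the jackknife values. This is illustrative (stdlib only), not a production implementation:

```python
import random
import statistics
from statistics import NormalDist

random.seed(5)
data = [random.expovariate(1.0) for _ in range(50)]
stat = statistics.mean

n, B, alpha = len(data), 4000, 0.05
theta_hat = stat(data)
reps = sorted(stat([random.choice(data) for _ in range(n)]) for _ in range(B))

Phi, Phi_inv = NormalDist().cdf, NormalDist().inv_cdf

# Bias correction: how far the bootstrap distribution is shifted relative to theta_hat
z0 = Phi_inv(sum(r < theta_hat for r in reps) / B)

# Acceleration from the jackknife (leave-one-out) estimates
jack = [stat(data[:i] + data[i + 1:]) for i in range(n)]
jbar = statistics.mean(jack)
a = sum((jbar - j) ** 3 for j in jack) / (6 * sum((jbar - j) ** 2 for j in jack) ** 1.5)

def adjusted_level(z_alpha):
    return Phi(z0 + (z0 + z_alpha) / (1 - a * (z0 + z_alpha)))

a1 = adjusted_level(Phi_inv(alpha / 2))
a2 = adjusted_level(Phi_inv(1 - alpha / 2))
ci = (reps[int(a1 * B)], reps[min(int(a2 * B), B - 1)])
print(ci)
```

For production work, prefer a vetted implementation such as `scipy.stats.bootstrap(..., method="BCa")`.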

Variants

Parametric Bootstrap

Instead of resampling from $\hat{F}_n$, fit a parametric model $F_{\hat{\theta}}$ and resample from it. For example, if you assume the data are normal, estimate $\hat{\mu}, \hat{\sigma}^2$ and generate bootstrap samples from $N(\hat{\mu}, \hat{\sigma}^2)$.

Advantage: more efficient when the model is correct. Disadvantage: invalid when the model is wrong.
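A parametric-bootstrap sketch under an assumed normal model (the parameter values are made up):

```python
import random
import statistics

random.seed(6)
data = [random.gauss(3.0, 1.5) for _ in range(40)]

# Fit the assumed model: normal with estimated mean and sd
mu_hat = statistics.mean(data)
sigma_hat = statistics.stdev(data)

# Resample from the FITTED model, not the empirical distribution
B, n = 2000, len(data)
reps = [statistics.mean([random.gauss(mu_hat, sigma_hat) for _ in range(n)])
        for _ in range(B)]

se_param = statistics.stdev(reps)
print(se_param)  # close to sigma_hat / sqrt(n) when the model holds
```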

Wild Bootstrap

For regression with heteroscedastic errors, the standard residual bootstrap fails because reshuffling residuals destroys the heteroscedasticity pattern. The wild bootstrap keeps each residual attached to its own observation and multiplies it by a random variable with mean 0 and variance 1. Common choices: Rademacher ($\pm 1$ with equal probability) or the two-point distribution of Mammen (1993).
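A wild-bootstrap sketch for a simple linear regression whose noise scale grows with $x$, using Rademacher weights (stdlib only; the data-generating numbers are made up):

```python
import random

random.seed(8)
n = 100
x = [random.uniform(0, 10) for _ in range(n)]
# Heteroscedastic errors: noise scale grows with x
y = [2.0 + 0.5 * xi + random.gauss(0, 0.2 * xi) for xi in x]

def ols_slope(xs, ys):
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))
    sxx = sum((a - xbar) ** 2 for a in xs)
    return sxy / sxx

b1 = ols_slope(x, y)
b0 = sum(y) / n - b1 * sum(x) / n
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Wild bootstrap: each residual stays at its own x_i; only its sign is randomized
B = 1000
slopes = []
for _ in range(B):
    y_star = [b0 + b1 * xi + ri * random.choice((-1.0, 1.0))
              for xi, ri in zip(x, resid)]
    slopes.append(ols_slope(x, y_star))

mean_slope = sum(slopes) / B
se_slope = (sum((s - mean_slope) ** 2 for s in slopes) / (B - 1)) ** 0.5
print(se_slope)
```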

Block Bootstrap

For time series data, i.i.d. resampling destroys temporal dependence. The block bootstrap resamples blocks of consecutive observations. Variants include the moving block bootstrap (fixed block length), the stationary bootstrap (random block length with geometric distribution), and the circular block bootstrap.
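A moving-block-bootstrap sketch with a fixed block length (`moving_block_bootstrap` is our own helper name; the AR(1)-style series is illustrative):

```python
import random

def moving_block_bootstrap(xs, block_len):
    """Resample overlapping blocks of consecutive observations, then trim to len(xs)."""
    n = len(xs)
    n_blocks = -(-n // block_len)  # ceiling division
    out = []
    for _ in range(n_blocks):
        start = random.randrange(n - block_len + 1)
        out.extend(xs[start:start + block_len])
    return out[:n]

random.seed(9)
# A series with temporal dependence: each value pulled toward the previous one
series = [0.0]
for _ in range(199):
    series.append(0.7 * series[-1] + random.gauss(0, 1))

boot_series = moving_block_bootstrap(series, block_len=10)
print(len(boot_series))  # same length as the original series
```

Within each block the original short-range dependence is preserved; only the block order is randomized.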

Canonical Examples

Example

Bootstrap standard error of the median

The median has no simple closed-form standard error (unlike the mean, where $\text{SE} = s/\sqrt{n}$). The bootstrap handles it effortlessly:

  1. From a sample $X_1, \ldots, X_n$, compute the sample median $\hat{m}$.
  2. Generate $B = 10{,}000$ bootstrap samples and compute the median of each: $\hat{m}_1^*, \ldots, \hat{m}_B^*$.
  3. The bootstrap standard error is $\text{SD}(\hat{m}_1^*, \ldots, \hat{m}_B^*)$.

For a sample of size $n = 50$ from a standard normal, the asymptotic standard error of the median is $\sqrt{\pi/(2n)} \approx 0.177$. The bootstrap estimate will be close.
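The three steps above can be sketched directly (stdlib only; B reduced to 5,000 to keep the run short):

```python
import math
import random
import statistics

random.seed(10)
n, B = 50, 5000
data = [random.gauss(0, 1) for _ in range(n)]

# Steps 1-2: sample median, then B bootstrap medians
m_hat = statistics.median(data)
boot_medians = [statistics.median([random.choice(data) for _ in range(n)])
                for _ in range(B)]

# Step 3: bootstrap SE = SD of the bootstrap medians
se_boot = statistics.stdev(boot_medians)
se_asymptotic = math.sqrt(math.pi / (2 * n))  # ≈ 0.177
print(se_boot, se_asymptotic)
```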

Example

Bootstrap for correlation coefficients

Given paired data $(X_i, Y_i)$ for $i = 1, \ldots, n$, the sample correlation $\hat{r}$ has a complicated sampling distribution (especially when the true $\rho$ is not zero). The bootstrap gives you the distribution for free: resample pairs $(X_i^*, Y_i^*)$ with replacement, compute $\hat{r}^*$ for each bootstrap sample, and use the resulting distribution for inference.

Common Confusions

Watch Out

The bootstrap does not create new information

A common misconception is that the bootstrap "generates new data." It does not. It uses the existing data to approximate the sampling distribution. The quality of the approximation depends on how well the empirical distribution approximates the true distribution. With $n = 5$ observations, the bootstrap may be unreliable because $\hat{F}_n$ is a poor approximation to $F$.

Watch Out

Increasing B does not fix small n

Increasing $B$ (the number of bootstrap replications) reduces Monte Carlo error: the error from approximating the bootstrap distribution with a finite simulation. It does not reduce the fundamental statistical error from having a small original sample of size $n$. Even with $B = 1{,}000{,}000$ bootstrap samples, if $n = 10$, the bootstrap distribution is built from only 10 distinct values.

Watch Out

The bootstrap can fail

The bootstrap is not universally valid. It fails for:

  • Extreme order statistics (e.g., the sample maximum from a bounded distribution): the bootstrap distribution does not converge to the true sampling distribution.
  • Heavy-tailed distributions without finite variance: the CLT does not apply, so the bootstrap CLT also fails.
  • Non-smooth functionals: if $\theta(F)$ is not a smooth functional of $F$, the plug-in principle can break.

Summary

  • The bootstrap approximates the sampling distribution by resampling with replacement from the observed data
  • It works because $\hat{F}_n \to F$ (Glivenko-Cantelli), so resampling from $\hat{F}_n$ mimics sampling from $F$
  • Bootstrap confidence intervals: percentile (simplest), pivotal (better for skewed distributions), BCa (gold standard, second-order accurate)
  • Parametric bootstrap: resample from fitted model (more efficient if model is correct)
  • Wild bootstrap: for heteroscedastic regression; block bootstrap: for time series
  • Bootstrap fails for non-smooth functionals, extreme order statistics, and heavy-tailed distributions

Exercises

ExerciseCore

Problem

You have a sample of $n = 30$ observations: $X_1, \ldots, X_{30}$. You want a 95% bootstrap confidence interval for the population median using the percentile method.

Describe the algorithm step by step. Then: if your $B = 1000$ bootstrap medians are sorted as $\hat{m}^*_{(1)} \leq \cdots \leq \hat{m}^*_{(1000)}$, which order statistics give you the interval endpoints?

ExerciseAdvanced

Problem

Explain why the nonparametric bootstrap is inconsistent for the sample maximum $X_{(n)}$ when sampling from a $\text{Uniform}(0, \theta)$ distribution. What is the correct rate of convergence for $X_{(n)}$, and why does the bootstrap get it wrong?

References

Canonical:

  • Efron, B. "Bootstrap Methods: Another Look at the Jackknife" (1979)
  • Efron, B. & Tibshirani, R. An Introduction to the Bootstrap (1993)

Current:

  • Davison, A.C. & Hinkley, D.V. Bootstrap Methods and their Application (1997)
  • Hall, P. The Bootstrap and Edgeworth Expansion (1992)

Next Topics

The natural next steps from bootstrap methods:

  • Bootstrap theory: when and why the bootstrap is consistent, Edgeworth expansions and higher-order accuracy
  • Hypothesis testing with the bootstrap: permutation tests, bootstrap p-values, and the connection to resampling-based inference

Last reviewed: April 2026
