
Bootstrap Resampling

Bootstrap Methods

The nonparametric bootstrap: resample with replacement to approximate sampling distributions, construct confidence intervals, and quantify uncertainty without distributional assumptions.


Why This Matters

Efron's bootstrap (1979) is one of the defining statistical ideas of the late 20th century. Before it, constructing confidence intervals or estimating standard errors required either closed-form formulas (available only for simple estimators) or asymptotic approximations that could be poor in finite samples.

The bootstrap gives you a general-purpose machine. Have any estimator and want a confidence interval? Resample. Want a standard error? Resample. Want a bias correction? Resample. It works for medians, correlations, regression coefficients, eigenvalues: virtually any statistic you can compute.

[Figure: an original sample of ten values with mean $\bar{x} = 5.5$, and three bootstrap resamples $B_1, B_2, B_3$ with means $\bar{x}^* = 5.3, 5.5, 5.4$; some original values are duplicated by resampling and others omitted. Repeating $B$ times gives the distribution of $\bar{x}^*$.]

Mental Model

You have one sample of size $n$ from an unknown distribution $F$. You want to know how your estimator $\hat{\theta}$ would vary if you could draw many samples from $F$. But you only have one sample. The bootstrap says: treat your sample as if it were the population, and resample from it. The variability of the resampled estimates approximates the true sampling variability.

This sounds like cheating. It is not. It works because the empirical distribution $\hat{F}_n$ converges to $F$, so resampling from $\hat{F}_n$ approximates sampling from $F$.

Formal Setup

Definition

Empirical Distribution Function

Given observations $X_1, \ldots, X_n$ drawn i.i.d. from an unknown distribution $F$, the empirical distribution function is:

$$\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}(X_i \leq x)$$

This places mass $1/n$ on each observed data point.

Definition

Bootstrap Sample

A bootstrap sample is a sample of size $n$ drawn i.i.d. from $\hat{F}_n$. Equivalently, draw $n$ observations with replacement from the original sample $\{X_1, \ldots, X_n\}$.

Each bootstrap sample will typically contain some original observations multiple times and omit others entirely. On average, about $1 - (1-1/n)^n \approx 63.2\%$ of the original observations appear at least once.
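A quick simulation (a stdlib-only sketch; the variable names are ours) makes the 63.2% figure concrete:

```python
import random

random.seed(42)
n = 1000
sample = list(range(n))  # n distinct "observations"

# One bootstrap sample: n draws with replacement
boot = [random.choice(sample) for _ in range(n)]

# Fraction of original observations appearing at least once
frac_present = len(set(boot)) / n
print(f"{frac_present:.3f}")  # close to 1 - (1 - 1/n)^n ≈ 0.632
```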

Definition

Bootstrap Distribution

Let $\hat{\theta} = T(X_1, \ldots, X_n)$ be any statistic. The bootstrap distribution of $\hat{\theta}$ is the distribution of $\hat{\theta}^* = T(X_1^*, \ldots, X_n^*)$ induced by resampling. In practice, you approximate it by generating $B$ bootstrap samples and computing $\hat{\theta}_1^*, \ldots, \hat{\theta}_B^*$.
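In code, the bootstrap distribution is a short loop. This is a stdlib-only sketch; `bootstrap_distribution` is a helper name introduced here, not a library function:

```python
import random
import statistics

def bootstrap_distribution(xs, stat, B=2000):
    """Return B bootstrap replicates theta*_1, ..., theta*_B of stat."""
    n = len(xs)
    return [stat([random.choice(xs) for _ in range(n)]) for _ in range(B)]

random.seed(0)
data = [random.gauss(5, 2) for _ in range(100)]
reps = bootstrap_distribution(data, statistics.mean)
print(len(reps))  # 2000 replicates of the sample mean
```

The same helper works unchanged for the median, a correlation, or any other statistic passed as `stat`.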

The Plug-in Principle

The bootstrap is an instance of the plug-in principle: to estimate a functional $\theta(F)$ of the unknown distribution, compute $\theta(\hat{F}_n)$. To estimate the sampling distribution of $\hat{\theta}$, replace $F$ with $\hat{F}_n$ everywhere.

The true sampling variance of $\hat{\theta}$ is:

$$\text{Var}_F(\hat{\theta}) = \text{Var}_F(T(X_1, \ldots, X_n))$$

The bootstrap estimate replaces $F$ with $\hat{F}_n$:

$$\widehat{\text{Var}}_{\text{boot}} = \text{Var}_{\hat{F}_n}(T(X_1^*, \ldots, X_n^*))$$

In practice, approximate this with:

$$\widehat{\text{Var}}_{\text{boot}} \approx \frac{1}{B-1} \sum_{b=1}^{B} \left(\hat{\theta}_b^* - \bar{\theta}^*\right)^2$$
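For the sample mean, the bootstrap standard error can be checked against the analytic $s/\sqrt{n}$; a stdlib-only sketch (the exponential data are illustrative):

```python
import random
import statistics

random.seed(1)
data = [random.expovariate(1.0) for _ in range(80)]
n, B = len(data), 2000

# B bootstrap replicates of the sample mean
reps = [statistics.mean([random.choice(data) for _ in range(n)]) for _ in range(B)]

# (1/(B-1)) * sum (theta*_b - theta-bar*)^2 is exactly the sample variance of the replicates
var_boot = statistics.variance(reps)
se_boot = var_boot ** 0.5

# For the mean, this should land near the analytic s / sqrt(n)
se_analytic = statistics.stdev(data) / n ** 0.5
print(se_boot, se_analytic)
```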

Main Theorems

Theorem

Bootstrap Consistency

Statement

Under regularity conditions, the bootstrap distribution of $\sqrt{n}(\hat{\theta}^* - \hat{\theta})$ converges to the same limit as $\sqrt{n}(\hat{\theta} - \theta)$. More precisely, if $\sqrt{n}(\hat{\theta} - \theta) \xrightarrow{d} N(0, \sigma^2)$, then conditional on the data:

$$\sup_t \left| P^*\bigl(\sqrt{n}(\hat{\theta}^* - \hat{\theta}) \leq t\bigr) - P\bigl(\sqrt{n}(\hat{\theta} - \theta) \leq t\bigr) \right| \xrightarrow{P} 0$$

where $P^*$ denotes probability under bootstrap resampling.

Intuition

The bootstrap works because the empirical distribution converges to the true distribution (Glivenko-Cantelli), so resampling from $\hat{F}_n$ mimics sampling from $F$. The bootstrap distribution of the centered, scaled statistic converges to the same Gaussian limit as the original statistic.

Proof Sketch

The key steps are:

  1. Show that the bootstrap mean $\bar{X}^*$ satisfies a conditional CLT: $\sqrt{n}(\bar{X}^* - \bar{X}) \xrightarrow{d} N(0, \sigma^2)$ in probability.
  2. Extend to smooth functionals via the functional delta method: if $\hat{\theta} = g(\bar{X})$ for differentiable $g$, then the bootstrap distribution of $g(\bar{X}^*)$ inherits consistency.
  3. The Glivenko-Cantelli theorem ensures $\hat{F}_n \to F$ uniformly, which drives the convergence of the bootstrap distribution.

Why It Matters

Bootstrap consistency justifies using the bootstrap distribution for inference. It means bootstrap confidence intervals have asymptotically correct coverage, and bootstrap standard errors are asymptotically correct, all without knowing the form of $F$ or deriving analytic formulas.

Failure Mode

The bootstrap fails when the statistic is not sufficiently "smooth" as a functional of $F$. Classic failure: the bootstrap is inconsistent for the maximum $X_{(n)}$ of a uniform distribution, because the empirical distribution has atoms but the true distribution is continuous. More generally, non-differentiable functionals and heavy-tailed distributions can cause bootstrap failure.
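The failure for the sample maximum is easy to see by simulation (a sketch with made-up parameters): the bootstrap puts a point mass of roughly $1 - (1-1/n)^n \approx 0.632$ on the observed maximum itself, whereas the true sampling distribution of the maximum is continuous:

```python
import random

random.seed(7)
n = 200
data = [random.uniform(0, 1.0) for _ in range(n)]
x_max = max(data)

# Bootstrap distribution of the maximum
B = 2000
boot_maxes = [max(random.choice(data) for _ in range(n)) for _ in range(B)]

# Probability mass the bootstrap puts exactly on the observed maximum
prop_equal = sum(m == x_max for m in boot_maxes) / B
print(prop_equal)  # near 0.632: a point mass the true distribution does not have
```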

Bootstrap Confidence Intervals

There are several ways to turn the bootstrap distribution into a confidence interval. They differ in accuracy and computational cost.

Percentile Method

The simplest approach. For a $100(1-\alpha)\%$ confidence interval:

$$CI_{\text{percentile}} = [\hat{\theta}^*_{\alpha/2}, \hat{\theta}^*_{1-\alpha/2}]$$

where $\hat{\theta}^*_q$ is the $q$-th quantile of the bootstrap distribution.

This is intuitive but can be inaccurate when the bootstrap distribution is skewed or when $\hat{\theta}$ is biased.
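A percentile-interval sketch (standard library only; `percentile_ci` is our own helper name):

```python
import random
import statistics

def percentile_ci(xs, stat, alpha=0.05, B=4000):
    """Percentile bootstrap CI: the alpha/2 and 1 - alpha/2 quantiles of the replicates."""
    n = len(xs)
    reps = sorted(stat([random.choice(xs) for _ in range(n)]) for _ in range(B))
    return reps[int((alpha / 2) * B)], reps[int((1 - alpha / 2) * B) - 1]

random.seed(3)
data = [random.gauss(10, 3) for _ in range(60)]
lo, hi = percentile_ci(data, statistics.mean)
print(lo, hi)
```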

Pivotal (Basic) Bootstrap

Uses the bootstrap to estimate the distribution of $\hat{\theta} - \theta$:

$$CI_{\text{pivotal}} = [2\hat{\theta} - \hat{\theta}^*_{1-\alpha/2}, \; 2\hat{\theta} - \hat{\theta}^*_{\alpha/2}]$$

Note the reversal of quantiles. This corrects for bias and is more accurate than the percentile method for skewed distributions.
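A pivotal-interval sketch on deliberately skewed data, showing the reversed quantiles (stdlib only; the exponential data are illustrative):

```python
import random
import statistics

random.seed(4)
data = [random.expovariate(0.5) for _ in range(60)]  # skewed, population mean 2

theta_hat = statistics.mean(data)
n, B, alpha = len(data), 4000, 0.05
reps = sorted(statistics.mean([random.choice(data) for _ in range(n)]) for _ in range(B))

q_lo = reps[int((alpha / 2) * B)]          # theta*_{alpha/2}
q_hi = reps[int((1 - alpha / 2) * B) - 1]  # theta*_{1-alpha/2}

# Pivotal interval: the UPPER bootstrap quantile sets the lower endpoint, and vice versa
ci = (2 * theta_hat - q_hi, 2 * theta_hat - q_lo)
print(ci)
```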

BCa (Bias-Corrected and Accelerated)

The gold standard for bootstrap confidence intervals. It adjusts for both bias and skewness:

$$CI_{\text{BCa}} = [\hat{\theta}^*_{\alpha_1}, \hat{\theta}^*_{\alpha_2}]$$

where $\alpha_1, \alpha_2$ are adjusted quantile levels that depend on a bias correction factor $\hat{z}_0$ and an acceleration factor $\hat{a}$. The acceleration is typically estimated via the jackknife. BCa intervals have second-order accuracy: coverage error is $O(n^{-1})$ rather than the $O(n^{-1/2})$ of the percentile method.
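A compact BCa sketch using the standard formulas: $\hat{z}_0$ from the fraction of replicates below $\hat{\theta}$, $\hat{a}$ from the jackknife values. This is illustrative (stdlib only), not a production implementation:

```python
import random
import statistics
from statistics import NormalDist

random.seed(5)
data = [random.expovariate(1.0) for _ in range(50)]
stat = statistics.mean

n, B, alpha = len(data), 4000, 0.05
theta_hat = stat(data)
reps = sorted(stat([random.choice(data) for _ in range(n)]) for _ in range(B))

Phi, Phi_inv = NormalDist().cdf, NormalDist().inv_cdf

# Bias correction: how far the bootstrap distribution is shifted relative to theta_hat
z0 = Phi_inv(sum(r < theta_hat for r in reps) / B)

# Acceleration from the jackknife (leave-one-out) estimates
jack = [stat(data[:i] + data[i + 1:]) for i in range(n)]
jbar = statistics.mean(jack)
a = sum((jbar - j) ** 3 for j in jack) / (6 * sum((jbar - j) ** 2 for j in jack) ** 1.5)

def adjusted_level(z_alpha):
    return Phi(z0 + (z0 + z_alpha) / (1 - a * (z0 + z_alpha)))

a1 = adjusted_level(Phi_inv(alpha / 2))
a2 = adjusted_level(Phi_inv(1 - alpha / 2))
ci = (reps[int(a1 * B)], reps[min(int(a2 * B), B - 1)])
print(ci)
```

For production work, prefer a vetted implementation such as `scipy.stats.bootstrap(..., method="BCa")`.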

Variants

Parametric Bootstrap

Instead of resampling from $\hat{F}_n$, fit a parametric model $F_{\hat{\theta}}$ and resample from it. For example, if you assume the data are normal, estimate $\hat{\mu}, \hat{\sigma}^2$ and generate bootstrap samples from $N(\hat{\mu}, \hat{\sigma}^2)$.

Advantage: more efficient when the model is correct. Disadvantage: invalid when the model is wrong.
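A parametric-bootstrap sketch under an assumed normal model (the parameter values are made up):

```python
import random
import statistics

random.seed(6)
data = [random.gauss(3.0, 1.5) for _ in range(40)]

# Fit the assumed model: normal with estimated mean and sd
mu_hat = statistics.mean(data)
sigma_hat = statistics.stdev(data)

# Resample from the FITTED model, not the empirical distribution
B, n = 2000, len(data)
reps = [statistics.mean([random.gauss(mu_hat, sigma_hat) for _ in range(n)])
        for _ in range(B)]

se_param = statistics.stdev(reps)
print(se_param)  # close to sigma_hat / sqrt(n) when the model holds
```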

Wild Bootstrap

For regression with heteroscedastic errors, the standard residual bootstrap fails because reshuffling residuals destroys the heteroscedasticity pattern. The wild bootstrap keeps each residual attached to its own observation and multiplies it by a random variable with mean 0 and variance 1. Common choices: Rademacher ($\pm 1$ with equal probability) or the two-point distribution of Mammen (1993).
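A wild-bootstrap sketch for a simple linear regression whose noise scale grows with $x$, using Rademacher weights (stdlib only; the data-generating numbers are made up):

```python
import random

random.seed(8)
n = 100
x = [random.uniform(0, 10) for _ in range(n)]
# Heteroscedastic errors: noise scale grows with x
y = [2.0 + 0.5 * xi + random.gauss(0, 0.2 * xi) for xi in x]

def ols_slope(xs, ys):
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys))
    sxx = sum((a - xbar) ** 2 for a in xs)
    return sxy / sxx

b1 = ols_slope(x, y)
b0 = sum(y) / n - b1 * sum(x) / n
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Wild bootstrap: each residual stays at its own x_i; only its sign is randomized
B = 1000
slopes = []
for _ in range(B):
    y_star = [b0 + b1 * xi + ri * random.choice((-1.0, 1.0))
              for xi, ri in zip(x, resid)]
    slopes.append(ols_slope(x, y_star))

mean_slope = sum(slopes) / B
se_slope = (sum((s - mean_slope) ** 2 for s in slopes) / (B - 1)) ** 0.5
print(se_slope)
```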

Block Bootstrap

For time series data, i.i.d. resampling destroys temporal dependence. The block bootstrap resamples blocks of consecutive observations. Variants include the moving block bootstrap (fixed block length), the stationary bootstrap (random block length with geometric distribution), and the circular block bootstrap.
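A moving-block-bootstrap sketch with a fixed block length (`moving_block_bootstrap` is our own helper name; the AR(1)-style series is illustrative):

```python
import random

def moving_block_bootstrap(xs, block_len):
    """Resample overlapping blocks of consecutive observations, then trim to len(xs)."""
    n = len(xs)
    n_blocks = -(-n // block_len)  # ceiling division
    out = []
    for _ in range(n_blocks):
        start = random.randrange(n - block_len + 1)
        out.extend(xs[start:start + block_len])
    return out[:n]

random.seed(9)
# A series with temporal dependence: each value pulled toward the previous one
series = [0.0]
for _ in range(199):
    series.append(0.7 * series[-1] + random.gauss(0, 1))

boot_series = moving_block_bootstrap(series, block_len=10)
print(len(boot_series))  # same length as the original series
```

Within each block the original short-range dependence is preserved; only the block order is randomized.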

Canonical Examples

Example

Bootstrap standard error of the median

The median has no simple closed-form standard error (unlike the mean, where $\text{SE} = s/\sqrt{n}$). The bootstrap handles it effortlessly:

  1. From a sample $X_1, \ldots, X_n$, compute the sample median $\hat{m}$.
  2. Generate $B = 10{,}000$ bootstrap samples and compute the median of each: $\hat{m}_1^*, \ldots, \hat{m}_B^*$.
  3. The bootstrap standard error is $\text{SD}(\hat{m}_1^*, \ldots, \hat{m}_B^*)$.

For a sample of size $n = 50$ from a standard normal, the asymptotic standard error of the median is $\sqrt{\pi/(2n)} \approx 0.177$. The bootstrap estimate will be close.
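The three steps above can be sketched directly (stdlib only; B reduced to 5,000 to keep the run short):

```python
import math
import random
import statistics

random.seed(10)
n, B = 50, 5000
data = [random.gauss(0, 1) for _ in range(n)]

# Steps 1-2: sample median, then B bootstrap medians
m_hat = statistics.median(data)
boot_medians = [statistics.median([random.choice(data) for _ in range(n)])
                for _ in range(B)]

# Step 3: bootstrap SE = SD of the bootstrap medians
se_boot = statistics.stdev(boot_medians)
se_asymptotic = math.sqrt(math.pi / (2 * n))  # ≈ 0.177
print(se_boot, se_asymptotic)
```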

Example

Bootstrap for correlation coefficients

Given paired data $(X_i, Y_i)$ for $i = 1, \ldots, n$, the sample correlation $\hat{r}$ has a complicated sampling distribution (especially when the true $\rho$ is not zero). The bootstrap gives you the distribution for free: resample pairs $(X_i^*, Y_i^*)$ with replacement, compute $\hat{r}^*$ for each bootstrap sample, and use the resulting distribution for inference.

Common Confusions

Watch Out

The bootstrap does not create new information

A common misconception is that the bootstrap "generates new data." It does not. It uses the existing data to approximate the sampling distribution. The quality of the approximation depends on how well the empirical distribution approximates the true distribution. With $n = 5$ observations, the bootstrap may be unreliable because $\hat{F}_n$ is a poor approximation to $F$.

Watch Out

Increasing B does not fix small n

Increasing $B$ (the number of bootstrap replications) reduces Monte Carlo error: the error from approximating the bootstrap distribution with a finite simulation. It does not reduce the fundamental statistical error from having a small original sample of size $n$. Even with $B = 1{,}000{,}000$ bootstrap samples, if $n = 10$, the bootstrap distribution is built from only 10 distinct values.

Watch Out

The bootstrap can fail

The bootstrap is not universally valid. It fails for:

  • Extreme order statistics (e.g., the sample maximum from a bounded distribution): the bootstrap distribution does not converge to the true sampling distribution.
  • Heavy-tailed distributions without finite variance: the CLT does not apply, so the bootstrap CLT also fails.
  • Non-smooth functionals: if $\theta(F)$ is not a smooth functional of $F$, the plug-in principle can break.

Summary

  • The bootstrap approximates the sampling distribution by resampling with replacement from the observed data
  • It works because $\hat{F}_n \to F$ (Glivenko-Cantelli), so resampling from $\hat{F}_n$ mimics sampling from $F$
  • Bootstrap confidence intervals: percentile (simplest), pivotal (better for skewed distributions), BCa (gold standard, second-order accurate)
  • Parametric bootstrap: resample from fitted model (more efficient if model is correct)
  • Wild bootstrap: for heteroscedastic regression; block bootstrap: for time series
  • Bootstrap fails for non-smooth functionals, extreme order statistics, and heavy-tailed distributions

Exercises

ExerciseCore

Problem

You have a sample of $n = 30$ observations: $X_1, \ldots, X_{30}$. You want a 95% bootstrap confidence interval for the population median using the percentile method.

Describe the algorithm step by step. Then: if your $B = 1000$ bootstrap medians are sorted as $\hat{m}^*_{(1)} \leq \cdots \leq \hat{m}^*_{(1000)}$, which order statistics give you the interval endpoints?

ExerciseAdvanced

Problem

Explain why the nonparametric bootstrap is inconsistent for the sample maximum $X_{(n)}$ when sampling from a $\text{Uniform}(0, \theta)$ distribution. What is the correct rate of convergence for $X_{(n)}$, and why does the bootstrap get it wrong?

References

Canonical:

  • Efron, B. "Bootstrap Methods: Another Look at the Jackknife" (1979)
  • Efron, B. & Tibshirani, R. An Introduction to the Bootstrap (1993)

Current:

  • Davison, A.C. & Hinkley, D.V. Bootstrap Methods and their Application (1997)
  • Hall, P. The Bootstrap and Edgeworth Expansion (1992)

Next Topics

The natural next steps from bootstrap methods:

  • Bootstrap theory: when and why the bootstrap is consistent, Edgeworth expansions and higher-order accuracy
  • Hypothesis testing with the bootstrap: permutation tests, bootstrap p-values, and the connection to resampling-based inference

Last reviewed: April 2026
