Asymptotic Statistics
The large-sample toolbox: delta method, Slutsky's theorem, asymptotic normality of MLE, local asymptotic normality, and Fisher efficiency. These results justify nearly every confidence interval and hypothesis test used in practice.
Why This Matters
Almost every confidence interval, standard error, and p-value computed in practice relies on asymptotic theory. When you report $\hat\theta \pm 1.96\,\widehat{\mathrm{se}}(\hat\theta)$, you are implicitly invoking the asymptotic normality of your estimator. When you use the likelihood ratio test, you are relying on its asymptotic $\chi^2$ distribution.
Asymptotic statistics is not just about letting $n \to \infty$. It provides finite-sample approximations that are often remarkably accurate, and it tells you the fundamental limits of estimation via Fisher information and efficiency.
Mental Model
The central limit theorem says that sample means are approximately normal for large $n$. Asymptotic statistics extends this in two directions. First, the delta method lets you transfer normality through smooth transformations: if $T_n$ is approximately normal, then $g(T_n)$ is too, with a variance you can compute. Second, maximum likelihood estimators inherit CLT-like normality under regularity conditions, achieving the best possible variance (the Cramér-Rao bound) in the limit.
Formal Setup and Notation
We write $X_n \xrightarrow{d} X$ for convergence in distribution, $X_n \xrightarrow{p} X$ for convergence in probability, and $X_n = O_P(1)$ to mean that $(X_n)$ is bounded in probability.
Main Theorems
Delta Method
Statement
If $\sqrt{n}\,(T_n - \theta) \xrightarrow{d} N(0, \sigma^2)$ and $g$ is differentiable at $\theta$ with $g'(\theta) \neq 0$, then:
$$\sqrt{n}\,\bigl(g(T_n) - g(\theta)\bigr) \xrightarrow{d} N\bigl(0,\, [g'(\theta)]^2 \sigma^2\bigr).$$
Intuition
A smooth function of an approximately normal random variable is itself approximately normal. The variance gets multiplied by the squared derivative because, locally, $g$ looks like a linear function with slope $g'(\theta)$.
Proof Sketch
Taylor expand: $g(T_n) = g(\theta) + g'(\theta)(T_n - \theta) + o_P(T_n - \theta)$. Then $\sqrt{n}\,\bigl(g(T_n) - g(\theta)\bigr) = g'(\theta)\,\sqrt{n}\,(T_n - \theta) + o_P(1)$. The remainder term is $o_P(1)$ because $\sqrt{n}\,(T_n - \theta) = O_P(1)$. Apply Slutsky's theorem to conclude.
Why It Matters
The delta method is the workhorse of applied statistics. Need the variance of $\log \hat\theta$? Of $\hat\theta^2$? Of $1/\hat\theta$? The delta method gives you the answer immediately without simulation. It is also the basis for constructing confidence intervals on transformed parameters.
Failure Mode
When $g'(\theta) = 0$, the first-order delta method gives a degenerate limit. You need the second-order delta method: if $g'(\theta) = 0$ and $g''(\theta) \neq 0$, then $n\,\bigl(g(T_n) - g(\theta)\bigr) \xrightarrow{d} \tfrac{1}{2}\,g''(\theta)\,\sigma^2\,\chi^2_1$. The rate changes from $\sqrt{n}$ to $n$ and the limit is no longer normal.
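The first-order delta method is easy to verify numerically. The following is a minimal simulation sketch (the Exponential(1) model, the choice $g = \log$, and the sample sizes are illustrative assumptions, not from the text): with $\mu = 1$ and $\sigma^2 = 1$, the prediction is $\sqrt{n}\,(\log \bar X_n - \log 1) \approx N(0, 1)$.

```python
import numpy as np

# Sketch: delta method for g(x) = log(x) with Exponential(1) data.
# Here mu = 1, sigma^2 = 1, and g'(mu) = 1, so the predicted limit of
# sqrt(n) * (log(mean) - log(1)) is N(0, 1).
rng = np.random.default_rng(0)
n, reps = 1000, 5000
means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (np.log(means) - np.log(1.0))
print(round(z.mean(), 2), round(z.var(), 2))  # near 0 and 1
```

The empirical mean and variance of `z` match the predicted $N(0,1)$ limit closely even at moderate $n$.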
Slutsky's Theorem
Statement
If $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{p} c$ (a constant), then:
$$X_n + Y_n \xrightarrow{d} X + c, \qquad X_n Y_n \xrightarrow{d} cX, \qquad X_n / Y_n \xrightarrow{d} X/c \ \ (c \neq 0).$$
Intuition
A sequence converging in probability to a constant behaves like a constant in the limit. You can add, multiply, or divide by it without disrupting distributional convergence. The key is that $Y_n$ must converge to a constant, not a random variable.
Proof Sketch
Consider the pair $(X_n, Y_n)$. Since $Y_n \xrightarrow{p} c$, the pair converges jointly: $(X_n, Y_n) \xrightarrow{d} (X, c)$, by a standard coupling argument. The continuous mapping theorem then gives $f(X_n, Y_n) \xrightarrow{d} f(X, c)$ for continuous $f$.
Why It Matters
Slutsky's theorem is the glue that holds asymptotic arguments together. Every time you replace $\sigma$ with $\hat\sigma_n$ in a $t$-statistic and claim the limit is still standard normal, you are using Slutsky.
Failure Mode
Slutsky fails if $Y_n$ converges in distribution to a non-degenerate random variable. In that case, you need joint convergence and the continuous mapping theorem.
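The $t$-statistic replacement can be checked directly. A minimal simulation sketch (the Uniform(0,1) model and the sample sizes are illustrative assumptions): the studentized mean uses the estimated standard deviation $s_n$, yet $|t| < 1.96$ still occurs about 95% of the time, exactly as Slutsky predicts.

```python
import numpy as np

# Sketch: Slutsky justifies plugging s_n in for sigma in the t-statistic.
# Since s_n ->p sigma, t = sqrt(n)*(mean - mu)/s_n is still asymptotically N(0,1).
rng = np.random.default_rng(1)
n, reps = 400, 10000
x = rng.uniform(0.0, 1.0, size=(reps, n))
t = np.sqrt(n) * (x.mean(axis=1) - 0.5) / x.std(axis=1, ddof=1)
coverage = np.mean(np.abs(t) < 1.96)
print(round(coverage, 3))  # close to 0.95
```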
Asymptotic Normality of MLE
Statement
Under regularity conditions, the maximum likelihood estimator $\hat\theta_n$ satisfies:
$$\sqrt{n}\,(\hat\theta_n - \theta_0) \xrightarrow{d} N\bigl(0,\, I(\theta_0)^{-1}\bigr),$$
where $I(\theta_0)$ is the Fisher information matrix.
Intuition
The MLE is approximately normal, centered at the truth, with variance equal to the inverse Fisher information divided by $n$. This is the best variance any regular estimator can achieve (Cramér-Rao bound), so the MLE is asymptotically efficient.
Proof Sketch
Expand the score equation around $\theta_0$: $0 = \frac{1}{\sqrt{n}}\sum_i \dot\ell_{\theta_0}(X_i) + \bigl(\frac{1}{n}\sum_i \ddot\ell_{\theta_0}(X_i)\bigr)\,\sqrt{n}\,(\hat\theta_n - \theta_0) + o_P(1)$. By the CLT, $\frac{1}{\sqrt{n}}\sum_i \dot\ell_{\theta_0}(X_i) \xrightarrow{d} N(0, I(\theta_0))$. By the LLN, $\frac{1}{n}\sum_i \ddot\ell_{\theta_0}(X_i) \xrightarrow{p} -I(\theta_0)$. Combine via Slutsky to get the result.
Why It Matters
This theorem justifies the standard practice of reporting MLE point estimates with standard errors computed from the observed Fisher information. It is the theoretical backbone of likelihood-based inference.
Failure Mode
Regularity conditions fail at boundary parameters (e.g., a variance equal to 0), non-identifiable models (e.g., mixture models with an unknown number of components), and models where the support depends on the parameter (e.g., $\mathrm{Uniform}(0, \theta)$). In these cases the MLE may have non-normal limits or converge at non-$\sqrt{n}$ rates.
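The support-dependent case can be seen directly by simulation. A minimal sketch (the choice $\theta = 1$, the Uniform(0, θ) model, and the sample sizes are illustrative assumptions): the MLE $\max_i X_i$ converges at rate $n$, and $n\,(\theta - \max_i X_i)$ has an exponential, not normal, limit.

```python
import numpy as np

# Sketch: non-regular MLE for Uniform(0, theta). The MLE is max_i X_i;
# n*(theta - max) converges to an Exponential limit, so the rate is n.
rng = np.random.default_rng(2)
theta, n, reps = 1.0, 500, 10000
mx = rng.uniform(0.0, theta, size=(reps, n)).max(axis=1)
gap = n * (theta - mx)
# Exponential(1) limit: mean ~ 1, P(gap < 1) ~ 1 - exp(-1) ~ 0.63.
print(round(gap.mean(), 2), round(np.mean(gap < 1.0), 2))
```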
Local Asymptotic Normality
Statement
Under regularity conditions, the log-likelihood ratio admits the expansion:
$$\log \prod_{i=1}^n \frac{p_{\theta_0 + h/\sqrt{n}}}{p_{\theta_0}}(X_i) = h^\top \Delta_{n,\theta_0} - \tfrac{1}{2}\,h^\top I(\theta_0)\,h + o_{P_{\theta_0}}(1),$$
where $\Delta_{n,\theta_0} = \frac{1}{\sqrt{n}}\sum_{i=1}^n \dot\ell_{\theta_0}(X_i) \xrightarrow{d} N(0, I(\theta_0))$.
Intuition
At the local scale $\theta_0 + h/\sqrt{n}$ around the truth, the statistical experiment looks like a Gaussian shift experiment. The sufficient statistic is $\Delta_{n,\theta_0}$ (the normalized score), and the Fisher information determines the signal strength. This is the deepest structural result in parametric statistics.
Proof Sketch
Taylor expand the log-likelihood ratio to second order. The first-order term gives $h^\top \Delta_{n,\theta_0}$, and the second-order term gives $-\tfrac{1}{2}\,h^\top I(\theta_0)\,h$ after applying the law of large numbers to the Hessian. Higher-order terms vanish in probability.
Why It Matters
LAN shows that, asymptotically, all regular parametric problems reduce to Gaussian location problems. This unifies the theory of efficient estimation and optimal testing. It also implies the asymptotic minimax lower bound: no regular estimator can beat the Fisher information bound.
Failure Mode
LAN fails for non-regular models where the Fisher information is zero or infinite, and for semiparametric or nonparametric models without finite-dimensional sufficient statistics.
Core Definitions
Contiguity
Given two sequences of probability measures $(P_n)$ and $(Q_n)$, we say $Q_n$ is contiguous with respect to $P_n$ if $P_n(A_n) \to 0$ implies $Q_n(A_n) \to 0$ for every sequence of measurable sets $A_n$. In the LAN framework, $P^n_{\theta_0}$ and $P^n_{\theta_0 + h/\sqrt{n}}$ are mutually contiguous. This means that tests which are consistent under $\theta_0$ remain well-behaved under local alternatives.
Asymptotic Relative Efficiency
The asymptotic relative efficiency of estimator $T_1$ relative to $T_2$ is:
$$\mathrm{ARE}(T_1, T_2) = \frac{\text{asymptotic variance of } T_2}{\text{asymptotic variance of } T_1}.$$
If $\mathrm{ARE}(T_1, T_2) > 1$, then $T_1$ is more efficient. The MLE achieves the maximum ARE of 1 relative to the Cramér-Rao bound.
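A classic instance (an illustrative example of my choosing, not from the text): for normal data, the sample median has asymptotic variance $\pi/(2n)$ versus $1/n$ for the sample mean, so $\mathrm{ARE}(\text{median}, \text{mean}) = 2/\pi \approx 0.64$.

```python
import numpy as np

# Sketch: ARE of the sample median relative to the sample mean for N(0,1)
# data. Predicted ratio of variances: (1/n) / (pi/(2n)) = 2/pi ~ 0.64.
rng = np.random.default_rng(3)
n, reps = 500, 10000
x = rng.normal(0.0, 1.0, size=(reps, n))
ratio = x.mean(axis=1).var() / np.median(x, axis=1).var()
print(round(ratio, 2))  # near 2/pi ~ 0.64
```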
Canonical Examples
Delta method: variance-stabilizing transform for Poisson
If $X_1, \dots, X_n \sim \mathrm{Poisson}(\lambda)$, then $\sqrt{n}\,(\bar X_n - \lambda) \xrightarrow{d} N(0, \lambda)$. The variance depends on $\lambda$. Apply the delta method with $g(x) = \sqrt{x}$, so $g'(\lambda) = 1/(2\sqrt{\lambda})$: $\sqrt{n}\,(\sqrt{\bar X_n} - \sqrt{\lambda}) \xrightarrow{d} N(0, 1/4)$. The variance is now constant, independent of $\lambda$. This is the variance-stabilizing transformation.
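A quick numerical check of the stabilization (the λ values and sample sizes are illustrative assumptions): $n \cdot \mathrm{Var}(\sqrt{\bar X_n})$ should be close to $1/4$ for every λ.

```python
import numpy as np

# Sketch: variance stabilization for Poisson. After g(x) = sqrt(x),
# n * Var(sqrt(mean)) should be ~ 1/4 regardless of lambda.
rng = np.random.default_rng(4)
n, reps = 500, 10000
scaled_var = {}
for lam in [2.0, 10.0, 50.0]:
    m = rng.poisson(lam, size=(reps, n)).mean(axis=1)
    scaled_var[lam] = n * np.sqrt(m).var()
print({lam: round(v, 3) for lam, v in scaled_var.items()})  # each near 0.25
```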
MLE for exponential rate
If $X_1, \dots, X_n \sim \mathrm{Exp}(\lambda)$, the MLE is $\hat\lambda = 1/\bar X_n$. The Fisher information is $I(\lambda) = 1/\lambda^2$. By asymptotic normality: $\sqrt{n}\,(\hat\lambda - \lambda) \xrightarrow{d} N(0, \lambda^2)$. A 95% confidence interval is $\hat\lambda \pm 1.96\,\hat\lambda/\sqrt{n}$.
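The resulting Wald interval can be checked by simulation. A minimal sketch (the choice $\lambda = 2$ and the sample sizes are illustrative assumptions): the interval $\hat\lambda \pm 1.96\,\hat\lambda/\sqrt{n}$ should cover the true rate about 95% of the time.

```python
import numpy as np

# Sketch: coverage of the Wald interval for the exponential rate.
# lambda_hat = 1/mean; its asymptotic sd is lambda/sqrt(n), estimated
# by plugging in lambda_hat (another use of Slutsky).
rng = np.random.default_rng(5)
lam, n, reps = 2.0, 400, 10000
lam_hat = 1.0 / rng.exponential(scale=1.0 / lam, size=(reps, n)).mean(axis=1)
half = 1.96 * lam_hat / np.sqrt(n)
coverage = np.mean((lam_hat - half < lam) & (lam < lam_hat + half))
print(round(coverage, 3))  # close to 0.95
```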
Common Confusions
Asymptotic normality does not mean the estimator is normal
The MLE is approximately normal for large $n$. For small $n$, the actual distribution can be heavily skewed. Bootstrap or exact methods may be needed for small samples. The asymptotic approximation is a tool, not a fact about the estimator's distribution.
Efficiency is an asymptotic concept
An estimator can be asymptotically efficient yet perform poorly in small samples. Conversely, a biased estimator like James-Stein can dominate the MLE in finite samples while being asymptotically equivalent.
Summary
- Delta method: $\sqrt{n}\,(g(T_n) - g(\theta)) \xrightarrow{d} N(0, [g'(\theta)]^2 \sigma^2)$
- Slutsky: convergence in probability to a constant acts like a constant
- MLE is asymptotically normal with variance $I(\theta)^{-1}/n$
- LAN: locally, all regular parametric problems look Gaussian
- Fisher information sets the fundamental efficiency limit
Exercises
Problem
Let $X_1, \dots, X_n \sim \mathrm{Bernoulli}(p)$ and let $\hat p = \bar X_n$. Use the delta method to find the asymptotic distribution of $\log\bigl(\hat p/(1-\hat p)\bigr)$, the log-odds.
Problem
Explain why the asymptotic normality of the MLE fails for $\mathrm{Uniform}(0, \theta)$. What is the actual rate of convergence of $\hat\theta_n = \max_i X_i$?
Problem
In the LAN expansion, the experiment at local scale looks like observing $\Delta \sim N(I(\theta_0)\,h,\; I(\theta_0))$. Show that the optimal estimator of $h$ in this Gaussian experiment is $\hat h = I(\theta_0)^{-1}\Delta$ and that its risk equals $I(\theta_0)^{-1}$, recovering the Cramér-Rao bound.
References
Canonical:
- van der Vaart, Asymptotic Statistics (1998), Chapters 2-7
- Lehmann & Casella, Theory of Point Estimation (1998), Chapter 6
Current:
- Wasserman, All of Statistics (2004), Chapters 9-10
- Keener, Theoretical Statistics (2010), Chapters 7-9
- van der Vaart, Asymptotic Statistics (1998), Chapters 2-8
Next Topics
Natural extensions from asymptotic statistics:
- Semiparametric efficiency: efficiency theory when nuisance parameters are infinite-dimensional
- Bootstrap methods: resampling as an alternative to asymptotic approximation
- Higher-order asymptotics: Edgeworth expansions and Bartlett corrections
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Central Limit Theorem (Layer 0B)
- Law of Large Numbers (Layer 0B)
- Common Probability Distributions (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Maximum Likelihood Estimation (Layer 0B)
- Differentiation in R^n (Layer 0A)