
Delta Method

Asymptotic distribution of a smooth function of an estimator. If $\sqrt{n}(T_n - \mu) \xrightarrow{d} N(0, \sigma^2)$, then $\sqrt{n}(g(T_n) - g(\mu)) \xrightarrow{d} N(0, [g'(\mu)]^2 \sigma^2)$. The multivariate version uses the Jacobian; the second-order version handles vanishing derivatives. The page derives the result, works three canonical examples (variance of a log proportion, variance of a ratio of means, asymptotic variance of the sample correlation), and ties the construction to variance-stabilizing transformations.


Why This Matters

Every standard error of a smooth statistic comes from the delta method. The variance of $\log \hat p$, the variance of $\bar X / \bar Y$, the asymptotic variance of the sample correlation, the standard error of a fitted odds ratio: each is one Taylor expansion away from a central-limit-theorem statement.

The result is short. If $\sqrt{n}(T_n - \mu) \xrightarrow{d} N(0, \sigma^2)$ and $g$ is differentiable at $\mu$ with $g'(\mu) \neq 0$, then $\sqrt{n}(g(T_n) - g(\mu)) \xrightarrow{d} N(0, [g'(\mu)]^2 \sigma^2)$. The proof is one Taylor expansion plus Slutsky's theorem. The applications are everywhere.

Univariate Statement

Theorem

Delta Method (univariate)

Statement

Let $T_n$ be a sequence of random variables with $\sqrt{n}(T_n - \mu) \xrightarrow{d} N(0, \sigma^2)$. If $g : \mathbb{R} \to \mathbb{R}$ is differentiable at $\mu$ and $g'(\mu) \neq 0$, then $\sqrt{n}(g(T_n) - g(\mu)) \xrightarrow{d} N\!\left(0, [g'(\mu)]^2 \sigma^2\right)$.

Intuition

Near $\mu$, the smooth function $g$ is approximately linear with slope $g'(\mu)$. A linear transform of an approximately normal random variable is approximately normal, with variance multiplied by the square of the slope.

Proof Sketch

Write $g(T_n) = g(\mu) + g'(\mu)(T_n - \mu) + R_n$ with $R_n = o(|T_n - \mu|)$ by differentiability. Multiply by $\sqrt{n}$: $\sqrt{n}(g(T_n) - g(\mu)) = g'(\mu)\sqrt{n}(T_n - \mu) + \sqrt{n} R_n$. The first term converges in distribution to $N(0, [g'(\mu)]^2 \sigma^2)$ by the continuous mapping theorem applied to multiplication by the constant $g'(\mu)$. For the remainder, $T_n - \mu = O_p(1/\sqrt{n})$, so $R_n = o_p(1/\sqrt{n})$ and $\sqrt{n} R_n = o_p(1)$. Slutsky's theorem absorbs the remainder.

Why It Matters

This single statement gives the standard error of any plug-in estimator that is a smooth function of a CLT-rate estimator. The pattern is: write the estimator as $g$ applied to a sample mean, identify $\mu$ and $\sigma^2$, compute $g'(\mu)$, and read off the asymptotic variance.
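As a sanity check on the recipe, here is a minimal Monte Carlo sketch, assuming NumPy is available; the choice $g(x) = x^2$ applied to the mean of Uniform(0,1) draws is an illustrative assumption, not an example from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 500, 10_000

# T_n = mean of n Uniform(0,1) draws: mu = 1/2, sigma^2 = 1/12.
# For g(x) = x^2, g'(mu) = 2*mu = 1, so the delta method predicts
# Var(g(T_n)) ~= [g'(mu)]^2 * sigma^2 / n.
mu, sigma2 = 0.5, 1.0 / 12.0
T = rng.random((reps, n)).mean(axis=1)

print("Monte Carlo var :", np.var(T ** 2))
print("delta prediction:", (2 * mu) ** 2 * sigma2 / n)
```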

Failure Mode

The delta method fails or needs adjustment when $g'(\mu) = 0$ (use the second-order version below), when $g$ is not differentiable at $\mu$ (the limit may not be normal; e.g., $g(x) = |x|$ at $\mu = 0$ gives a folded normal), or when $T_n$ converges at a rate other than $\sqrt{n}$ (the same expansion holds with that rate replacing $\sqrt{n}$).

Multivariate Statement

Theorem

Delta Method (multivariate)

Statement

Let $T_n \in \mathbb{R}^k$ satisfy $\sqrt{n}(T_n - \mu) \xrightarrow{d} N_k(0, \Sigma)$ and let $g : \mathbb{R}^k \to \mathbb{R}^m$ be differentiable at $\mu$ with Jacobian $G(\mu) = \frac{\partial g}{\partial x}\big|_{x = \mu} \in \mathbb{R}^{m \times k}$. Then $\sqrt{n}(g(T_n) - g(\mu)) \xrightarrow{d} N_m\!\left(0,\; G(\mu)\,\Sigma\,G(\mu)^\top\right)$.

Intuition

The Jacobian is the multivariate analog of $g'(\mu)$. The push-forward of a Gaussian through a linear map $L$ is again Gaussian with covariance $L \Sigma L^\top$. The delta method says: replace the nonlinear $g$ by its linearization $G(\mu)$, then apply the push-forward rule.

Proof Sketch

Vector Taylor expansion: $g(T_n) = g(\mu) + G(\mu)(T_n - \mu) + R_n$ with $\|R_n\| = o(\|T_n - \mu\|)$. Multiply by $\sqrt{n}$: $\sqrt{n}(g(T_n) - g(\mu)) = G(\mu)\sqrt{n}(T_n - \mu) + \sqrt{n} R_n$. The first term converges to $N_m(0, G(\mu) \Sigma G(\mu)^\top)$ because $G(\mu)$ is a deterministic matrix. The remainder is $o_p(1)$ by the same $\sqrt{n}$-rate argument as in the univariate case. Slutsky finishes.

Why It Matters

The multivariate version is what makes the delta method useful in practice. Most interesting statistics are functions of multiple sample moments: a sample correlation is a function of three sample averages, a ratio is a function of two, a likelihood ratio is a function of many. Compute the Jacobian, sandwich the covariance, and you have the asymptotic variance.
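A small helper makes the compute-the-Jacobian-then-sandwich step mechanical. This is a sketch assuming NumPy; the function name `delta_cov`, the finite-difference Jacobian, and the example moments are hypothetical choices for illustration, not anything defined on this page:

```python
import numpy as np

def delta_cov(g, mu, Sigma, eps=1e-6):
    """Asymptotic covariance of sqrt(n) * (g(T_n) - g(mu)) from the
    multivariate delta method, G(mu) Sigma G(mu)^T, with the Jacobian
    G(mu) approximated by central finite differences."""
    mu = np.asarray(mu, dtype=float)
    m = np.atleast_1d(g(mu)).size
    G = np.empty((m, mu.size))
    for j in range(mu.size):
        e = np.zeros_like(mu)
        e[j] = eps
        G[:, j] = (np.atleast_1d(g(mu + e)) - np.atleast_1d(g(mu - e))) / (2 * eps)
    return G @ np.asarray(Sigma, dtype=float) @ G.T

# Example: g(x, y) = x / y at hypothetical moments mu = (3, 2).
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
print(delta_cov(lambda t: t[0] / t[1], [3.0, 2.0], Sigma))
```

With an analytic Jacobian available, replace the finite-difference loop; the sandwich $G \Sigma G^\top$ is the part that matters, and the same helper reproduces the closed-form ratio variance in Worked Example 2 below.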

Failure Mode

If $G(\mu)$ has a zero row, the corresponding component of $g(T_n)$ converges at a rate faster than $\sqrt{n}$ and its limit must be analyzed separately (second-order). If $\Sigma$ is rank-deficient, the limiting normal is degenerate on a lower-dimensional subspace; the theorem still holds, but interpret with care.

Second-Order Version

Theorem

Delta Method (second-order)

Statement

If $\sqrt{n}(T_n - \mu) \xrightarrow{d} N(0, \sigma^2)$, $g$ is twice differentiable at $\mu$, $g'(\mu) = 0$, and $g''(\mu) \neq 0$, then $n(g(T_n) - g(\mu)) \xrightarrow{d} \tfrac{1}{2} g''(\mu)\,\sigma^2\,\chi^2_1$. The convergence rate is $n$, not $\sqrt{n}$, and the limit is a scaled chi-squared with one degree of freedom, not a normal.

Intuition

When the gradient vanishes, the linear term in the Taylor expansion is zero and the leading behavior is quadratic. Squaring a centered normal produces a $\chi^2_1$, and the rate doubles from $\sqrt{n}$ to $n$ because the squared deviation is of order $1/n$ rather than $1/\sqrt{n}$.

Proof Sketch

Taylor: $g(T_n) - g(\mu) = \tfrac{1}{2} g''(\mu)(T_n - \mu)^2 + o_p((T_n - \mu)^2)$ since $g'(\mu) = 0$. Multiply by $n$: $n(g(T_n) - g(\mu)) = \tfrac{1}{2} g''(\mu) \cdot n(T_n - \mu)^2 + o_p(1)$. By the continuous mapping theorem applied to $x \mapsto x^2$, $n(T_n - \mu)^2 = [\sqrt{n}(T_n - \mu)]^2 \xrightarrow{d} \sigma^2 \chi^2_1$. Slutsky absorbs the $o_p(1)$.

Why It Matters

The second-order version is the right tool whenever the parameter sits at a critical point of the function being studied. The canonical example is variance estimation at the boundary: if $\hat\theta_n$ estimates $\theta_0$ and you study $\hat\theta_n^2 - \theta_0^2$ at $\theta_0 = 0$, the linear term vanishes and the limit is a scaled $\chi^2_1$.
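A quick simulation shows both the rate change and the chi-squared limit. This is a sketch assuming NumPy; the centered Uniform(-1,1) population and $g(x) = x^2$ are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 500, 20_000

# T_n = mean of n centered Uniform(-1,1) draws: mu = 0, sigma^2 = 1/3.
# g(x) = x^2 has g'(0) = 0 and g''(0) = 2, so the second-order delta method
# predicts n * g(T_n) ->d (1/2) * 2 * sigma^2 * chi2_1 = sigma^2 * chi2_1.
sigma2 = 1.0 / 3.0
lhs = n * rng.uniform(-1, 1, size=(reps, n)).mean(axis=1) ** 2

# chi2_1 has mean 1 and variance 2.
print("mean:", lhs.mean(), " predicted:", sigma2)
print("var :", lhs.var(), " predicted:", 2 * sigma2 ** 2)
```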

Failure Mode

If both $g'(\mu)$ and $g''(\mu)$ vanish, the rate accelerates further and the limit involves higher derivatives. If $g$ is only once differentiable at $\mu$, the second-order expansion does not exist and a different argument is needed.

Worked Example 1: Variance of a Log Sample Proportion

Let $\hat p_n = \frac{1}{n}\sum_{i=1}^n X_i$ where $X_i \sim \text{Bernoulli}(p)$ independently with $0 < p < 1$. The CLT gives $\sqrt{n}(\hat p_n - p) \xrightarrow{d} N(0, p(1-p))$. Take $g(x) = \log x$. Then $g'(p) = 1/p$. The univariate delta method gives $\sqrt{n}(\log \hat p_n - \log p) \xrightarrow{d} N\!\left(0, \frac{1-p}{p}\right)$. The asymptotic standard error of $\log \hat p_n$ is therefore $\sqrt{(1-p)/(np)}$. Notice that the variance is unbounded as $p \to 0$: estimating $\log p$ is unstable for rare events, which is exactly the regime where this expression is most often used.
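A minimal numerical check of this standard error, assuming NumPy; $p = 0.1$ and $n = 5000$ are arbitrary illustrative values (large enough that $\hat p_n > 0$ in essentially every replication):

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, reps = 0.1, 5_000, 20_000

# Simulate reps sample proportions and compare the Monte Carlo standard
# deviation of log(p_hat) with the delta-method SE sqrt((1-p)/(n*p)).
phat = rng.binomial(n, p, size=reps) / n
print("Monte Carlo SE:", np.log(phat).std())
print("delta-method :", np.sqrt((1 - p) / (n * p)))
```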

Worked Example 2: Ratio of Two Means

Suppose $(X_i, Y_i)$ are iid with $E X = \mu_X$, $E Y = \mu_Y \neq 0$, $\text{Var}(X) = \sigma_X^2$, $\text{Var}(Y) = \sigma_Y^2$, $\text{Cov}(X, Y) = \sigma_{XY}$. The bivariate CLT gives $\sqrt{n}\begin{pmatrix} \bar X_n - \mu_X \\ \bar Y_n - \mu_Y \end{pmatrix} \xrightarrow{d} N_2(0, \Sigma)$ with $\Sigma = \begin{pmatrix} \sigma_X^2 & \sigma_{XY} \\ \sigma_{XY} & \sigma_Y^2 \end{pmatrix}$. Let $g(x, y) = x/y$, so $g(\mu_X, \mu_Y) = \mu_X / \mu_Y$ and the gradient is $\nabla g(\mu_X, \mu_Y) = \left( \frac{1}{\mu_Y},\; -\frac{\mu_X}{\mu_Y^2} \right)$. The multivariate delta method gives $\sqrt{n}\!\left( \frac{\bar X_n}{\bar Y_n} - \frac{\mu_X}{\mu_Y} \right) \xrightarrow{d} N(0, v)$, where $v = \frac{\sigma_X^2}{\mu_Y^2} - \frac{2 \mu_X \sigma_{XY}}{\mu_Y^3} + \frac{\mu_X^2 \sigma_Y^2}{\mu_Y^4} = \frac{\mu_X^2}{\mu_Y^2}\!\left( \frac{\sigma_X^2}{\mu_X^2} - \frac{2 \sigma_{XY}}{\mu_X \mu_Y} + \frac{\sigma_Y^2}{\mu_Y^2} \right)$. The second factor is the asymptotic squared coefficient of variation of the ratio. This is the standard ratio-estimator variance used in survey sampling.
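A sketch checking $v$ by simulation, assuming NumPy; the data-generating process below ($Y$ Gaussian and $X = Y$ plus independent Gaussian noise, so all five moments are known) is a hypothetical choice:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 1_000, 5_000

# Y ~ N(2, 1) and X = Y + N(1, 1) independently, so
# mu_X = 3, mu_Y = 2, var_X = 2, var_Y = 1, cov_XY = 1.
muX, muY, sX2, sY2, sXY = 3.0, 2.0, 2.0, 1.0, 1.0
Y = rng.normal(2.0, 1.0, size=(reps, n))
X = Y + rng.normal(1.0, 1.0, size=(reps, n))
ratio = X.mean(axis=1) / Y.mean(axis=1)

v = sX2 / muY**2 - 2 * muX * sXY / muY**3 + muX**2 * sY2 / muY**4
print("Monte Carlo n*Var(ratio):", n * ratio.var())
print("delta-method v          :", v)
```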

Worked Example 3: Asymptotic Variance of the Sample Correlation

Let $(X_i, Y_i)$ be iid bivariate with finite fourth moments, $E X = E Y = 0$ (without loss of generality after centering), variances $\sigma_X^2, \sigma_Y^2$, and correlation $\rho$. The sample correlation is $\hat\rho_n = \frac{\frac{1}{n}\sum_i X_i Y_i}{\sqrt{\frac{1}{n}\sum_i X_i^2 \cdot \frac{1}{n}\sum_i Y_i^2}}$. Define the vector of sample moments $T_n = (\overline{XY}_n, \overline{X^2}_n, \overline{Y^2}_n)$ with mean $\mu = (\sigma_{XY}, \sigma_X^2, \sigma_Y^2)$, where $\sigma_{XY} = \rho \sigma_X \sigma_Y$. Write $g(u, v, w) = \frac{u}{\sqrt{vw}}$, so $\hat\rho_n = g(T_n)$ and $g(\mu) = \rho$.

Computing the partial derivatives at $\mu$ and applying the multivariate delta method to the joint CLT for $T_n$, under the standard assumption that $X$ and $Y$ are jointly normal, one finds $\sqrt{n}(\hat\rho_n - \rho) \xrightarrow{d} N(0, (1 - \rho^2)^2)$. This explains Fisher's $z$-transformation: $z(\hat\rho_n) = \tfrac{1}{2} \log\!\left(\frac{1+\hat\rho_n}{1-\hat\rho_n}\right)$ has $z'(\rho) = 1/(1-\rho^2)$, which cancels the $(1-\rho^2)$ factor in the asymptotic standard error and yields a limiting variance of $1$. The Fisher transformation is the variance-stabilizing transformation for the sample correlation under bivariate normality. See variance-stabilizing transformations for the construction.
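A simulation under bivariate normality makes both claims visible at once: the $(1-\rho^2)^2$ variance for $\hat\rho_n$ and the unit variance after the $z$-transform. A sketch assuming NumPy, with $\rho = 0.6$ as an illustrative value:

```python
import numpy as np

rng = np.random.default_rng(4)
rho, n, reps = 0.6, 2_000, 5_000

L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
rhohat = np.empty(reps)
for r in range(reps):
    Z = rng.standard_normal((n, 2)) @ L.T   # bivariate normal, correlation rho
    rhohat[r] = np.corrcoef(Z[:, 0], Z[:, 1])[0, 1]

z = lambda t: 0.5 * np.log((1 + t) / (1 - t))  # Fisher's z

print("n*Var(rhohat)   :", n * rhohat.var(), " predicted:", (1 - rho**2) ** 2)
print("n*Var(z(rhohat)):", n * z(rhohat).var(), " predicted:", 1.0)
```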

Tie to Variance-Stabilizing Transformations

The delta method gives $\text{Var}(g(T_n)) \approx [g'(\mu)]^2 \sigma^2(\mu)/n$. If $\sigma^2$ depends on $\mu$ (as it does for Poisson, binomial proportions, and many other one-parameter families), the variance of the raw statistic varies with $\mu$. Picking $g$ so that $[g'(\mu)]^2 \sigma^2(\mu)$ is constant in $\mu$ removes the dependence. Solving the ODE $g'(\mu) = \frac{c}{\sigma(\mu)}$ gives $g(\mu) = c \int d\mu/\sigma(\mu)$. The Poisson square-root transform, the binomial arcsine-square-root transform, and the Fisher correlation transform all come from this construction.
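For the Poisson case $\sigma^2(\mu) = \mu$, the ODE gives $g(\mu) = 2c\sqrt{\mu}$; with $c = 1/2$ this is the plain square-root transform, and the stabilized variance is $1/(4n)$ at every rate. A sketch assuming NumPy, with arbitrary illustrative rates:

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 400, 20_000

# Var(sqrt(Xbar_n)) ~= [1/(2*sqrt(lam))]^2 * lam/n = 1/(4n), independent of lam.
for lam in (1.0, 5.0, 25.0):
    xbar = rng.poisson(lam, size=(reps, n)).mean(axis=1)
    print(f"lam={lam:5.1f}  MC var={np.sqrt(xbar).var():.2e}  1/(4n)={1 / (4 * n):.2e}")
```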

Common Confusions

Watch Out

The delta method is about variance, not bias

The first-order delta method gives the asymptotic distribution of $g(T_n)$ around $g(\mu)$, not the exact mean of $g(T_n)$. Unless $g$ is linear, $E[g(T_n)] \neq g(E T_n)$ in general (Jensen's inequality pins down the sign of the gap when $g$ is convex or concave); the gap is of order $1/n$ and shows up in second-order bias corrections (Edgeworth expansion territory), not in the leading-order normality statement.
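The gap is easy to exhibit numerically. A sketch assuming NumPy, with $g(x) = x^2$ and Uniform(0,1) data as illustrative choices; for this $g$ the gap is exact, $E[T_n^2] - \mu^2 = \sigma^2/n$, matching the generic $\tfrac{1}{2} g''(\mu) \sigma^2/n$ correction:

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma2 = 0.5, 1.0 / 12.0   # mean and variance of Uniform(0,1)

# Jensen-type bias of g(T_n) = T_n^2: shrinks like 1/n, invisible at the
# sqrt(n) scale of the delta method's normality statement.
for n in (10, 40, 160):
    T = rng.random((50_000, n)).mean(axis=1)
    print(n, (T ** 2).mean() - mu ** 2, sigma2 / n)
```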

Watch Out

Vanishing derivative changes the rate, not just the variance

If $g'(\mu) = 0$, do not just set the asymptotic variance to zero and call it done. The leading behavior becomes quadratic, the rate is $n$ instead of $\sqrt{n}$, and the limit is chi-squared, not normal. Apply the second-order version.

Watch Out

The CLT rate is not always $\sqrt{n}$

Some estimators converge faster (e.g., the MLE for the boundary of a uniform converges at rate $n$) or slower (e.g., nonparametric density estimation at rate $n^{2/5}$). Use whatever rate the underlying limit theorem gives, not a reflex $\sqrt{n}$.
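The uniform-boundary case is worth seeing once. A sketch assuming NumPy, with $\theta = 1$: the MLE is the sample maximum, and $n(\theta - \max_i X_i)$ converges to an Exponential limit with mean $\theta$, a rate-$n$ statement the $\sqrt{n}$ delta method cannot produce.

```python
import numpy as np

rng = np.random.default_rng(7)
theta, reps = 1.0, 10_000

# n * (theta - max) ->d Exponential with mean theta: the Monte Carlo mean
# stabilizes near 1.0 as n grows, confirming rate-n (not sqrt(n)) convergence.
for n in (10, 100, 1_000):
    mx = theta * rng.random((reps, n)).max(axis=1)
    print(n, (n * (theta - mx)).mean())
```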

Watch Out

Plug-in standard errors use $g'(T_n)$, not $g'(\mu)$

In practice $\mu$ is unknown and the asymptotic variance $[g'(\mu)]^2 \sigma^2$ is estimated by $[g'(T_n)]^2 \hat\sigma_n^2$. This is consistent under continuity of $g'$ at $\mu$ and convergence $\hat\sigma_n^2 \to \sigma^2$, both standard. The substitution is valid because Slutsky's theorem lets you replace consistent estimators inside convergence-in-distribution statements.
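A concrete sketch of the plug-in recipe, assuming NumPy; the Gamma data and the statistic $g(\bar X_n) = \log \bar X_n$ (so $g'(T_n) = 1/\bar X_n$) are hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 500
x = rng.gamma(shape=2.0, scale=1.5, size=n)   # hypothetical positive data

# Plug-in SE of g(Xbar) = log(Xbar): |g'(Xbar)| * sigma_hat / sqrt(n).
xbar, sd = x.mean(), x.std(ddof=1)
se = sd / (xbar * np.sqrt(n))

print("estimate:", np.log(xbar), " plug-in SE:", se)
print("95% CI  :", (np.log(xbar) - 1.96 * se, np.log(xbar) + 1.96 * se))
```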

Exercises

ExerciseCore

Problem

Let $X_1, \ldots, X_n \sim \text{Exponential}(\lambda)$ independently. The MLE of $\lambda$ is $\hat\lambda_n = 1/\bar X_n$. Compute the asymptotic distribution of $\sqrt{n}(\hat\lambda_n - \lambda)$ using the delta method.

ExerciseCore

Problem

Let $\hat p_n$ be the sample proportion in $n$ independent Bernoulli($p$) trials. Find the asymptotic distribution of $\hat p_n(1 - \hat p_n)$, the plug-in estimator of the Bernoulli variance.

ExerciseAdvanced

Problem

Derive the asymptotic distribution of $\sqrt{n}(\log(\bar X_n / \bar Y_n) - \log(\mu_X / \mu_Y))$ for iid pairs $(X_i, Y_i)$ with mean $(\mu_X, \mu_Y)$, $\mu_Y > 0$, and joint covariance matrix $\Sigma$.

