
Delta Method

Asymptotic distribution of a smooth function of an estimator. If $\sqrt{n}(T_n - \mu) \xrightarrow{d} N(0, \sigma^2)$, then $\sqrt{n}(g(T_n) - g(\mu)) \xrightarrow{d} N(0, [g'(\mu)]^2 \sigma^2)$. The multivariate version uses the Jacobian; the second-order version handles vanishing derivatives. The page derives the result, works three canonical examples (variance of a log proportion, variance of a ratio of means, asymptotic variance of the sample correlation), and ties the construction to variance-stabilizing transformations.


Why This Matters

Every standard error of a smooth statistic comes from the delta method. The variance of $\log \hat p$, the variance of $\bar X / \bar Y$, the asymptotic variance of the sample correlation, the standard error of a fitted odds ratio: each is one Taylor expansion away from a central-limit-theorem statement.

The result is short. If $\sqrt{n}(T_n - \mu) \xrightarrow{d} N(0, \sigma^2)$ and $g$ is differentiable at $\mu$ with $g'(\mu) \neq 0$, then $\sqrt{n}(g(T_n) - g(\mu)) \xrightarrow{d} N(0, [g'(\mu)]^2 \sigma^2)$. The proof is one Taylor expansion plus Slutsky's theorem. The applications are everywhere.

Univariate Statement

Theorem

Delta Method (univariate)

Statement

Let $T_n$ be a sequence of random variables with $\sqrt{n}(T_n - \mu) \xrightarrow{d} N(0, \sigma^2)$. If $g : \mathbb{R} \to \mathbb{R}$ is differentiable at $\mu$ and $g'(\mu) \neq 0$, then $\sqrt{n}(g(T_n) - g(\mu)) \xrightarrow{d} N\!\left(0, [g'(\mu)]^2 \sigma^2\right)$.

Intuition

Near $\mu$, the smooth function $g$ is approximately linear with slope $g'(\mu)$. A linear transform of an approximately normal random variable is approximately normal, with variance multiplied by the square of the slope.

Proof Sketch

Write $g(T_n) = g(\mu) + g'(\mu)(T_n - \mu) + R_n$ with $R_n = o(|T_n - \mu|)$ by differentiability. Multiply by $\sqrt{n}$: $\sqrt{n}(g(T_n) - g(\mu)) = g'(\mu)\sqrt{n}(T_n - \mu) + \sqrt{n} R_n$. The first term converges in distribution to $N(0, [g'(\mu)]^2 \sigma^2)$ by the continuous mapping theorem applied to multiplication by the constant $g'(\mu)$. For the remainder, $T_n - \mu = O_p(1/\sqrt{n})$, so $R_n = o_p(1/\sqrt{n})$ and $\sqrt{n} R_n = o_p(1)$. Slutsky's theorem absorbs the remainder.

Why It Matters

This single statement gives the standard error of any plug-in estimator that is a smooth function of a CLT-rate estimator. The pattern is: write the estimator as $g$ applied to a sample mean, identify $\mu$ and $\sigma^2$, compute $g'(\mu)$, and read off the asymptotic variance.
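As a sanity check on the recipe, here is a minimal Monte Carlo sketch, assuming NumPy is available; the choice $g(x) = x^2$ applied to the mean of Uniform(0,1) draws is an illustrative assumption, not an example from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 500, 10_000

# T_n = mean of n Uniform(0,1) draws: mu = 1/2, sigma^2 = 1/12.
# For g(x) = x^2, g'(mu) = 2*mu = 1, so the delta method predicts
# Var(g(T_n)) ~= [g'(mu)]^2 * sigma^2 / n.
mu, sigma2 = 0.5, 1.0 / 12.0
T = rng.random((reps, n)).mean(axis=1)

print("Monte Carlo var :", np.var(T ** 2))
print("delta prediction:", (2 * mu) ** 2 * sigma2 / n)
```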

Failure Mode

The delta method fails or needs adjustment when $g'(\mu) = 0$ (use the second-order version below), when $g$ is not differentiable at $\mu$ (the limit may not be normal; e.g., $g(x) = |x|$ at $\mu = 0$ gives a folded normal), or when $T_n$ converges at a rate other than $\sqrt{n}$ (the same expansion holds with that rate replacing $\sqrt{n}$).

Multivariate Statement

Theorem

Delta Method (multivariate)

Statement

Let $T_n \in \mathbb{R}^k$ satisfy $\sqrt{n}(T_n - \mu) \xrightarrow{d} N_k(0, \Sigma)$ and let $g : \mathbb{R}^k \to \mathbb{R}^m$ be differentiable at $\mu$ with Jacobian $G(\mu) = \frac{\partial g}{\partial x}\big|_{x = \mu} \in \mathbb{R}^{m \times k}$. Then $\sqrt{n}(g(T_n) - g(\mu)) \xrightarrow{d} N_m\!\left(0,\; G(\mu)\,\Sigma\,G(\mu)^\top\right)$.

Intuition

The Jacobian is the multivariate analog of $g'(\mu)$. The push-forward of a Gaussian through a linear map $L$ is again Gaussian with covariance $L \Sigma L^\top$. The delta method says: replace the nonlinear $g$ by its linearization $G(\mu)$, then apply the push-forward rule.

Proof Sketch

Vector Taylor expansion: $g(T_n) = g(\mu) + G(\mu)(T_n - \mu) + R_n$ with $\|R_n\| = o(\|T_n - \mu\|)$. Multiply by $\sqrt{n}$: $\sqrt{n}(g(T_n) - g(\mu)) = G(\mu)\sqrt{n}(T_n - \mu) + \sqrt{n} R_n$. The first term converges to $N_m(0, G(\mu) \Sigma G(\mu)^\top)$ because $G(\mu)$ is a deterministic matrix. The remainder is $o_p(1)$ by the same $\sqrt{n}$-rate argument as in the univariate case. Slutsky finishes.

Why It Matters

The multivariate version is what makes the delta method useful in practice. Most interesting statistics are functions of multiple sample moments: a sample correlation is a function of three sample averages, a ratio is a function of two, a likelihood ratio is a function of many. Compute the Jacobian, sandwich the covariance, and you have the asymptotic variance.
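A small helper makes the compute-the-Jacobian-then-sandwich step mechanical. This is a sketch assuming NumPy; the function name `delta_cov`, the finite-difference Jacobian, and the example moments are hypothetical choices for illustration, not anything defined on this page:

```python
import numpy as np

def delta_cov(g, mu, Sigma, eps=1e-6):
    """Asymptotic covariance of sqrt(n) * (g(T_n) - g(mu)) from the
    multivariate delta method, G(mu) Sigma G(mu)^T, with the Jacobian
    G(mu) approximated by central finite differences."""
    mu = np.asarray(mu, dtype=float)
    m = np.atleast_1d(g(mu)).size
    G = np.empty((m, mu.size))
    for j in range(mu.size):
        e = np.zeros_like(mu)
        e[j] = eps
        G[:, j] = (np.atleast_1d(g(mu + e)) - np.atleast_1d(g(mu - e))) / (2 * eps)
    return G @ np.asarray(Sigma, dtype=float) @ G.T

# Example: g(x, y) = x / y at hypothetical moments mu = (3, 2).
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
print(delta_cov(lambda t: t[0] / t[1], [3.0, 2.0], Sigma))
```

With an analytic Jacobian available, replace the finite-difference loop; the sandwich $G \Sigma G^\top$ is the part that matters, and the same helper reproduces the closed-form ratio variance in Worked Example 2 below.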

Failure Mode

If $G(\mu)$ has a zero row, the corresponding component of $g(T_n)$ converges at a rate faster than $\sqrt{n}$ and its limit must be analyzed separately (second-order). If $\Sigma$ is rank-deficient, the limiting normal is degenerate on a lower-dimensional subspace; the theorem still holds, but interpret with care.

Second-Order Version

Theorem

Delta Method (second-order)

Statement

If $\sqrt{n}(T_n - \mu) \xrightarrow{d} N(0, \sigma^2)$, $g$ is twice differentiable at $\mu$, $g'(\mu) = 0$, and $g''(\mu) \neq 0$, then $n(g(T_n) - g(\mu)) \xrightarrow{d} \tfrac{1}{2} g''(\mu)\,\sigma^2\,\chi^2_1$. The convergence rate is $n$, not $\sqrt{n}$, and the limit is a scaled chi-squared with one degree of freedom, not a normal.

Intuition

When the gradient vanishes, the linear term in the Taylor expansion is zero and the leading behavior is quadratic. Squaring a centered normal produces a $\chi^2_1$, and the rate doubles from $\sqrt{n}$ to $n$ because the squared deviation is of order $1/n$ rather than $1/\sqrt{n}$.

Proof Sketch

Taylor: $g(T_n) - g(\mu) = \tfrac{1}{2} g''(\mu)(T_n - \mu)^2 + o_p((T_n - \mu)^2)$ since $g'(\mu) = 0$. Multiply by $n$: $n(g(T_n) - g(\mu)) = \tfrac{1}{2} g''(\mu) \cdot n(T_n - \mu)^2 + o_p(1)$. By the continuous mapping theorem applied to $x \mapsto x^2$, $n(T_n - \mu)^2 = [\sqrt{n}(T_n - \mu)]^2 \xrightarrow{d} \sigma^2 \chi^2_1$. Slutsky absorbs the $o_p(1)$.

Why It Matters

The second-order version is the right tool whenever the parameter sits at a critical point of the function being studied. The canonical example is variance estimation at the boundary: if $\hat\theta_n$ estimates $\theta_0$ and you study $\hat\theta_n^2 - \theta_0^2$ at $\theta_0 = 0$, the linear term vanishes and the limit is a scaled $\chi^2_1$.
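A quick simulation shows both the rate change and the chi-squared limit. This is a sketch assuming NumPy; the centered Uniform(-1,1) population and $g(x) = x^2$ are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 500, 20_000

# T_n = mean of n centered Uniform(-1,1) draws: mu = 0, sigma^2 = 1/3.
# g(x) = x^2 has g'(0) = 0 and g''(0) = 2, so the second-order delta method
# predicts n * g(T_n) ->d (1/2) * 2 * sigma^2 * chi2_1 = sigma^2 * chi2_1.
sigma2 = 1.0 / 3.0
lhs = n * rng.uniform(-1, 1, size=(reps, n)).mean(axis=1) ** 2

# chi2_1 has mean 1 and variance 2.
print("mean:", lhs.mean(), " predicted:", sigma2)
print("var :", lhs.var(), " predicted:", 2 * sigma2 ** 2)
```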

Failure Mode

If both $g'(\mu)$ and $g''(\mu)$ vanish, the rate accelerates further and the limit involves higher derivatives. If $g$ is only once differentiable at $\mu$, the second-order expansion does not exist and a different argument is needed.

Worked Example 1: Variance of a Log Sample Proportion

Let $\hat p_n = \frac{1}{n}\sum_{i=1}^n X_i$ where $X_i \sim \text{Bernoulli}(p)$ independently with $0 < p < 1$. The CLT gives $\sqrt{n}(\hat p_n - p) \xrightarrow{d} N(0, p(1-p))$. Take $g(x) = \log x$. Then $g'(p) = 1/p$. The univariate delta method gives $\sqrt{n}(\log \hat p_n - \log p) \xrightarrow{d} N\!\left(0, \frac{1-p}{p}\right)$. The asymptotic standard error of $\log \hat p_n$ is therefore $\sqrt{(1-p)/(np)}$. Notice that the variance is unbounded as $p \to 0$: estimating $\log p$ is unstable for rare events, which is exactly the regime where this expression is most often used.
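A minimal numerical check of this standard error, assuming NumPy; $p = 0.1$ and $n = 5000$ are arbitrary illustrative values (large enough that $\hat p_n > 0$ in essentially every replication):

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, reps = 0.1, 5_000, 20_000

# Simulate reps sample proportions and compare the Monte Carlo standard
# deviation of log(p_hat) with the delta-method SE sqrt((1-p)/(n*p)).
phat = rng.binomial(n, p, size=reps) / n
print("Monte Carlo SE:", np.log(phat).std())
print("delta-method :", np.sqrt((1 - p) / (n * p)))
```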

Worked Example 2: Ratio of Two Means

Suppose $(X_i, Y_i)$ are iid with $E X = \mu_X$, $E Y = \mu_Y \neq 0$, $\text{Var}(X) = \sigma_X^2$, $\text{Var}(Y) = \sigma_Y^2$, $\text{Cov}(X, Y) = \sigma_{XY}$. The bivariate CLT gives $\sqrt{n}\begin{pmatrix} \bar X_n - \mu_X \\ \bar Y_n - \mu_Y \end{pmatrix} \xrightarrow{d} N_2(0, \Sigma)$ with $\Sigma = \begin{pmatrix} \sigma_X^2 & \sigma_{XY} \\ \sigma_{XY} & \sigma_Y^2 \end{pmatrix}$. Let $g(x, y) = x/y$, so $g(\mu_X, \mu_Y) = \mu_X / \mu_Y$ and the gradient is $\nabla g(\mu_X, \mu_Y) = \left( \frac{1}{\mu_Y},\; -\frac{\mu_X}{\mu_Y^2} \right)$. The multivariate delta method gives $\sqrt{n}\!\left( \frac{\bar X_n}{\bar Y_n} - \frac{\mu_X}{\mu_Y} \right) \xrightarrow{d} N(0, v)$, where $v = \frac{\sigma_X^2}{\mu_Y^2} - \frac{2 \mu_X \sigma_{XY}}{\mu_Y^3} + \frac{\mu_X^2 \sigma_Y^2}{\mu_Y^4} = \frac{\mu_X^2}{\mu_Y^2}\!\left( \frac{\sigma_X^2}{\mu_X^2} - \frac{2 \sigma_{XY}}{\mu_X \mu_Y} + \frac{\sigma_Y^2}{\mu_Y^2} \right)$. The second factor is the asymptotic squared coefficient of variation of the ratio. This is the standard ratio-estimator variance used in survey sampling.
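A sketch checking $v$ by simulation, assuming NumPy; the data-generating process below ($Y$ Gaussian and $X = Y$ plus independent Gaussian noise, so all five moments are known) is a hypothetical choice:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 1_000, 5_000

# Y ~ N(2, 1) and X = Y + N(1, 1) independently, so
# mu_X = 3, mu_Y = 2, var_X = 2, var_Y = 1, cov_XY = 1.
muX, muY, sX2, sY2, sXY = 3.0, 2.0, 2.0, 1.0, 1.0
Y = rng.normal(2.0, 1.0, size=(reps, n))
X = Y + rng.normal(1.0, 1.0, size=(reps, n))
ratio = X.mean(axis=1) / Y.mean(axis=1)

v = sX2 / muY**2 - 2 * muX * sXY / muY**3 + muX**2 * sY2 / muY**4
print("Monte Carlo n*Var(ratio):", n * ratio.var())
print("delta-method v          :", v)
```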

Worked Example 3: Asymptotic Variance of the Sample Correlation

Let $(X_i, Y_i)$ be iid bivariate with finite fourth moments, $E X = E Y = 0$ (without loss of generality after centering), variances $\sigma_X^2, \sigma_Y^2$, and correlation $\rho$. The sample correlation is $\hat\rho_n = \frac{\frac{1}{n}\sum_i X_i Y_i}{\sqrt{\frac{1}{n}\sum_i X_i^2 \cdot \frac{1}{n}\sum_i Y_i^2}}$. Define the vector of sample moments $T_n = (\overline{XY}_n, \overline{X^2}_n, \overline{Y^2}_n)$ with mean $\mu = (\sigma_{XY}, \sigma_X^2, \sigma_Y^2)$, where $\sigma_{XY} = \rho \sigma_X \sigma_Y$. Write $g(u, v, w) = \frac{u}{\sqrt{vw}}$, so $\hat\rho_n = g(T_n)$ and $g(\mu) = \rho$.

Computing the partial derivatives at $\mu$ and applying the multivariate delta method to the joint CLT for $T_n$, under the standard assumption that $X$ and $Y$ are jointly normal, one finds $\sqrt{n}(\hat\rho_n - \rho) \xrightarrow{d} N(0, (1 - \rho^2)^2)$. This explains Fisher's $z$-transformation: $z(\hat\rho_n) = \tfrac{1}{2} \log\!\left(\frac{1+\hat\rho_n}{1-\hat\rho_n}\right)$ has $z'(\rho) = 1/(1-\rho^2)$, which cancels the $(1-\rho^2)$ factor in the asymptotic standard error and yields a limiting variance of $1$. The Fisher transformation is the variance-stabilizing transformation for the sample correlation under bivariate normality. See variance-stabilizing transformations for the construction.
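A simulation under bivariate normality makes both claims visible at once: the $(1-\rho^2)^2$ variance for $\hat\rho_n$ and the unit variance after the $z$-transform. A sketch assuming NumPy, with $\rho = 0.6$ as an illustrative value:

```python
import numpy as np

rng = np.random.default_rng(4)
rho, n, reps = 0.6, 2_000, 5_000

L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
rhohat = np.empty(reps)
for r in range(reps):
    Z = rng.standard_normal((n, 2)) @ L.T   # bivariate normal, correlation rho
    rhohat[r] = np.corrcoef(Z[:, 0], Z[:, 1])[0, 1]

z = lambda t: 0.5 * np.log((1 + t) / (1 - t))  # Fisher's z

print("n*Var(rhohat)   :", n * rhohat.var(), " predicted:", (1 - rho**2) ** 2)
print("n*Var(z(rhohat)):", n * z(rhohat).var(), " predicted:", 1.0)
```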

Tie to Variance-Stabilizing Transformations

The delta method gives $\text{Var}(g(T_n)) \approx [g'(\mu)]^2 \sigma^2(\mu)/n$. If $\sigma^2$ depends on $\mu$ (as it does for Poisson, binomial proportions, and many other one-parameter families), the variance of the raw statistic varies with $\mu$. Picking $g$ so that $[g'(\mu)]^2 \sigma^2(\mu)$ is constant in $\mu$ removes the dependence. Solving the ODE $g'(\mu) = \frac{c}{\sigma(\mu)}$ gives $g(\mu) = c \int d\mu/\sigma(\mu)$. The Poisson square-root transform, the binomial arcsine-square-root transform, and the Fisher correlation transform all come from this construction.
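For the Poisson case $\sigma^2(\mu) = \mu$, the ODE gives $g(\mu) = 2c\sqrt{\mu}$; with $c = 1/2$ this is the plain square-root transform, and the stabilized variance is $1/(4n)$ at every rate. A sketch assuming NumPy, with arbitrary illustrative rates:

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 400, 20_000

# Var(sqrt(Xbar_n)) ~= [1/(2*sqrt(lam))]^2 * lam/n = 1/(4n), independent of lam.
for lam in (1.0, 5.0, 25.0):
    xbar = rng.poisson(lam, size=(reps, n)).mean(axis=1)
    print(f"lam={lam:5.1f}  MC var={np.sqrt(xbar).var():.2e}  1/(4n)={1 / (4 * n):.2e}")
```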

Common Confusions

Watch Out

The delta method is about variance, not bias

The first-order delta method gives the asymptotic distribution of $g(T_n)$ around $g(\mu)$, not the exact mean of $g(T_n)$. Unless $g$ is linear, $E[g(T_n)] \neq g(E T_n)$ in general (Jensen's inequality pins down the sign of the gap when $g$ is convex or concave); the gap is of order $1/n$ and shows up in second-order bias corrections (Edgeworth expansion territory), not in the leading-order normality statement.
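The gap is easy to exhibit numerically. A sketch assuming NumPy, with $g(x) = x^2$ and Uniform(0,1) data as illustrative choices; for this $g$ the gap is exact, $E[T_n^2] - \mu^2 = \sigma^2/n$, matching the generic $\tfrac{1}{2} g''(\mu) \sigma^2/n$ correction:

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma2 = 0.5, 1.0 / 12.0   # mean and variance of Uniform(0,1)

# Jensen-type bias of g(T_n) = T_n^2: shrinks like 1/n, invisible at the
# sqrt(n) scale of the delta method's normality statement.
for n in (10, 40, 160):
    T = rng.random((50_000, n)).mean(axis=1)
    print(n, (T ** 2).mean() - mu ** 2, sigma2 / n)
```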

Watch Out

Vanishing derivative changes the rate, not just the variance

If $g'(\mu) = 0$, do not just set the asymptotic variance to zero and call it done. The leading behavior becomes quadratic, the rate is $n$ instead of $\sqrt{n}$, and the limit is chi-squared, not normal. Apply the second-order version.

Watch Out

The CLT rate is not always $\sqrt{n}$

Some estimators converge faster (e.g., the MLE for the boundary of a uniform converges at rate $n$) or slower (e.g., nonparametric density estimation at rate $n^{2/5}$). Use whatever rate the underlying limit theorem gives, not a reflex $\sqrt{n}$.
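The uniform-boundary case is worth seeing once. A sketch assuming NumPy, with $\theta = 1$: the MLE is the sample maximum, and $n(\theta - \max_i X_i)$ converges to an Exponential limit with mean $\theta$, a rate-$n$ statement the $\sqrt{n}$ delta method cannot produce.

```python
import numpy as np

rng = np.random.default_rng(7)
theta, reps = 1.0, 10_000

# n * (theta - max) ->d Exponential with mean theta: the Monte Carlo mean
# stabilizes near 1.0 as n grows, confirming rate-n (not sqrt(n)) convergence.
for n in (10, 100, 1_000):
    mx = theta * rng.random((reps, n)).max(axis=1)
    print(n, (n * (theta - mx)).mean())
```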

Watch Out

Plug-in standard errors use $g'(T_n)$, not $g'(\mu)$

In practice $\mu$ is unknown and the asymptotic variance $[g'(\mu)]^2 \sigma^2$ is estimated by $[g'(T_n)]^2 \hat\sigma_n^2$. This is consistent under continuity of $g'$ at $\mu$ and convergence $\hat\sigma_n^2 \to \sigma^2$, both standard. The substitution is valid because Slutsky's theorem lets you replace consistent estimators inside convergence-in-distribution statements.
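A concrete sketch of the plug-in recipe, assuming NumPy; the Gamma data and the statistic $g(\bar X_n) = \log \bar X_n$ (so $g'(T_n) = 1/\bar X_n$) are hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 500
x = rng.gamma(shape=2.0, scale=1.5, size=n)   # hypothetical positive data

# Plug-in SE of g(Xbar) = log(Xbar): |g'(Xbar)| * sigma_hat / sqrt(n).
xbar, sd = x.mean(), x.std(ddof=1)
se = sd / (xbar * np.sqrt(n))

print("estimate:", np.log(xbar), " plug-in SE:", se)
print("95% CI  :", (np.log(xbar) - 1.96 * se, np.log(xbar) + 1.96 * se))
```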

Exercises

ExerciseCore

Problem

Let $X_1, \ldots, X_n \sim \text{Exponential}(\lambda)$ independently. The MLE of $\lambda$ is $\hat\lambda_n = 1/\bar X_n$. Compute the asymptotic distribution of $\sqrt{n}(\hat\lambda_n - \lambda)$ using the delta method.

ExerciseCore

Problem

Let $\hat p_n$ be the sample proportion in $n$ independent Bernoulli($p$) trials. Find the asymptotic distribution of $\hat p_n(1 - \hat p_n)$, the plug-in estimator of the Bernoulli variance.

ExerciseAdvanced

Problem

Derive the asymptotic distribution of $\sqrt{n}(\log(\bar X_n / \bar Y_n) - \log(\mu_X / \mu_Y))$ for iid pairs $(X_i, Y_i)$ with mean $(\mu_X, \mu_Y)$, $\mu_Y > 0$, and joint covariance matrix $\Sigma$.

