
Comparison

Cramér-Rao Bound vs. Minimax Lower Bounds

Two frameworks for bounding estimation difficulty: Cramér-Rao gives a local lower bound for unbiased estimators at a single parameter value, while minimax lower bounds apply to all estimators over an entire parameter class.

What Each Measures

Both the Cramér-Rao bound and minimax lower bounds answer the question: how accurately can you estimate a parameter from data? They operate at different levels of generality.

Cramér-Rao bounds the variance of any unbiased estimator of a parameter $\theta$ at a specific parameter value. It is computed from the Fisher information.

Minimax lower bounds bound the worst-case risk of any estimator (biased or unbiased) over an entire parameter class. They use hypothesis testing reductions (Fano, Le Cam, or Assouad).

Side-by-Side Statement

Definition

Cramér-Rao Lower Bound

Let $X_1, \ldots, X_n \sim p_\theta$ and let $\hat{\theta}(X)$ be an unbiased estimator of $\theta \in \mathbb{R}$. If the Fisher information $I(\theta) = \mathbb{E}[(\partial_\theta \log p_\theta(X))^2]$ exists and is positive, then:

$$\text{Var}_\theta(\hat{\theta}) \geq \frac{1}{n \cdot I(\theta)}$$

For a vector parameter $\theta \in \mathbb{R}^d$, the covariance matrix satisfies $\text{Cov}(\hat{\theta}) \succeq \frac{1}{n} I(\theta)^{-1}$ in the Loewner order.

Definition

Minimax Lower Bound

Let $\Theta$ be a parameter class, $d(\cdot, \cdot)$ a loss function, and $X \sim P_\theta^n$. A minimax lower bound states:

$$\inf_{\hat{\theta}} \sup_{\theta \in \Theta} \mathbb{E}[d(\hat{\theta}, \theta)] \geq r_n$$

The infimum is over all estimators (including biased ones). The supremum is over the worst case in $\Theta$. The quantity $r_n$ is established via Fano, Le Cam, or Assouad arguments.
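
The two-point (Le Cam) reduction can be made concrete numerically. Below is a minimal sketch (the helper `lecam_two_point_bound` is hypothetical, not from the original) that lower-bounds the absolute-error risk for a Gaussian mean, using Pinsker's inequality to control the total variation distance:

```python
import math

def lecam_two_point_bound(n, sigma, c=1.0):
    """Le Cam two-point lower bound for estimating a Gaussian mean
    under absolute-error loss.

    Hypotheses theta_0 = 0 and theta_1 = c * sigma / sqrt(n) give
        inf_est sup_theta E|est - theta|
            >= (|theta_1 - theta_0| / 2) * (1 - TV(P_0^n, P_1^n)),
    and Pinsker's inequality bounds TV <= sqrt(KL / 2), where
        KL(P_0^n || P_1^n) = n * (theta_1 - theta_0)^2 / (2 sigma^2) = c^2 / 2.
    """
    separation = c * sigma / math.sqrt(n)
    kl = n * separation ** 2 / (2 * sigma ** 2)   # equals c^2 / 2
    tv_upper = min(1.0, math.sqrt(kl / 2))
    return (separation / 2) * (1 - tv_upper)

n, sigma = 100, 2.0
print(lecam_two_point_bound(n, sigma))       # bound scales as sigma / sqrt(n)
print(lecam_two_point_bound(4 * n, sigma))   # halves when n quadruples
```

The choice $|\theta_1 - \theta_0| \asymp \sigma/\sqrt{n}$ is the usual balancing act: far enough apart that confusing the hypotheses costs something, close enough that no test separates them reliably.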

Where Each Is Stronger

Cramér-Rao wins on ease of computation

Computing the Cramér-Rao bound requires only the Fisher information, which is a single expectation involving the score function. For exponential family models, the Fisher information has a closed form. No hypothesis construction or packing arguments are needed.

For $n$ i.i.d. samples from $N(\theta, \sigma^2)$: $I(\theta) = 1/\sigma^2$, so the bound is $\text{Var}(\hat{\theta}) \geq \sigma^2/n$. The sample mean achieves this bound exactly.
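
A quick Monte Carlo check of this example (a sketch, not part of the original; sample sizes and trial counts are arbitrary) confirms that the variance of the sample mean matches the Cramér-Rao bound $\sigma^2/n$:

```python
import random
import statistics

def sample_mean_variance(n, theta, sigma, trials=20000, seed=0):
    """Monte Carlo variance of the sample mean of n draws from N(theta, sigma^2)."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(theta, sigma) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.variance(means)

n, theta, sigma = 25, 1.0, 2.0
cr_bound = sigma ** 2 / n        # Cramér-Rao: 1 / (n * I(theta)) with I(theta) = 1/sigma^2
mc_var = sample_mean_variance(n, theta, sigma)
print(cr_bound, mc_var)          # the two agree up to Monte Carlo error
```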

Minimax wins on generality

Minimax bounds apply to all estimators, not just unbiased ones. This matters because:

  1. Many good estimators are biased. Ridge regression, shrinkage estimators, and Bayesian posterior means are all biased.
  2. The Cramér-Rao bound can be beaten by biased estimators with lower MSE. The James-Stein estimator has lower MSE than the sample mean for $d \geq 3$, even though the sample mean achieves the Cramér-Rao bound.

Minimax bounds also apply uniformly over the parameter class, giving a worst-case guarantee rather than a pointwise one.
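
The James-Stein claim above is easy to verify by simulation. The sketch below (hypothetical setup: a single observation $X \sim N(\theta, I_d)$ with $d = 10$ and an arbitrary fixed $\theta$) compares squared-error risk of the identity estimator with the positive-part James-Stein estimator:

```python
import random

def js_vs_mle_risk(d=10, trials=5000, seed=0):
    """Compare squared-error risk of the unbiased estimator X with the
    positive-part James-Stein estimator, for X ~ N(theta, I_d)."""
    rng = random.Random(seed)
    theta = [1.0] * d                 # a fixed, arbitrary true mean vector
    mle_risk = js_risk = 0.0
    for _ in range(trials):
        x = [t + rng.gauss(0, 1) for t in theta]
        norm_sq = sum(v * v for v in x)
        shrink = max(0.0, 1 - (d - 2) / norm_sq)   # positive-part shrinkage factor
        mle_risk += sum((v - t) ** 2 for v, t in zip(x, theta))
        js_risk += sum((shrink * v - t) ** 2 for v, t in zip(x, theta))
    return mle_risk / trials, js_risk / trials

mle_risk, js_risk = js_vs_mle_risk()
print(mle_risk, js_risk)   # James-Stein risk is strictly smaller for d >= 3
```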

Where Each Fails

Cramér-Rao fails for biased estimators

The standard Cramér-Rao bound applies only to unbiased estimators. In dimensions $d \geq 3$, unbiased estimators are inadmissible for squared loss (Stein's phenomenon). The bound therefore applies to estimators that no one should use. An extended version exists for biased estimators, but it requires knowing the bias function, which depends on the unknown parameter.

Cramér-Rao fails for nonparametric problems

Fisher information is defined for parametric models. For nonparametric estimation (e.g., estimating a density over a Sobolev class), the Cramér-Rao approach does not directly apply. Minimax theory was specifically developed to handle such settings.

Minimax fails to give instance-specific bounds

Minimax bounds describe the worst case over $\Theta$. If you know you are at a specific $\theta_0$ where estimation is easy (e.g., high Fisher information), the minimax bound may be much looser than Cramér-Rao at that point. Minimax theory tells you the hardest case, not the typical case.
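
The Bernoulli model makes the pointwise-versus-worst-case gap visible (an illustrative sketch; the parameter values are arbitrary). Here $I(\theta) = 1/(\theta(1-\theta))$, so the Cramér-Rao bound $\theta(1-\theta)/n$ is tiny near the boundary but peaks at $\theta = 1/2$, which is where the worst case lives:

```python
# Pointwise Cramér-Rao bounds for n Bernoulli(theta) samples:
#   Var >= theta * (1 - theta) / n,  since I(theta) = 1 / (theta * (1 - theta)).
# The bound varies strongly with theta; the worst case over [0, 1] is theta = 1/2.
n = 100
thetas = (0.01, 0.1, 0.5)
for theta in thetas:
    print(theta, theta * (1 - theta) / n)

worst = max(t * (1 - t) / n for t in thetas)
print(worst)   # 0.0025 at theta = 1/2: the worst-case (minimax-flavored) value
```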

Key Assumptions That Differ

| | Cramér-Rao | Minimax |
|---|---|---|
| Estimator class | Unbiased only | All estimators |
| Scope | Single parameter value $\theta$ | Worst case over $\Theta$ |
| Key quantity | Fisher information $I(\theta)$ | Packing numbers, KL divergences |
| Type of bound | Local (pointwise) | Global (uniform) |
| Achievability | MLE achieves it asymptotically | Minimax-optimal estimators exist for many problems |
| Applies to | Parametric models | Parametric and nonparametric |

The Connection: Asymptotic Equivalence

Theorem

Asymptotic Local Minimax and Cramér-Rao

Statement

Under local asymptotic normality (LAN), the local minimax risk at $\theta_0$ for estimating $\theta$ with squared loss satisfies:

$$\lim_{\delta \to 0} \lim_{n \to \infty} n \cdot \inf_{\hat{\theta}} \sup_{\|\theta - \theta_0\| \leq \delta} \mathbb{E}_\theta[\|\hat{\theta} - \theta\|^2] = \text{tr}(I(\theta_0)^{-1})$$

This matches the Cramér-Rao bound. The local minimax risk and the Cramér-Rao bound agree asymptotically for regular parametric models.
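
A one-line sketch of where $\text{tr}(I(\theta_0)^{-1})$ comes from (assuming the standard convergence of the local models to a Gaussian shift experiment, which is what LAN provides):

```latex
% Under LAN, the local models P^n_{\theta_0 + h/\sqrt{n}} converge to the
% Gaussian shift experiment  Y \sim N(h, I(\theta_0)^{-1}),  h \in \mathbb{R}^d.
% In that limit experiment, \hat{h} = Y is minimax for squared loss, with risk
\mathbb{E}\,\|Y - h\|^2 = \text{tr}(\text{Cov}(Y)) = \text{tr}(I(\theta_0)^{-1}).
% Rescaling by n transfers this risk back to the original sequence of models.
```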

Intuition

In smooth parametric models with large samples, the distinction between local minimax and Cramér-Rao vanishes. The Fisher information captures the local difficulty of estimation, and both frameworks agree on its reciprocal as the fundamental limit. The gap between them matters primarily in nonparametric settings, finite samples, or irregular models.

What to Memorize

  1. Cramér-Rao: Variance $\geq 1/(nI(\theta))$ for unbiased estimators. Local, pointwise, parametric.

  2. Minimax: Risk $\geq r_n$ for all estimators. Global, worst-case, parametric or nonparametric.

  3. When they agree: Asymptotically in regular parametric models under LAN.

  4. When they disagree: Finite samples, biased estimators, nonparametric problems, irregular models.

  5. Practical implication: Cramér-Rao tells you the cost of estimation at a point. Minimax tells you the cost of estimation over a class.

When a Researcher Would Use Each

Example

Efficiency of MLE

To show that MLE is asymptotically efficient for a parametric model, use Cramér-Rao. Compute the Fisher information, verify regularity conditions, and show that MLE variance achieves $1/(nI(\theta))$ asymptotically.
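
As a concrete instance of this recipe (a simulation sketch, not from the original; the exponential-rate model and sample sizes are illustrative), take $X_i \sim \text{Exp}(\theta)$ with $I(\theta) = 1/\theta^2$. The MLE is $\hat{\theta} = 1/\bar{X}$, and its variance approaches the Cramér-Rao limit $\theta^2/n$:

```python
import random
import statistics

def mle_variance_exponential(theta=2.0, n=200, trials=5000, seed=0):
    """Monte Carlo variance of the MLE hat(theta) = 1 / mean(X) for
    X_i ~ Exponential(rate=theta), to compare with the Cramér-Rao limit."""
    rng = random.Random(seed)
    mles = [
        1.0 / statistics.fmean(rng.expovariate(theta) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.variance(mles)

theta, n = 2.0, 200
cr_limit = theta ** 2 / n    # 1 / (n * I(theta)) with I(theta) = 1 / theta^2
print(cr_limit, mle_variance_exponential(theta, n))   # close for large n
```

The small residual gap at finite $n$ is the usual $O(1/n^2)$ correction; it vanishes relative to $\theta^2/n$ as $n$ grows, which is exactly the asymptotic-efficiency statement.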

Example

Optimal rate for nonparametric regression

To prove that the minimax rate for estimating an $s$-smooth function on $[0,1]$ is $n^{-2s/(2s+1)}$, use minimax lower bounds via Fano's method or Assouad's lemma. Cramér-Rao does not apply because the function class is not a finite-dimensional parametric model.

Common Confusions

Watch Out

Beating the Cramér-Rao bound does not violate any theorem

The Cramér-Rao bound applies to unbiased estimators. A biased estimator can have lower MSE. The James-Stein estimator has lower risk than the sample mean for Gaussian location in $d \geq 3$ dimensions, despite the sample mean being the UMVUE. This does not contradict Cramér-Rao because James-Stein is biased.

Watch Out

Cramér-Rao is not a minimax bound

Cramér-Rao bounds variance at a single $\theta$. It does not say anything about worst-case performance over a parameter class. A Cramér-Rao bound that is small at one $\theta$ can be large at another, and the minimax risk depends on the hardest case.

Watch Out

Fisher information is not always well-defined

The Cramér-Rao bound requires the model to satisfy regularity conditions: the support of $p_\theta$ must not depend on $\theta$, and the score must have finite variance. For the uniform distribution $U(0, \theta)$, the support does depend on $\theta$, the regularity conditions fail, and the MLE $\max_i X_i$ attains mean squared error of order $1/n^2$, faster than the $1/n$ rate a Cramér-Rao argument would suggest. Minimax bounds still work in such cases.
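
The super-fast rate for the uniform model is easy to see by simulation (a sketch with arbitrary $\theta$ and sample sizes): the MSE of the MLE $\max_i X_i$ scales like $1/n^2$, roughly quartering each time $n$ doubles, rather than halving as a $1/n$ bound would predict.

```python
import random
import statistics

def uniform_mle_mse(theta=1.0, n=100, trials=5000, seed=0):
    """Monte Carlo MSE of the MLE max(X_i) for X_i ~ U(0, theta).
    Theory: E[(theta - max)^2] = 2 theta^2 / ((n + 1)(n + 2)) ~ theta^2 / n^2."""
    rng = random.Random(seed)
    sq_errs = [
        (max(rng.uniform(0, theta) for _ in range(n)) - theta) ** 2
        for _ in range(trials)
    ]
    return statistics.fmean(sq_errs)

for n in (50, 100, 200):
    print(n, uniform_mle_mse(n=n))   # shrinks roughly 4x when n doubles
```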