
Comparison

Cramér-Rao Bound vs. Minimax Lower Bounds

Two frameworks for bounding estimation difficulty: Cramér-Rao gives a local lower bound for unbiased estimators at a single parameter value, while minimax lower bounds apply to all estimators over an entire parameter class.

What Each Measures

Both the Cramér-Rao bound and minimax lower bounds answer the question: how accurately can you estimate a parameter from data? They operate at different levels of generality.

Cramér-Rao bounds the variance of any unbiased estimator of a parameter $\theta$ at a specific parameter value. It is computed from the Fisher information.

Minimax lower bounds bound the worst-case risk of any estimator (biased or unbiased) over an entire parameter class. They use hypothesis testing reductions (Fano, Le Cam, or Assouad).

Side-by-Side Statement

Definition

Cramér-Rao Lower Bound

Let $X_1, \ldots, X_n \sim p_\theta$ and let $\hat{\theta}(X)$ be an unbiased estimator of $\theta \in \mathbb{R}$. If the Fisher information $I(\theta) = \mathbb{E}[(\partial_\theta \log p_\theta(X))^2]$ exists and is positive, then:

$$\text{Var}_\theta(\hat{\theta}) \geq \frac{1}{n \cdot I(\theta)}$$

For a vector parameter $\theta \in \mathbb{R}^d$, the covariance matrix satisfies $\text{Cov}(\hat{\theta}) \succeq \frac{1}{n} I(\theta)^{-1}$ in the Loewner order.

Definition

Minimax Lower Bound

Let $\Theta$ be a parameter class, $d(\cdot, \cdot)$ a loss function, and $X \sim P_\theta^n$. A minimax lower bound states:

$$\inf_{\hat{\theta}} \sup_{\theta \in \Theta} \mathbb{E}[d(\hat{\theta}, \theta)] \geq r_n$$

The infimum is over all estimators (including biased ones). The supremum is over the worst case in $\Theta$. The quantity $r_n$ is established via Fano, Le Cam, or Assouad arguments.
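
The two-point (Le Cam) reduction can be made concrete numerically. Below is a minimal sketch (the helper `lecam_two_point_bound` is hypothetical, not from the original) that lower-bounds the absolute-error risk for a Gaussian mean, using Pinsker's inequality to control the total variation distance:

```python
import math

def lecam_two_point_bound(n, sigma, c=1.0):
    """Le Cam two-point lower bound for estimating a Gaussian mean
    under absolute-error loss.

    Hypotheses theta_0 = 0 and theta_1 = c * sigma / sqrt(n) give
        inf_est sup_theta E|est - theta|
            >= (|theta_1 - theta_0| / 2) * (1 - TV(P_0^n, P_1^n)),
    and Pinsker's inequality bounds TV <= sqrt(KL / 2), where
        KL(P_0^n || P_1^n) = n * (theta_1 - theta_0)^2 / (2 sigma^2) = c^2 / 2.
    """
    separation = c * sigma / math.sqrt(n)
    kl = n * separation ** 2 / (2 * sigma ** 2)   # equals c^2 / 2
    tv_upper = min(1.0, math.sqrt(kl / 2))
    return (separation / 2) * (1 - tv_upper)

n, sigma = 100, 2.0
print(lecam_two_point_bound(n, sigma))       # bound scales as sigma / sqrt(n)
print(lecam_two_point_bound(4 * n, sigma))   # halves when n quadruples
```

The choice $|\theta_1 - \theta_0| \asymp \sigma/\sqrt{n}$ is the usual balancing act: far enough apart that confusing the hypotheses costs something, close enough that no test separates them reliably.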

Where Each Is Stronger

Cramér-Rao wins on ease of computation

Computing the Cramér-Rao bound requires only the Fisher information, which is a single expectation involving the score function. For exponential family models, the Fisher information has a closed form. No hypothesis construction or packing arguments are needed.

For $n$ i.i.d. samples from $N(\theta, \sigma^2)$: $I(\theta) = 1/\sigma^2$, so the bound is $\text{Var}(\hat{\theta}) \geq \sigma^2/n$. The sample mean achieves this bound exactly.
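
A quick Monte Carlo check of this example (a sketch, not part of the original; sample sizes and trial counts are arbitrary) confirms that the variance of the sample mean matches the Cramér-Rao bound $\sigma^2/n$:

```python
import random
import statistics

def sample_mean_variance(n, theta, sigma, trials=20000, seed=0):
    """Monte Carlo variance of the sample mean of n draws from N(theta, sigma^2)."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(theta, sigma) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.variance(means)

n, theta, sigma = 25, 1.0, 2.0
cr_bound = sigma ** 2 / n        # Cramér-Rao: 1 / (n * I(theta)) with I(theta) = 1/sigma^2
mc_var = sample_mean_variance(n, theta, sigma)
print(cr_bound, mc_var)          # the two agree up to Monte Carlo error
```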

Minimax wins on generality

Minimax bounds apply to all estimators, not just unbiased ones. This matters because:

  1. Many good estimators are biased. Ridge regression, shrinkage estimators, and Bayesian posterior means are all biased.
  2. The Cramér-Rao bound can be beaten by biased estimators with lower MSE. The James-Stein estimator has lower MSE than the sample mean for $d \geq 3$, even though the sample mean achieves the Cramér-Rao bound.

Minimax bounds also apply uniformly over the parameter class, giving a worst-case guarantee rather than a pointwise one.
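
The James-Stein claim above is easy to verify by simulation. The sketch below (hypothetical setup: a single observation $X \sim N(\theta, I_d)$ with $d = 10$ and an arbitrary fixed $\theta$) compares squared-error risk of the identity estimator with the positive-part James-Stein estimator:

```python
import random

def js_vs_mle_risk(d=10, trials=5000, seed=0):
    """Compare squared-error risk of the unbiased estimator X with the
    positive-part James-Stein estimator, for X ~ N(theta, I_d)."""
    rng = random.Random(seed)
    theta = [1.0] * d                 # a fixed, arbitrary true mean vector
    mle_risk = js_risk = 0.0
    for _ in range(trials):
        x = [t + rng.gauss(0, 1) for t in theta]
        norm_sq = sum(v * v for v in x)
        shrink = max(0.0, 1 - (d - 2) / norm_sq)   # positive-part shrinkage factor
        mle_risk += sum((v - t) ** 2 for v, t in zip(x, theta))
        js_risk += sum((shrink * v - t) ** 2 for v, t in zip(x, theta))
    return mle_risk / trials, js_risk / trials

mle_risk, js_risk = js_vs_mle_risk()
print(mle_risk, js_risk)   # James-Stein risk is strictly smaller for d >= 3
```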

Where Each Fails

Cramér-Rao fails for biased estimators

The standard Cramér-Rao bound applies only to unbiased estimators. In dimensions $d \geq 3$, unbiased estimators are inadmissible for squared loss (Stein's phenomenon). The bound therefore applies to estimators that no one should use. An extended version exists for biased estimators, but it requires knowing the bias function, which depends on the unknown parameter.

Cramér-Rao fails for nonparametric problems

Fisher information is defined for parametric models. For nonparametric estimation (e.g., estimating a density over a Sobolev class), the Cramér-Rao approach does not directly apply. Minimax theory was specifically developed to handle such settings.

Minimax fails to give instance-specific bounds

Minimax bounds describe the worst case over $\Theta$. If you know you are at a specific $\theta_0$ where estimation is easy (e.g., high Fisher information), the minimax bound may be much looser than Cramér-Rao at that point. Minimax theory tells you the hardest case, not the typical case.
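
The Bernoulli model makes the pointwise-versus-worst-case gap visible (an illustrative sketch; the parameter values are arbitrary). Here $I(\theta) = 1/(\theta(1-\theta))$, so the Cramér-Rao bound $\theta(1-\theta)/n$ is tiny near the boundary but peaks at $\theta = 1/2$, which is where the worst case lives:

```python
# Pointwise Cramér-Rao bounds for n Bernoulli(theta) samples:
#   Var >= theta * (1 - theta) / n,  since I(theta) = 1 / (theta * (1 - theta)).
# The bound varies strongly with theta; the worst case over [0, 1] is theta = 1/2.
n = 100
thetas = (0.01, 0.1, 0.5)
for theta in thetas:
    print(theta, theta * (1 - theta) / n)

worst = max(t * (1 - t) / n for t in thetas)
print(worst)   # 0.0025 at theta = 1/2: the worst-case (minimax-flavored) value
```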

Key Assumptions That Differ

| | Cramér-Rao | Minimax |
|---|---|---|
| Estimator class | Unbiased only | All estimators |
| Scope | Single parameter value $\theta$ | Worst case over $\Theta$ |
| Key quantity | Fisher information $I(\theta)$ | Packing numbers, KL divergences |
| Type of bound | Local (pointwise) | Global (uniform) |
| Achievability | MLE achieves it asymptotically | Minimax-optimal estimators exist for many problems |
| Applies to | Parametric models | Parametric and nonparametric |

The Connection: Asymptotic Equivalence

Theorem

Asymptotic Local Minimax and Cramér-Rao

Statement

Under local asymptotic normality (LAN), the local minimax risk at $\theta_0$ for estimating $\theta$ with squared loss satisfies:

$$\lim_{\delta \to 0} \lim_{n \to \infty} n \cdot \inf_{\hat{\theta}} \sup_{\|\theta - \theta_0\| \leq \delta} \mathbb{E}_\theta[\|\hat{\theta} - \theta\|^2] = \text{tr}(I(\theta_0)^{-1})$$

This matches the Cramér-Rao bound. The local minimax risk and the Cramér-Rao bound agree asymptotically for regular parametric models.
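
A one-line sketch of where $\text{tr}(I(\theta_0)^{-1})$ comes from (assuming the standard convergence of the local models to a Gaussian shift experiment, which is what LAN provides):

```latex
% Under LAN, the local models P^n_{\theta_0 + h/\sqrt{n}} converge to the
% Gaussian shift experiment  Y \sim N(h, I(\theta_0)^{-1}),  h \in \mathbb{R}^d.
% In that limit experiment, \hat{h} = Y is minimax for squared loss, with risk
\mathbb{E}\,\|Y - h\|^2 = \text{tr}(\text{Cov}(Y)) = \text{tr}(I(\theta_0)^{-1}).
% Rescaling by n transfers this risk back to the original sequence of models.
```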

Intuition

In smooth parametric models with large samples, the distinction between local minimax and Cramér-Rao vanishes. The Fisher information captures the local difficulty of estimation, and both frameworks agree on its reciprocal as the fundamental limit. The gap between them matters primarily in nonparametric settings, finite samples, or irregular models.

What to Memorize

  1. Cramér-Rao: Variance $\geq 1/(nI(\theta))$ for unbiased estimators. Local, pointwise, parametric.

  2. Minimax: Risk $\geq r_n$ for all estimators. Global, worst-case, parametric or nonparametric.

  3. When they agree: Asymptotically in regular parametric models under LAN.

  4. When they disagree: Finite samples, biased estimators, nonparametric problems, irregular models.

  5. Practical implication: Cramér-Rao tells you the cost of estimation at a point. Minimax tells you the cost of estimation over a class.

When a Researcher Would Use Each

Example

Efficiency of MLE

To show that MLE is asymptotically efficient for a parametric model, use Cramér-Rao. Compute the Fisher information, verify regularity conditions, and show that MLE variance achieves $1/(nI(\theta))$ asymptotically.
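
As a concrete instance of this recipe (a simulation sketch, not from the original; the exponential-rate model and sample sizes are illustrative), take $X_i \sim \text{Exp}(\theta)$ with $I(\theta) = 1/\theta^2$. The MLE is $\hat{\theta} = 1/\bar{X}$, and its variance approaches the Cramér-Rao limit $\theta^2/n$:

```python
import random
import statistics

def mle_variance_exponential(theta=2.0, n=200, trials=5000, seed=0):
    """Monte Carlo variance of the MLE hat(theta) = 1 / mean(X) for
    X_i ~ Exponential(rate=theta), to compare with the Cramér-Rao limit."""
    rng = random.Random(seed)
    mles = [
        1.0 / statistics.fmean(rng.expovariate(theta) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.variance(mles)

theta, n = 2.0, 200
cr_limit = theta ** 2 / n    # 1 / (n * I(theta)) with I(theta) = 1 / theta^2
print(cr_limit, mle_variance_exponential(theta, n))   # close for large n
```

The small residual gap at finite $n$ is the usual $O(1/n^2)$ correction; it vanishes relative to $\theta^2/n$ as $n$ grows, which is exactly the asymptotic-efficiency statement.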

Example

Optimal rate for nonparametric regression

To prove that the minimax rate for estimating an $s$-smooth function on $[0,1]$ is $n^{-2s/(2s+1)}$, use minimax lower bounds via Fano's method or Assouad's lemma. Cramér-Rao does not apply because the function class is not a finite-dimensional parametric model.

Common Confusions

Watch Out

Beating the Cramér-Rao bound does not violate any theorem

The Cramér-Rao bound applies to unbiased estimators. A biased estimator can have lower MSE. The James-Stein estimator has lower risk than the sample mean for Gaussian location in $d \geq 3$ dimensions, despite the sample mean being the UMVUE. This does not contradict Cramér-Rao because James-Stein is biased.

Watch Out

Cramér-Rao is not a minimax bound

Cramér-Rao bounds variance at a single $\theta$. It does not say anything about worst-case performance over a parameter class. A Cramér-Rao bound that is small at one $\theta$ can be large at another, and the minimax risk depends on the hardest case.

Watch Out

Fisher information is not always well-defined

The Cramér-Rao bound requires the model to satisfy regularity conditions: the support of $p_\theta$ must not depend on $\theta$, and the score must have finite variance. For the uniform distribution $U(0, \theta)$, the support does depend on $\theta$, the regularity conditions fail, and the MLE $\max_i X_i$ attains mean squared error of order $1/n^2$, faster than the $1/n$ rate a Cramér-Rao argument would suggest. Minimax bounds still work in such cases.
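
The super-fast rate for the uniform model is easy to see by simulation (a sketch with arbitrary $\theta$ and sample sizes): the MSE of the MLE $\max_i X_i$ scales like $1/n^2$, roughly quartering each time $n$ doubles, rather than halving as a $1/n$ bound would predict.

```python
import random
import statistics

def uniform_mle_mse(theta=1.0, n=100, trials=5000, seed=0):
    """Monte Carlo MSE of the MLE max(X_i) for X_i ~ U(0, theta).
    Theory: E[(theta - max)^2] = 2 theta^2 / ((n + 1)(n + 2)) ~ theta^2 / n^2."""
    rng = random.Random(seed)
    sq_errs = [
        (max(rng.uniform(0, theta) for _ in range(n)) - theta) ** 2
        for _ in range(trials)
    ]
    return statistics.fmean(sq_errs)

for n in (50, 100, 200):
    print(n, uniform_mle_mse(n=n))   # shrinks roughly 4x when n doubles
```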