Statistical Estimation
Cramér-Rao Bound
The fundamental lower bound on the variance of any unbiased estimator: no unbiased estimator can have variance smaller than the reciprocal of the Fisher information.
Why This Matters
When you build an estimator, you want to know: how good can it possibly be? The Cramér-Rao bound answers this for unbiased estimators. It says the variance of any unbiased estimator of $\theta$ is at least $1/(n I(\theta))$, where $I(\theta)$ is the Fisher information per observation. No cleverness in estimator design can beat this limit.
This bound tells you whether your estimator is efficient (achieves the bound) or whether there is room for improvement. It also connects estimation theory to information geometry: the Fisher information is the Riemannian metric on the statistical manifold.
Formal Setup
Let $X_1, \dots, X_n$ be i.i.d. from a density $p_\theta(x)$, where $\theta \in \Theta \subseteq \mathbb{R}$.
Score Function
The score function is the derivative of the log-likelihood: $s(\theta; x) = \frac{\partial}{\partial \theta} \log p_\theta(x)$.
Key property: $\mathbb{E}_\theta[s(\theta; X)] = 0$ for all $\theta$.
Fisher Information
The Fisher information is the variance of the score: $I(\theta) = \operatorname{Var}_\theta\big(s(\theta; X)\big) = \mathbb{E}_\theta\big[s(\theta; X)^2\big]$.
Under regularity conditions, equivalently $I(\theta) = -\mathbb{E}_\theta\!\left[\frac{\partial^2}{\partial \theta^2} \log p_\theta(X)\right]$.
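As a numerical sanity check (a sketch assuming NumPy; the normal model and its parameters are illustrative choices, not from the text), the two defining facts, mean-zero score and Fisher information as score variance, can be verified by simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 1.5                      # illustrative true mean and known std. dev.
x = rng.normal(mu, sigma, size=1_000_000)

# Score of N(mu, sigma^2) with respect to mu: d/dmu log p = (x - mu) / sigma^2
score = (x - mu) / sigma**2

mean_score = score.mean()                 # ≈ 0, the key property of the score
var_score = score.var()                   # ≈ 1/sigma^2, the Fisher information
```

Here the closed-form score for the normal mean is used; for other models one can differentiate the log-density symbolically or numerically instead.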
Efficient Estimator
An unbiased estimator $\hat\theta$ is efficient if it achieves the Cramér-Rao bound with equality: $\operatorname{Var}_\theta(\hat\theta) = \frac{1}{n I(\theta)}$ for all $\theta$.
Main Theorems
Cramér-Rao Lower Bound
Statement
For any unbiased estimator $\hat\theta$ of $\theta$: $\operatorname{Var}_\theta(\hat\theta) \ge \frac{1}{n I(\theta)}$.
Equality holds if and only if the score function is a linear function of $\hat\theta$: there exists $a(\theta)$ such that $\sum_{i=1}^n s(\theta; X_i) = a(\theta)\,(\hat\theta - \theta)$ a.s.
Intuition
The score function measures the sensitivity of the likelihood to $\theta$. High Fisher information means the data is very informative about $\theta$, so estimators can be more precise. The bound quantifies this: more information means lower achievable variance.
Proof Sketch
Apply the Cauchy-Schwarz inequality to the covariance of $\hat\theta$ and the total score $S_n(\theta) = \sum_{i=1}^n s(\theta; X_i)$:
$\operatorname{Cov}_\theta(\hat\theta, S_n)^2 \le \operatorname{Var}_\theta(\hat\theta)\,\operatorname{Var}_\theta(S_n)$.
Since $\hat\theta$ is unbiased, differentiating $\mathbb{E}_\theta[\hat\theta] = \theta$ under the integral gives $\operatorname{Cov}_\theta(\hat\theta, S_n) = 1$. Since $S_n$ is a sum of $n$ i.i.d. terms, $\operatorname{Var}_\theta(S_n) = n I(\theta)$. Substituting gives $\operatorname{Var}_\theta(\hat\theta) \ge \frac{1}{n I(\theta)}$.
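The two identities in the proof sketch can be checked by simulation (a sketch assuming NumPy; the normal model, sample size, and replication count are illustrative choices). For the sample mean of $n = 20$ standard normals, $\operatorname{Cov}(\hat\theta, S_n)$ should be near $1$ and $\operatorname{Var}(S_n)$ near $n I(\theta) = 20$:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 0.0, 1.0, 20, 200_000    # illustrative normal model

x = rng.normal(mu, sigma, size=(reps, n))
theta_hat = x.mean(axis=1)                    # unbiased estimator of mu
S_n = ((x - mu) / sigma**2).sum(axis=1)       # total score at the true mu

cov = np.mean((theta_hat - mu) * S_n)         # ≈ 1 (unbiasedness identity)
var_S = np.var(S_n)                           # ≈ n * I(mu) = n / sigma^2 = 20
```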
Why It Matters
This is the central result of classical estimation theory. It provides a universal benchmark: if your estimator's variance equals $\frac{1}{n I(\theta)}$, you know no unbiased estimator can do better. The MLE achieves this bound asymptotically, which is one reason MLE is the default estimation method.
Failure Mode
The bound only applies to unbiased estimators. Biased estimators can have lower MSE than the Cramér-Rao bound (this is the James-Stein phenomenon). The regularity conditions matter: for uniform distributions $\mathrm{Unif}(0, \theta)$, the support depends on $\theta$, the bound does not apply, and the MLE converges at rate $1/n$ instead of $1/\sqrt{n}$.
Multivariate Cramér-Rao Bound
Statement
For any unbiased estimator $\hat\theta$ of $\theta \in \mathbb{R}^d$: $\operatorname{Cov}_\theta(\hat\theta) \succeq \frac{1}{n}\, I(\theta)^{-1}$,
where $\succeq$ denotes the Loewner (positive semidefinite) ordering and $I(\theta)$ is the Fisher information matrix with entries $I_{jk}(\theta) = \mathbb{E}_\theta\!\left[\frac{\partial \log p_\theta(X)}{\partial \theta_j}\,\frac{\partial \log p_\theta(X)}{\partial \theta_k}\right]$.
Intuition
The matrix inequality says that for any direction $v \in \mathbb{R}^d$, the variance of $v^\top \hat\theta$ is at least $\frac{1}{n}\, v^\top I(\theta)^{-1} v$. Parameters that are poorly identified (low Fisher information in their direction) have high minimum variance.
Proof Sketch
Same Cauchy-Schwarz argument applied to each component, or equivalently, consider $v^\top \hat\theta$ for an arbitrary direction $v$ and apply the scalar Cramér-Rao bound to this one-dimensional projection.
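A small simulation illustrates the Fisher information matrix as the expected outer product of per-observation scores (a sketch assuming NumPy; the two-parameter normal model $N(\mu, \sigma^2)$ with both parameters unknown is an illustrative choice). The known closed form here is $I(\mu, \sigma^2) = \operatorname{diag}\!\big(1/\sigma^2,\, 1/(2\sigma^4)\big)$:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma2 = 1.0, 2.0                         # illustrative true parameters
x = rng.normal(mu, np.sqrt(sigma2), size=1_000_000)

# Per-observation score of N(mu, sigma2) w.r.t. (mu, sigma2):
#   d/dmu      log p = (x - mu) / sigma2
#   d/dsigma2  log p = -1/(2 sigma2) + (x - mu)^2 / (2 sigma2^2)
s = np.stack([(x - mu) / sigma2,
              -0.5 / sigma2 + (x - mu) ** 2 / (2 * sigma2 ** 2)])

fisher = s @ s.T / x.size                     # E[score score^T], the Fisher matrix
# ≈ diag(1/sigma2, 1/(2 sigma2^2)) = diag(0.5, 0.125); off-diagonals ≈ 0
```

Inverting `fisher` and dividing by $n$ gives the matrix Cramér-Rao lower bound on the covariance of any unbiased estimator of $(\mu, \sigma^2)$.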
Why It Matters
In practice, models have multiple parameters. The matrix version tells you which parameters are easy to estimate (high Fisher information) and which are hard. It also connects to the natural gradient: gradient descent in Fisher information geometry converges in fewer steps because it accounts for parameter curvature.
Failure Mode
Same as the scalar case: applies only to unbiased estimators, requires regularity conditions, and biased estimators can achieve lower MSE.
Canonical Examples
Estimating the mean of a normal distribution
Let $X_1, \dots, X_n \sim N(\mu, \sigma^2)$ with $\sigma^2$ known. The Fisher information per observation is $I(\mu) = 1/\sigma^2$. The Cramér-Rao bound gives $\operatorname{Var}(\hat\mu) \ge \sigma^2/n$. The sample mean $\bar{X}$ has variance exactly $\sigma^2/n$, so it is efficient.
Estimating the rate of a Poisson distribution
Let $X_1, \dots, X_n \sim \mathrm{Poisson}(\lambda)$. The Fisher information is $I(\lambda) = 1/\lambda$. The Cramér-Rao bound gives $\operatorname{Var}(\hat\lambda) \ge \lambda/n$. The sample mean $\bar{X}$ has variance $\lambda/n$, so it is efficient.
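The Poisson example can be confirmed numerically (a sketch assuming NumPy; the rate, sample size, and replication count are illustrative choices): the empirical variance of the sample mean should coincide with the bound $\lambda/n$.

```python
import numpy as np

rng = np.random.default_rng(3)
lam, n, reps = 4.0, 50, 100_000          # illustrative rate and sample size

x = rng.poisson(lam, size=(reps, n))
lam_hat = x.mean(axis=1)                 # sample mean, unbiased for lambda

emp_var = np.var(lam_hat)                # empirical variance across replications
cr_bound = lam / n                       # Cramér-Rao bound: 1/(n I(lambda)) = lambda/n
# emp_var ≈ cr_bound = 0.08, so the sample mean attains the bound
```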
Common Confusions
The Cramér-Rao bound does not apply to biased estimators
The bound is only for unbiased estimators. A biased estimator can have lower variance. The bias-variance tradeoff means that the optimal MSE estimator is often biased. The James-Stein estimator beats the sample mean $\bar{X}$ in MSE when the dimension $d \ge 3$, despite $\bar{X}$ being efficient in the Cramér-Rao sense.
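The James-Stein phenomenon is easy to reproduce (a sketch assuming NumPy; the dimension, true mean, and shrinkage toward the origin are illustrative choices). For a single observation $X \sim N(\theta, I_d)$ with $d = 10$, shrinking $X$ toward zero strictly reduces total MSE even though $X$ is the efficient unbiased estimator:

```python
import numpy as np

rng = np.random.default_rng(4)
d, reps = 10, 100_000
theta = np.ones(d)                                # illustrative true mean vector

# One observation X ~ N(theta, I_d); X itself is the efficient unbiased estimator.
x = rng.normal(theta, 1.0, size=(reps, d))

# James-Stein shrinkage toward the origin (requires d >= 3):
norm2 = (x ** 2).sum(axis=1, keepdims=True)
js = (1 - (d - 2) / norm2) * x

mse_x = ((x - theta) ** 2).sum(axis=1).mean()     # ≈ d = 10 in expectation
mse_js = ((js - theta) ** 2).sum(axis=1).mean()   # strictly smaller than mse_x
```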
Achieving the bound does not mean achieving the best MSE
Efficiency (in the Cramér-Rao sense) means minimum variance among unbiased estimators. It does not mean minimum MSE among all estimators. For small samples, regularized or shrinkage estimators with some bias often have lower MSE than the efficient unbiased estimator.
Regularity conditions are not just technicalities
For $X_1, \dots, X_n \sim \mathrm{Unif}(0, \theta)$, the support depends on $\theta$. The Cramér-Rao bound does not apply. The MLE $\hat\theta = \max_i X_i$ has variance of order $1/n^2$, which is much smaller than the $1/n$ the bound would suggest. When regularity fails, the convergence rate can be faster than the parametric rate $1/\sqrt{n}$.
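The $1/n^2$ variance scaling of the uniform MLE can be seen by simulation (a sketch assuming NumPy; the sample sizes and replication count are illustrative choices). Multiplying the empirical variance of $\max_i X_i$ by $n^2$ should give roughly the same constant across sample sizes:

```python
import numpy as np

rng = np.random.default_rng(5)
theta, reps = 1.0, 200_000               # illustrative true endpoint

scaled = []
for n in [50, 100, 200]:
    mle = rng.uniform(0, theta, size=(reps, n)).max(axis=1)
    # Exact: Var(max) = n theta^2 / ((n+1)^2 (n+2)) ~ theta^2 / n^2
    scaled.append(np.var(mle) * n ** 2)
# scaled ≈ [0.92, 0.96, 0.98]: n^2 * Var is roughly constant, approaching theta^2
```

If the $1/n$ rate held instead, the scaled values would grow linearly in $n$ rather than stabilize.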
Summary
- For unbiased estimators: $\operatorname{Var}_\theta(\hat\theta) \ge \frac{1}{n I(\theta)}$
- The proof is a single application of Cauchy-Schwarz
- MLE is asymptotically efficient: it achieves the bound as $n \to \infty$
- Biased estimators can beat the bound in MSE
- Regularity conditions matter: the bound fails for non-regular families
Exercises
Problem
Compute the Fisher information and state the Cramér-Rao bound for estimating the parameter from $n$ i.i.d. observations.
Problem
Show that for the one-parameter exponential family $p_\eta(x) = h(x)\exp\!\big(\eta\,T(x) - A(\eta)\big)$ with natural parameter $\eta$, an efficient estimator of $g(\eta)$ exists if and only if $g$ is a linear function of the mean parameter $A'(\eta) = \mathbb{E}_\eta[T(X)]$.
Related Comparisons
References
Canonical:
- Casella & Berger, Statistical Inference (2002), Chapter 7.3
- Lehmann & Casella, Theory of Point Estimation (1998), Chapter 2
- Schervish, Theory of Statistics (1995), Section 2.3 (information inequalities)
Current:
- van der Vaart, Asymptotic Statistics (1998), Chapter 8
- Cover & Thomas, Elements of Information Theory (2006), Chapter 11.10 (Fisher information and the Cramér-Rao bound)
- Keener, Theoretical Statistics (2010), Chapter 3 (unbiased estimation and efficiency)
Next Topics
- Asymptotic statistics: MLE achieves the Cramér-Rao bound asymptotically
- Minimax lower bounds: going beyond unbiased estimators to minimax optimality
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Fisher Information (Layer 0B)
- Maximum Likelihood Estimation (Layer 0B)
- Common Probability Distributions (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Differentiation in $\mathbb{R}^n$ (Layer 0A)