What Each Method Does
Both MLE and the Method of Moments (MoM) estimate the unknown parameters $\theta$ of a statistical model from observed data $x_1, \dots, x_n$. They differ in what equation they solve to find $\hat{\theta}$.
MLE finds the parameter value that makes the observed data most probable under the model.
MoM finds the parameter value that makes the population moments equal to the sample moments.
Side-by-Side Formulation
Maximum Likelihood Estimator
Given a parametric model $f(x; \theta)$ with parameter vector $\theta \in \mathbb{R}^k$, the MLE solves:

$$\hat{\theta}_{\text{MLE}} = \arg\max_{\theta} \, \ell(\theta) = \arg\max_{\theta} \sum_{i=1}^{n} \log f(x_i; \theta)$$

Equivalently, set the score function to zero:

$$\nabla_{\theta} \, \ell(\theta) = \sum_{i=1}^{n} \nabla_{\theta} \log f(x_i; \theta) = 0$$

This is a system of $k$ equations in $k$ unknowns. The solution may require iterative optimization (Newton's method, EM, gradient ascent).
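As a minimal sketch of solving the score equation numerically (using the exponential distribution, where the closed-form answer $\hat{\lambda} = 1/\bar{x}$ is available as a check; the data and true rate are illustrative choices):

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(0)
x = rng.exponential(scale=1 / 2.5, size=1000)  # true rate lambda = 2.5

# Exponential log-likelihood: ell(lam) = n*log(lam) - lam*sum(x)
# Score function: d ell / d lam = n/lam - sum(x)
def score(lam):
    return len(x) / lam - x.sum()

lam_hat = brentq(score, 1e-6, 100.0)  # root of the score equation
print(lam_hat, 1 / x.mean())          # the two agree: here the MLE is 1/xbar
```

For the exponential this root has a closed form, but the same root-finding pattern applies to models where it does not.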
Method of Moments Estimator
Express the first $k$ population moments as functions of $\theta$:

$$m_j(\theta) = \mathbb{E}_{\theta}[X^j], \quad j = 1, \dots, k$$

The MoM estimator solves the system:

$$m_j(\hat{\theta}) = \frac{1}{n} \sum_{i=1}^{n} x_i^j, \quad j = 1, \dots, k$$

This equates each population moment to its sample counterpart. With $k$ parameters and $k$ moment equations, you get a system of $k$ equations in $k$ unknowns.
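A minimal illustration of moment matching for the Normal($\mu$, $\sigma^2$), where the raw moments are $m_1 = \mu$ and $m_2 = \mu^2 + \sigma^2$ (the data and true parameters below are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=2.0, size=5000)  # true mu = 3, sigma^2 = 4

# Sample raw moments
m1 = np.mean(x)      # estimates E[X]   = mu
m2 = np.mean(x**2)   # estimates E[X^2] = mu^2 + sigma^2

# Invert the two moment equations
mu_hat = m1
sigma2_hat = m2 - m1**2
print(mu_hat, sigma2_hat)  # close to 3.0 and 4.0
```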
Where Each Is Stronger
MLE wins on efficiency
The crown jewel of MLE theory is asymptotic efficiency. Under regularity conditions, the MLE achieves the Cramér-Rao lower bound as $n \to \infty$:

$$\sqrt{n}\,(\hat{\theta}_{\text{MLE}} - \theta_0) \xrightarrow{d} \mathcal{N}\!\left(0, \, I(\theta_0)^{-1}\right)$$

where $I(\theta_0)$ is the Fisher information matrix. No consistent estimator can have smaller asymptotic variance. The MoM estimator is consistent and asymptotically normal, but its asymptotic variance is generally larger than $I(\theta_0)^{-1}$.
MoM wins on simplicity and closed-form solutions
MoM often yields explicit closed-form estimators. For a Gamma distribution with shape $\alpha$ and rate $\beta$:

- $\mathbb{E}[X] = \alpha/\beta$ and $\mathrm{Var}(X) = \alpha/\beta^2$
- MoM: solve $\bar{x} = \hat{\alpha}/\hat{\beta}$ and $s^2 = \hat{\alpha}/\hat{\beta}^2$
- This gives $\hat{\alpha} = \bar{x}^2 / s^2$ and $\hat{\beta} = \bar{x} / s^2$ --- fully explicit
The MLE for the Gamma distribution has no closed form and requires iterative numerical optimization (typically Newton-Raphson on the digamma function).
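A sketch comparing the two estimators on simulated Gamma data: the MoM estimates are the explicit formulas above, while the MLE is obtained numerically via `scipy.stats.gamma.fit` with the location fixed at zero (the sample size and true parameters are illustrative choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
alpha_true, beta_true = 2.0, 3.0  # shape, rate
x = rng.gamma(shape=alpha_true, scale=1 / beta_true, size=2000)

# MoM: fully explicit
xbar, s2 = x.mean(), x.var(ddof=1)
alpha_mom = xbar**2 / s2
beta_mom = xbar / s2

# MLE: iterative; fix loc=0 to fit the two-parameter Gamma
alpha_mle, _, scale_mle = stats.gamma.fit(x, floc=0)
beta_mle = 1 / scale_mle  # scipy parameterizes by scale = 1/rate

print(alpha_mom, beta_mom)  # MoM estimates
print(alpha_mle, beta_mle)  # MLE estimates
```

Note that scipy parameterizes the Gamma by scale rather than rate, hence the `1 / scale_mle` conversion.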
The Efficiency Gap
Asymptotic Efficiency of MLE
Statement
Under regularity conditions, as $n \to \infty$:

$$\sqrt{n}\,(\hat{\theta}_{\text{MLE}} - \theta_0) \xrightarrow{d} \mathcal{N}\!\left(0, \, I(\theta_0)^{-1}\right)$$
The MLE is asymptotically efficient: it achieves the Cramér-Rao lower bound for the asymptotic variance of any consistent estimator.
Intuition
The MLE extracts all the information that the likelihood function contains about $\theta$. The Fisher information $I(\theta)$ measures how much information each observation carries, and the MLE uses all of it. MoM, by contrast, uses only the information contained in the first $k$ moments, which is generally a strict subset of the information in the full likelihood.
How large is the gap? It depends on the model:
| Distribution | MLE asymptotic variance | MoM asymptotic variance | Relative efficiency of MoM |
|---|---|---|---|
| Normal($\mu$, $\sigma^2$) | $\sigma^2$ for $\mu$, $2\sigma^4$ for $\sigma^2$ | Same | 100% |
| Exponential($\lambda$) | $\lambda^2$ | Same | 100% |
| Gamma($\alpha$, $\beta$) | $I(\theta)^{-1}$ | Larger | Below 100% |
| Beta($\alpha$, $\beta$) | $I(\theta)^{-1}$ | Larger | Can be well below 100% |
For the normal and exponential families, MoM and MLE coincide. For distributions where moments do not capture the full shape (e.g., heavy tails, skewness), the efficiency gap can be substantial.
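A Monte Carlo sketch of the efficiency gap for the Gamma shape parameter: simulate many datasets, estimate $\alpha$ by both methods, and compare the sampling variances (the true parameters, sample size, and replication count are illustrative choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
alpha, beta, n, reps = 2.0, 1.0, 400, 300

mom, mle = [], []
for _ in range(reps):
    x = rng.gamma(shape=alpha, scale=1 / beta, size=n)
    mom.append(x.mean() ** 2 / x.var(ddof=1))  # MoM shape estimate
    a_hat, _, _ = stats.gamma.fit(x, floc=0)   # MLE shape estimate
    mle.append(a_hat)

var_mom, var_mle = np.var(mom), np.var(mle)
print(var_mle / var_mom)  # sampling-variance ratio: below 1, MLE is tighter
```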
Key Assumptions That Differ
| | MLE | MoM |
|---|---|---|
| What it solves | Score equations | Moment equations |
| Requires | Likelihood function in closed form | Moments as functions of $\theta$ |
| Closed form | Rarely | Often |
| Computational cost | Iterative optimization | Usually algebraic |
| Asymptotic efficiency | Achieves Cramér-Rao bound | Generally sub-optimal |
| Invariance | Invariant to reparameterization | Not invariant |
| Existence/uniqueness | May have multiple local maxima | Solution may not exist or be unique |
When MoM Is Actually Preferred
Estimating mixture models as a warm start
For Gaussian mixture models, the likelihood surface has many local optima. MoM (via spectral methods on the moment tensor) provides a consistent starting point that is close to the true parameters. The MLE, initialized from MoM, then refines to an efficient estimate. MoM handles the global search; MLE handles the local refinement.
Models where the likelihood is intractable
For some models (e.g., certain latent variable models, alpha-stable distributions), the likelihood function has no closed form. The density may involve intractable integrals. MoM requires only moments, which are often available in closed form even when the density is not.
Robustness to model misspecification
If the model is wrong (as it always is in practice), the MLE converges to the parameter that minimizes KL divergence to the true data distribution. MoM converges to the parameter that matches the first $k$ moments. When the model is misspecified, moment matching can be more robust because it does not try to fit the entire distribution shape, only its low-order summary statistics.
Quick estimation in the field
When you need a fast answer and computational resources are limited, MoM gives you an estimate with pencil and paper. The sample mean and variance are trivially computed, and for many models, the moment equations have explicit solutions. MLE may require writing optimization code.
Generalized Method of Moments (GMM)
When more moment conditions are available than parameters ($m > k$), the system is over-determined. GMM minimizes a weighted quadratic form:

$$\hat{\theta}_{\text{GMM}} = \arg\min_{\theta} \; \bar{g}_n(\theta)^{\top} \, W \, \bar{g}_n(\theta)$$

where $\bar{g}_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} g(x_i; \theta)$ is a vector of $m$ moment conditions and $W$ is an $m \times m$ weighting matrix. With the optimal weight matrix $W^* = \Omega^{-1}$, the inverse of the covariance matrix of the moment conditions, GMM achieves the semiparametric efficiency bound among moment-based estimators. GMM is the workhorse of econometrics.
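A minimal GMM sketch for an over-identified toy problem: one parameter (the exponential rate $\lambda$) with two moment conditions, $\mathbb{E}[X] = 1/\lambda$ and $\mathbb{E}[X^2] = 2/\lambda^2$. The identity weighting matrix below is a simplification; two-step GMM would replace it with the estimated $\Omega^{-1}$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(4)
lam_true = 2.0
x = rng.exponential(scale=1 / lam_true, size=3000)

# Two moment conditions for one parameter: over-identified (m = 2 > k = 1)
def g_bar(lam):
    return np.array([x.mean() - 1 / lam,
                     np.mean(x**2) - 2 / lam**2])

W = np.eye(2)  # identity weights; optimal W is the inverse covariance
               # of the moment conditions (the two-step GMM refinement)

def objective(lam):
    g = g_bar(lam)
    return g @ W @ g  # weighted quadratic form g' W g

res = minimize_scalar(objective, bounds=(1e-3, 50.0), method="bounded")
print(res.x)  # close to lam_true = 2.0
```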
Common Confusions
MoM can give impossible estimates
MoM estimates are not guaranteed to lie in the parameter space. For example, matching moments for a variance parameter can yield a negative estimate if the sample moments happen to fall in an incompatible region. MLE, by construction, always returns a parameter in the feasible set (assuming it is found via constrained optimization).
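A concrete instance for the Beta distribution: if the sample variance exceeds $\bar{x}(1 - \bar{x})$, the standard MoM inversion returns negative shape parameters, which lie outside the parameter space (the data below are deliberately constructed to trigger this):

```python
import numpy as np

# Data on [0, 1] whose sample variance exceeds xbar * (1 - xbar):
# the Beta moment equations then have no valid solution.
x = np.array([0.01, 0.99, 0.02, 0.98])
xbar, s2 = x.mean(), x.var(ddof=1)

# Standard Beta(alpha, beta) MoM inversion
c = xbar * (1 - xbar) / s2 - 1   # negative here, since s2 > xbar*(1-xbar)
alpha_hat = xbar * c
beta_hat = (1 - xbar) * c
print(alpha_hat, beta_hat)  # both negative: outside the parameter space
```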
MLE efficiency requires a correctly specified model
The Cramér-Rao bound and asymptotic efficiency of MLE assume the model is correct. Under misspecification, the MLE converges to the pseudo-true parameter (KL projection), but it is no longer efficient in any meaningful sense. MoM may be more robust because it targets specific distributional features rather than the entire distribution.
Efficiency is an asymptotic property
In finite samples, the MLE can have higher bias or mean squared error than MoM, especially in small samples or near parameter boundaries. Asymptotic efficiency is a statement about the rate as . For small , compare estimators via simulation, not asymptotic theory.
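A small simulation illustrating the kind of finite-sample check the paragraph recommends: the Gamma shape MLE is known to be biased upward in small samples, which asymptotic theory does not reveal (the true parameters, sample size, and replication count are illustrative choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
alpha_true, n, reps = 1.0, 10, 300

mle = []
for _ in range(reps):
    x = rng.gamma(shape=alpha_true, scale=1.0, size=n)
    a_hat, _, _ = stats.gamma.fit(x, floc=0)  # MLE shape estimate
    mle.append(a_hat)

print(np.mean(mle))  # noticeably above the true value 1.0 at n = 10
```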