Statistical Estimation
Method of Moments
Match sample moments to population moments to estimate parameters. Simpler than MLE but less efficient. Covers classical MoM, generalized method of moments (GMM), and when MoM is the better choice.
Why This Matters
The method of moments (MoM) is the oldest systematic approach to parameter estimation. Before MLE, before Bayesian methods, there was moment matching. You still encounter it constantly:
- Initializing EM: Gaussian mixture models often use MoM to get a reasonable starting point before running EM
- Latent variable models: when the likelihood is intractable, MoM (or its generalization, GMM) can bypass the likelihood entirely
- Econometrics: GMM is the workhorse estimator when you have moment conditions but no full likelihood specification
MoM is also the simplest estimator to derive and understand. It builds intuition for what estimation is doing before you tackle the more sophisticated MLE theory.
Mental Model
You know that the population mean is $\mathbb{E}[X] = \mu_1(\theta)$, the population variance is $\mathrm{Var}(X) = \mu_2(\theta) - \mu_1(\theta)^2$, and so on: each is a known function of the parameter $\theta$. You compute the sample mean, sample variance, etc. Then you solve the equations that set sample moments equal to population moments. The solutions are your parameter estimates.
It is algebra, not optimization.
Formal Setup and Notation
Let $X_1, \dots, X_n$ be i.i.d. from a distribution with parameter $\theta \in \Theta \subseteq \mathbb{R}^d$. Define:
Population Moments
The $k$-th population moment is:

$$\mu_k(\theta) = \mathbb{E}_\theta[X^k]$$

More generally, a population moment can be any known function of $\theta$ of the form $\mathbb{E}_\theta[g(X)]$ for some function $g$.
Sample Moments
The $k$-th sample moment is:

$$\hat{\mu}_k = \frac{1}{n} \sum_{i=1}^n X_i^k$$

By the law of large numbers, $\hat{\mu}_k \xrightarrow{p} \mu_k(\theta)$ as $n \to \infty$.
Method of Moments Estimator
If $\theta \in \mathbb{R}^d$, the method of moments estimator $\hat{\theta}_{\text{MoM}}$ solves the system of equations:

$$\hat{\mu}_k = \mu_k(\hat{\theta}), \qquad k = 1, \dots, d$$

That is, set the first $d$ sample moments equal to the corresponding population moments and solve for $\theta$.
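As a minimal sketch of the procedure, consider a hypothetical Exponential model with rate $\lambda$ (not an example from the text above), where $\mathbb{E}[X] = 1/\lambda$. Matching the first sample moment and solving is a single line of algebra:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: Exponential(rate=lam) has E[X] = 1/lam, so matching
# the first sample moment m1 = 1/lam gives the closed-form MoM estimator.
lam_true = 2.0
x = rng.exponential(scale=1.0 / lam_true, size=100_000)

m1 = x.mean()        # first sample moment
lam_hat = 1.0 / m1   # solve m1 = 1/lam for lam

print(lam_hat)       # close to lam_true
```

No optimization is involved: the moment equation is inverted algebraically, which is exactly the "algebra, not optimization" point above.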
Main Theorems
Consistency of Method of Moments
Statement
If the moment map $\theta \mapsto (\mu_1(\theta), \dots, \mu_d(\theta))$ is continuous and has a continuous inverse in a neighborhood of the true moment vector, then the method of moments estimator is consistent:

$$\hat{\theta}_{\text{MoM}} \xrightarrow{p} \theta_0 \quad \text{as } n \to \infty$$
Intuition
The sample moments converge to the population moments by the law of large numbers. If the map from parameters to moments is invertible and continuous, then inverting converging inputs gives converging outputs. This is just the continuous mapping theorem applied to moment matching.
Proof Sketch
By the law of large numbers, $\hat{\mu}_k \xrightarrow{p} \mu_k(\theta_0)$ for each $k$. The estimator is $\hat{\theta} = g^{-1}(\hat{\mu}_1, \dots, \hat{\mu}_d)$, where $g$ is the moment map. Since $g^{-1}$ is continuous, the continuous mapping theorem gives $\hat{\theta} \xrightarrow{p} g^{-1}(\mu_1(\theta_0), \dots, \mu_d(\theta_0)) = \theta_0$.
Why It Matters
MoM consistency is free: it only requires the law of large numbers and a smooth relationship between parameters and moments. You do not need to verify regularity conditions on the likelihood. This makes MoM applicable to models where MLE is difficult or undefined.
Failure Mode
If the mapping is not invertible (different parameters give the same moments), the MoM estimator is not identified. This happens in some mixture models where higher-order moments are needed for identification. Also, MoM can produce estimates outside the parameter space (e.g., a negative variance estimate).
Asymptotic Distribution
The MoM estimator is asymptotically normal. By the CLT, the vector of sample moments is asymptotically Gaussian. The delta method then gives:

$$\sqrt{n}\,(\hat{\theta}_{\text{MoM}} - \theta_0) \xrightarrow{d} \mathcal{N}\left(0,\; G \Sigma G^\top\right)$$

where $\Sigma$ is the asymptotic covariance matrix of the sample moments and $G$ is the Jacobian of the inverse moment map.
In general, $G \Sigma G^\top \succeq I(\theta_0)^{-1}$, the inverse Fisher information. The MoM estimator is typically less efficient than MLE: it has larger asymptotic variance. The MLE achieves the Cramér-Rao bound; MoM usually does not.
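The delta method prediction is easy to check by simulation. As a sketch, take a hypothetical Exponential model with rate $\lambda$ (an assumption for illustration, not from the text): $\hat{\lambda} = 1/\bar{X}$, and the delta method predicts $\sqrt{n}(\hat{\lambda} - \lambda) \to \mathcal{N}(0, \lambda^2)$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo check of the delta method for Exp(rate=lam).
# theta = 1/E[X], so lam_hat = 1/sample_mean; predicted asymptotic
# variance of sqrt(n)*(lam_hat - lam) is lam**2.
lam, n, reps = 2.0, 2_000, 5_000
samples = rng.exponential(scale=1.0 / lam, size=(reps, n))
lam_hat = 1.0 / samples.mean(axis=1)

empirical_var = np.var(np.sqrt(n) * (lam_hat - lam))
print(empirical_var)  # should be close to lam**2 = 4.0
```

The empirical variance across replications matches the predicted $\lambda^2$ up to Monte Carlo noise.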
Generalized Method of Moments (GMM)
When you have more moment conditions than parameters ($m > d$), the system is over-determined. GMM finds the parameter that minimizes a weighted norm of the moment violations:

$$\hat{\theta}_{\text{GMM}} = \arg\min_\theta \; \bar{g}_n(\theta)^\top W \, \bar{g}_n(\theta)$$

where $\bar{g}_n(\theta) = \frac{1}{n}\sum_{i=1}^n g(X_i, \theta)$ is a vector of $m$ moment conditions satisfying $\mathbb{E}[g(X, \theta_0)] = 0$, and $W$ is a positive definite weight matrix.
The optimal weight matrix is $W^\ast = \Sigma^{-1}$, where $\Sigma = \mathbb{E}[g(X, \theta_0)\, g(X, \theta_0)^\top]$; this choice minimizes the asymptotic variance of $\hat{\theta}_{\text{GMM}}$. In practice, this is estimated in a two-step procedure: first estimate $\theta$ with $W = I$, then re-estimate with the estimated optimal $W$.
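A sketch of the two-step procedure, under an assumed toy setup (an Exponential rate $\lambda$ with two moment conditions, so $m = 2 > d = 1$); the model and moment conditions are illustrative choices, not from the text:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)

# Toy over-identified setup: for X ~ Exp(rate=lam),
# E[X - 1/lam] = 0 and E[X^2 - 2/lam^2] = 0.
lam_true = 2.0
x = rng.exponential(scale=1.0 / lam_true, size=50_000)

def gbar(lam):
    """Sample average of the moment-condition vector g(X, lam)."""
    return np.array([x.mean() - 1.0 / lam, (x**2).mean() - 2.0 / lam**2])

def objective(lam, W):
    g = gbar(lam)
    return g @ W @ g  # weighted quadratic form

# Step 1: identity weight matrix.
step1 = minimize_scalar(objective, bounds=(0.1, 10.0), args=(np.eye(2),),
                        method="bounded")
lam1 = step1.x

# Step 2: estimate Sigma (covariance of the moment conditions) at the
# step-1 estimate, then minimize with W = Sigma^{-1}.
g_i = np.column_stack([x - 1.0 / lam1, x**2 - 2.0 / lam1**2])
Sigma = (g_i.T @ g_i) / len(x)
step2 = minimize_scalar(objective, bounds=(0.1, 10.0),
                        args=(np.linalg.inv(Sigma),), method="bounded")

print(step2.x)  # close to lam_true
```

With only one parameter, a bounded scalar minimizer suffices; in higher dimensions the same structure applies with a multivariate optimizer.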
When MoM Is Preferred Over MLE
MoM is not just a historical curiosity. There are real situations where it is the better choice:
- Closed-form solutions: MoM often gives explicit formulas where MLE requires numerical optimization (e.g., fitting a Gamma distribution)
- Computational simplicity: for initial parameter estimates or large-scale problems where likelihood evaluation is expensive
- Robustness to misspecification: MoM only requires certain moment conditions to hold, not the full distributional model. If the model is wrong but the moments are right, MoM remains consistent
- Intractable likelihoods: in latent variable models, the likelihood may involve intractable integrals, but moments may be computable
Canonical Examples
MoM for Gaussian parameters
For $X \sim \mathcal{N}(\mu, \sigma^2)$, the first two population moments are $\mathbb{E}[X] = \mu$ and $\mathbb{E}[X^2] = \mu^2 + \sigma^2$.
Setting sample moments equal: $\hat{\mu} = \bar{X}$ and $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n X_i^2 - \bar{X}^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2$.
The MoM estimator of the mean is the sample mean (same as MLE). The MoM estimator of variance divides by $n$ (same as MLE, but biased).
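The two-moment match above is a few lines of NumPy (true values $\mu = 1.5$, $\sigma^2 = 4$ chosen here for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

# MoM for N(mu, sigma^2): match the first two sample moments.
x = rng.normal(loc=1.5, scale=2.0, size=100_000)

m1 = x.mean()            # matches E[X] = mu
m2 = (x**2).mean()       # matches E[X^2] = mu^2 + sigma^2

mu_hat = m1
sigma2_hat = m2 - m1**2  # divides by n, i.e. the biased variance estimator

print(mu_hat, sigma2_hat)  # close to 1.5 and 4.0
```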
MoM for Uniform(0, theta)
For $X \sim \text{Uniform}(0, \theta)$, $\mathbb{E}[X] = \theta/2$. Setting $\bar{X} = \hat{\theta}/2$ gives $\hat{\theta}_{\text{MoM}} = 2\bar{X}$.
Compare with MLE: $\hat{\theta}_{\text{MLE}} = \max_i X_i$. The MLE is more efficient (its error converges at rate $1/n$ vs $1/\sqrt{n}$ for MoM). This is an example where MoM is much worse than MLE.
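The gap is dramatic even at moderate sample sizes; a quick simulation (with $\theta = 5$, $n = 1000$ chosen for illustration) comparing mean squared errors:

```python
import numpy as np

rng = np.random.default_rng(4)

# Compare MoM (2 * sample mean) with the MLE (sample max) for Uniform(0, theta).
theta, n, reps = 5.0, 1_000, 2_000
samples = rng.uniform(0.0, theta, size=(reps, n))

mse_mom = np.mean((2.0 * samples.mean(axis=1) - theta) ** 2)
mse_mle = np.mean((samples.max(axis=1) - theta) ** 2)

print(mse_mom, mse_mle)  # MLE error is orders of magnitude smaller
```

The MoM error shrinks like $1/n$ in MSE, while the MLE's shrinks like $1/n^2$, so the ratio keeps growing with $n$.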
Common Confusions
MoM is not always less efficient than MLE
For exponential families, the MLE and MoM often coincide (both use the sufficient statistic). The efficiency gap only appears for models where sufficient statistics are higher-dimensional than the parameter or where the moment map is far from optimal. For the Gaussian mean, MoM = MLE.
MoM can give impossible estimates
Nothing prevents MoM from returning a negative variance estimate or a probability outside $[0, 1]$. Unlike MLE (which respects the model structure), MoM inverts algebraic equations without constraints. In practice, you clip or project the estimate to the valid parameter space.
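As a concrete sketch, consider a hypothetical signal-plus-noise model (an assumption for illustration, not from the text): observe $Y = X + \varepsilon$ with known noise variance, and estimate $\mathrm{Var}(X)$ by moment matching as $\widehat{\mathrm{Var}}(Y) - \sigma_\varepsilon^2$. When the true signal variance is small, this difference can come out negative, and the fix is to project onto the valid space:

```python
import numpy as np

rng = np.random.default_rng(5)

# Observe Y = X + eps with known Var(eps); the moment estimate of Var(X)
# is Var(Y) - noise_var, which can be negative in small samples.
noise_var = 1.0
x = rng.normal(0.0, 0.05, size=30)  # tiny true signal variance
y = x + rng.normal(0.0, np.sqrt(noise_var), size=30)

sigma2_raw = y.var() - noise_var    # may be negative
sigma2_hat = max(sigma2_raw, 0.0)   # project onto the valid parameter space

print(sigma2_raw, sigma2_hat)
```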
Summary
- MoM: set sample moments equal to population moments and solve for $\theta$
- Consistency follows from the law of large numbers plus the continuous mapping theorem
- MoM is asymptotically normal but typically less efficient than MLE
- GMM handles over-identified models (more moment conditions than parameters) by minimizing a weighted quadratic form
- MoM is preferred when: closed-form solutions exist, likelihoods are intractable, or robustness to misspecification is needed
- MoM can produce out-of-range estimates; MLE respects model constraints
Exercises
Problem
Compute the method of moments estimator for the Poisson distribution $\text{Poisson}(\lambda)$, using $\mathbb{E}[X] = \lambda$. Compare it with the MLE.
Problem
For the Gamma distribution with shape $\alpha$ and rate $\beta$ (so $\mathbb{E}[X] = \alpha/\beta$ and $\mathrm{Var}(X) = \alpha/\beta^2$), derive the MoM estimators for $\alpha$ and $\beta$. Compare with the MLE, which has no closed form (the likelihood equation for $\alpha$ involves the digamma function).
References
Canonical:
- Casella & Berger, Statistical Inference (2nd ed., 2002), Chapter 7
- van der Vaart, Asymptotic Statistics (1998), Chapter 5
- Hansen, "Large Sample Properties of Generalized Method of Moments Estimators" (1982)
Current:
- Wasserman, All of Statistics (2004), Chapter 9
- Hall, Generalized Method of Moments (2005)
- Lehmann & Casella, Theory of Point Estimation (1998), Chapters 1-6
Next Topics
Building on the method of moments:
- Maximum likelihood estimation: the more efficient alternative, and when the two coincide
- Hypothesis testing for ML: using moment conditions for test construction
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Common Probability Distributions (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)