Statistical Foundations
GREG Estimator
The generalized regression estimator is the standard model-assisted survey estimator: Horvitz-Thompson plus a regression correction using known auxiliary totals.
Why This Matters
The Horvitz-Thompson estimator is design-unbiased, but it can be noisy. A pure model-based regression estimator can be efficient, but it inherits model risk. The generalized regression estimator, usually shortened to GREG, is the standard compromise in survey statistics: use a regression model to gain efficiency, but keep the design-based backbone visible.
This is the canonical model-assisted estimator in official statistics. It also matters conceptually because it is the cleanest bridge between three ideas that often get taught separately:
- Horvitz-Thompson design weighting
- regression adjustment
- calibration weighting
If your site wants strong grounding across survey methodology, this page is one of the load-bearing links.
Mental Model
Start with the Horvitz-Thompson estimate of a population total. Then ask a simple question:
We know the population total of some auxiliary variable $x$. How much does the weighted sample miss that total, and how should that discrepancy change the estimate of the total of $y$?
GREG answers by fitting a regression of $y$ on $x$ in the sample, then using the known population total of $x$ to correct the Horvitz-Thompson estimate of the total of $y$.
If the sample underrepresents units with large $x$ and $x$ predicts $y$, GREG pushes the estimate upward. If the sample already matches the auxiliary totals well, the correction is small.
Formal Setup
Horvitz-Thompson Baseline
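Let $U$ be a finite population, $s \subset U$ a probability sample, and $d_i = 1/\pi_i$ the design weight for sampled unit $i$, where $\pi_i$ is its inclusion probability. The Horvitz-Thompson estimator of a population total $t_y = \sum_{i \in U} y_i$ is
$$\hat{t}_{y,\mathrm{HT}} = \sum_{i \in s} d_i y_i.$$
For an auxiliary vector $\mathbf{x}_i$ whose population total $\mathbf{t}_x = \sum_{i \in U} \mathbf{x}_i$ is known, the Horvitz-Thompson estimate of that total is
$$\hat{\mathbf{t}}_{x,\mathrm{HT}} = \sum_{i \in s} d_i \mathbf{x}_i.$$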
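As a minimal sketch of the two Horvitz-Thompson totals, assuming the design weight is the inverse inclusion probability and the auxiliary variable is a scalar. The function name `ht_total` and the toy numbers are illustrative and do not come from any survey package.

```python
def ht_total(values, inclusion_probs):
    """Horvitz-Thompson total: each value weighted by d_i = 1 / pi_i."""
    return sum(v / p for v, p in zip(values, inclusion_probs))

# Toy sample of three units (made-up numbers).
y = [10.0, 20.0, 30.0]     # study variable
x = [1.0, 2.0, 3.0]        # auxiliary variable
pi = [0.5, 0.25, 0.5]      # inclusion probabilities

t_y_ht = ht_total(y, pi)   # 10/0.5 + 20/0.25 + 30/0.5
t_x_ht = ht_total(x, pi)   # 1/0.5 + 2/0.25 + 3/0.5
print(t_y_ht, t_x_ht)      # 160.0 16.0
```

Units with a small inclusion probability carry a large weight, which is one reason the estimator can be noisy.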
GREG Estimator
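Let $\hat{\mathbf{B}}$ be a weighted sample regression coefficient estimating the relationship between $y$ and $\mathbf{x}$. The generalized regression estimator of $t_y$ is
$$\hat{t}_{y,\mathrm{GREG}} = \hat{t}_{y,\mathrm{HT}} + (\mathbf{t}_x - \hat{\mathbf{t}}_{x,\mathrm{HT}})^\top \hat{\mathbf{B}}.$$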
This is the Horvitz-Thompson estimator plus a regression correction using the known population discrepancy in the auxiliary totals.
Weighted Sample Regression
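A standard GREG choice is
$$\hat{\mathbf{B}} = \left( \sum_{i \in s} d_i q_i \mathbf{x}_i \mathbf{x}_i^\top \right)^{-1} \sum_{i \in s} d_i q_i \mathbf{x}_i y_i,$$
where $q_i > 0$ are optional tuning constants. When $\mathbf{x}_i$ includes an intercept, this is a weighted least-squares fit using the sample and design weights.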
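The following sketch puts the pieces together for the simplest case: a single auxiliary variable, no intercept, and tuning constants defaulting to 1. Those simplifications, the function name `greg_total`, and the toy data are assumptions made for illustration; with a vector of auxiliaries the two sums become a matrix and a vector and the division becomes a linear solve.

```python
def greg_total(y, x, d, t_x, q=None):
    """GREG estimate of the total of y using one auxiliary variable x.

    y, x : sample values; d : design weights;
    t_x  : known population total of x;
    q    : optional positive tuning constants (default 1 for every unit).
    Returns (estimate, fitted slope).
    """
    if q is None:
        q = [1.0] * len(y)
    t_y_ht = sum(di * yi for di, yi in zip(d, y))
    t_x_ht = sum(di * xi for di, xi in zip(d, x))
    # Scalar version of the weighted regression coefficient.
    sxx = sum(di * qi * xi * xi for di, qi, xi in zip(d, q, x))
    sxy = sum(di * qi * xi * yi for di, qi, xi, yi in zip(d, q, x, y))
    b_hat = sxy / sxx
    # Horvitz-Thompson estimate plus the regression correction.
    return t_y_ht + (t_x - t_x_ht) * b_hat, b_hat

# Toy data in which y is exactly 10 * x.
y = [10.0, 20.0, 30.0]
x = [1.0, 2.0, 3.0]
d = [2.0, 4.0, 2.0]
estimate, slope = greg_total(y, x, d, t_x=20.0)
print(slope, estimate)
```

Here the weighted sample gives an auxiliary total of 16 against a known total of 20, so the Horvitz-Thompson estimate of 160 is raised by 4 times the fitted slope.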
Main Theorem
GREG as Quadratic Calibration Estimator
Statement
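Consider calibration weights $w_i$ obtained by minimizing
$$\sum_{i \in s} \frac{(w_i - d_i)^2}{d_i q_i}$$
subject to the calibration constraint
$$\sum_{i \in s} w_i \mathbf{x}_i = \mathbf{t}_x.$$
Then the solution is
$$w_i = d_i \left( 1 + q_i \mathbf{x}_i^\top \boldsymbol{\lambda} \right),$$
where
$$\boldsymbol{\lambda} = \left( \sum_{i \in s} d_i q_i \mathbf{x}_i \mathbf{x}_i^\top \right)^{-1} (\mathbf{t}_x - \hat{\mathbf{t}}_{x,\mathrm{HT}}).$$
The resulting calibrated estimator of the total,
$$\hat{t}_{y,\mathrm{cal}} = \sum_{i \in s} w_i y_i,$$
is exactly the GREG estimator:
$$\hat{t}_{y,\mathrm{cal}} = \hat{t}_{y,\mathrm{HT}} + (\mathbf{t}_x - \hat{\mathbf{t}}_{x,\mathrm{HT}})^\top \hat{\mathbf{B}} = \hat{t}_{y,\mathrm{GREG}}.$$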
Intuition
Quadratic calibration says: change the design weights as little as possible, measured in squared distance, while forcing the weighted sample to match the known auxiliary totals. The induced estimator is not merely similar to GREG. It is GREG.
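The equivalence can be checked numerically. The sketch below assumes a single auxiliary variable and tuning constants equal to 1, builds the quadratic-calibration weights in their closed form $w_i = d_i(1 + x_i \lambda)$, and compares the calibrated total with GREG computed directly. The function name and data are hypothetical.

```python
def calibration_weights(x, d, t_x):
    """Quadratic-distance calibration weights for one auxiliary variable (q_i = 1)."""
    t_x_ht = sum(di * xi for di, xi in zip(d, x))
    sxx = sum(di * xi * xi for di, xi in zip(d, x))
    lam = (t_x - t_x_ht) / sxx              # scalar Lagrange multiplier
    return [di * (1.0 + xi * lam) for di, xi in zip(d, x)]

# Made-up sample: y is only roughly linear in x.
y = [12.0, 19.0, 33.0, 41.0]
x = [1.0, 2.0, 3.0, 4.0]
d = [5.0, 5.0, 10.0, 10.0]
t_x = 90.0                                   # known population total of x

w = calibration_weights(x, d, t_x)
calibrated_x = sum(wi * xi for wi, xi in zip(w, x))
calibrated_y = sum(wi * yi for wi, yi in zip(w, y))

# GREG computed directly: HT estimate plus regression correction.
t_y_ht = sum(di * yi for di, yi in zip(d, y))
t_x_ht = sum(di * xi for di, xi in zip(d, x))
b_hat = (sum(di * xi * yi for di, xi, yi in zip(d, x, y))
         / sum(di * xi * xi for di, xi in zip(d, x)))
greg = t_y_ht + (t_x - t_x_ht) * b_hat

print(calibrated_x, t_x)     # calibrated weights reproduce the known total
print(calibrated_y, greg)    # the two estimates agree up to rounding
```

Note that the weights depend only on the auxiliary data and the design weights, not on the study variable, so one set of calibrated weights can be reused for every study variable in the survey.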
Proof Sketch
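Write the Lagrangian for the quadratic objective plus the calibration constraint. Differentiating with respect to $w_i$ gives the affine form $w_i = d_i (1 + q_i \mathbf{x}_i^\top \boldsymbol{\lambda})$. Substituting into the constraint yields the closed-form expression for $\boldsymbol{\lambda}$. Expanding $\sum_{i \in s} w_i y_i$ then gives the Horvitz-Thompson term plus the regression correction, with $\hat{\mathbf{B}} = \left( \sum_{i \in s} d_i q_i \mathbf{x}_i \mathbf{x}_i^\top \right)^{-1} \sum_{i \in s} d_i q_i \mathbf{x}_i y_i$.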
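The steps of the sketch can be written out as follows. The notation ($d_i$ for design weights, $q_i$ for tuning constants, $\mathbf{t}_x$ for the known total, $\boldsymbol{\lambda}$ for the multiplier) is assumed here, and the factor $\tfrac{1}{2}$ is a convenience that only rescales the multiplier without changing the solution.

$$
\begin{aligned}
\mathcal{L}(w, \boldsymbol{\lambda}) &= \tfrac{1}{2} \sum_{i \in s} \frac{(w_i - d_i)^2}{d_i q_i} - \boldsymbol{\lambda}^\top \Big( \sum_{i \in s} w_i \mathbf{x}_i - \mathbf{t}_x \Big), \\
\frac{\partial \mathcal{L}}{\partial w_i} = 0 \;&\Rightarrow\; \frac{w_i - d_i}{d_i q_i} = \mathbf{x}_i^\top \boldsymbol{\lambda} \;\Rightarrow\; w_i = d_i \left( 1 + q_i \mathbf{x}_i^\top \boldsymbol{\lambda} \right), \\
\sum_{i \in s} w_i \mathbf{x}_i = \mathbf{t}_x \;&\Rightarrow\; \sum_{i \in s} d_i \mathbf{x}_i + \Big( \sum_{i \in s} d_i q_i \mathbf{x}_i \mathbf{x}_i^\top \Big) \boldsymbol{\lambda} = \mathbf{t}_x, \\
\sum_{i \in s} w_i y_i &= \sum_{i \in s} d_i y_i + \boldsymbol{\lambda}^\top \sum_{i \in s} d_i q_i \mathbf{x}_i y_i .
\end{aligned}
$$

Solving the third line for $\boldsymbol{\lambda}$ and substituting into the fourth, using the symmetry of the weighted moment matrix, leaves the Horvitz-Thompson estimate plus the auxiliary discrepancy multiplied by the weighted regression coefficient.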
Why It Matters
This theorem unifies the model-assisted and calibration views. GREG is not a random weighted regression trick; it is the quadratic-distance member of the calibration family. That is why it sits naturally between Horvitz-Thompson and general calibration or raking.
Failure Mode
If the weighted moment matrix $\sum_{i \in s} d_i q_i \mathbf{x}_i \mathbf{x}_i^\top$ is singular or ill-conditioned, the regression correction is unstable. If the auxiliary variables are weakly related to $y$, the correction adds little and can even increase variance. If the known total $\mathbf{t}_x$ is itself wrong or poorly aligned with the target population, the calibration correction moves the estimate in the wrong direction.
Design Properties
The exact theorem above is algebraic. The inferential appeal of GREG comes from its design properties.
- Under standard regularity conditions, GREG is design-consistent for the finite-population total.
- If the working linear model for $y$ on $\mathbf{x}$ is good, GREG is typically more efficient than Horvitz-Thompson because the correction absorbs predictable structure.
- If the model is wrong, GREG still keeps the design-based Horvitz-Thompson core. That is the point of model-assisted inference: use the model for efficiency, not for full justification.
This is why GREG is often the first serious answer when someone asks, "Can I use a model without giving up design-based validity?"
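A small simulation can make these properties concrete. The sketch below is an illustration under stated assumptions, not a proof: it invents a finite population in which the study variable is roughly linear in one auxiliary variable, draws repeated simple random samples without replacement, and uses a deliberately misspecified through-origin working model (the population has an intercept that the model ignores). All names and numbers are hypothetical.

```python
import random
import statistics

random.seed(0)

# Hypothetical finite population: y is linear in x plus noise, with an intercept.
N, n, reps = 1000, 50, 2000
x_pop = [random.uniform(1.0, 10.0) for _ in range(N)]
y_pop = [5.0 + 3.0 * xi + random.gauss(0.0, 2.0) for xi in x_pop]
t_x, t_y = sum(x_pop), sum(y_pop)
d = N / n  # design weight under simple random sampling without replacement

ht_draws, greg_draws = [], []
for _ in range(reps):
    ids = random.sample(range(N), n)
    xs = [x_pop[i] for i in ids]
    ys = [y_pop[i] for i in ids]
    t_y_ht = d * sum(ys)
    t_x_ht = d * sum(xs)
    # Through-origin weighted slope; the constant weight d cancels.
    b_hat = sum(xi * yi for xi, yi in zip(xs, ys)) / sum(xi * xi for xi in xs)
    ht_draws.append(t_y_ht)
    greg_draws.append(t_y_ht + (t_x - t_x_ht) * b_hat)

var_ht = statistics.variance(ht_draws)
var_greg = statistics.variance(greg_draws)
rel_bias_greg = (statistics.mean(greg_draws) - t_y) / t_y
print(f"variance ratio GREG/HT: {var_greg / var_ht:.3f}")
print(f"relative bias of GREG:  {rel_bias_greg:+.4f}")
```

In this setup the working model is wrong about the intercept, yet the GREG draws should still centre close to the true total while varying less than the Horvitz-Thompson draws, because the correction removes most of the variation that the auxiliary variable explains.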
GREG vs. HT vs. Pure Regression
- Horvitz-Thompson uses only the design. It is robust, but often noisy.
- Pure regression estimator predicts the population from a model. It can be efficient, but it leans heavily on model correctness.
- GREG starts from Horvitz-Thompson, then applies a model-based correction only to the discrepancy in known auxiliary totals.
That structure matters. GREG is not "just run weighted least squares." The known population total is part of the estimator itself. Without that known benchmark, you do not have GREG.
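One way to see the contrast is to regroup the GREG formula. Writing $\hat{t}_{y,\mathrm{HT}} = \sum_{i \in s} d_i y_i$ and $\hat{\mathbf{t}}_{x,\mathrm{HT}} = \sum_{i \in s} d_i \mathbf{x}_i$ (symbols assumed here for illustration), the same estimator splits into a prediction term and a design-weighted residual term:

$$
\hat{t}_{y,\mathrm{GREG}}
= \hat{t}_{y,\mathrm{HT}} + (\mathbf{t}_x - \hat{\mathbf{t}}_{x,\mathrm{HT}})^\top \hat{\mathbf{B}}
= \underbrace{\mathbf{t}_x^\top \hat{\mathbf{B}}}_{\text{predicted total}}
+ \underbrace{\sum_{i \in s} d_i \left( y_i - \mathbf{x}_i^\top \hat{\mathbf{B}} \right)}_{\text{design-weighted residuals}} .
$$

The first term is roughly what a simple regression prediction of the total would report under the linear working model. The second term is a Horvitz-Thompson estimate of the total prediction error, and it is the part that keeps the estimator anchored to the design when the model is off.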
Canonical Example
Correcting a payroll total with known employee counts
Suppose a business survey estimates total payroll $t_y$. For each sampled firm, you also know employee count $x_i$, and the total number of employees in the population is known from a register: $t_x = 1000$.
The Horvitz-Thompson estimate of total payroll is 4.55 million USD. But the Horvitz-Thompson estimate of total employees from the same sample is only 920, so the weighted sample undercovers employee count by 80. A weighted sample regression gives an estimated slope of 4200 USD payroll per employee.
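The GREG correction is
$$(1000 - 920) \times 4200 = 336{,}000 \text{ USD}.$$
So the GREG estimate becomes
$$4{,}550{,}000 + 336{,}000 = 4{,}886{,}000 \text{ USD}.$$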
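The arithmetic of this example can be reproduced in a few lines. The variable names are illustrative; the register total of 1000 follows from the Horvitz-Thompson employee estimate of 920 and the stated shortfall of 80.

```python
t_y_ht = 4_550_000   # Horvitz-Thompson estimate of total payroll (USD)
t_x = 1_000          # known register total of employees (920 + 80 shortfall)
t_x_ht = 920         # Horvitz-Thompson estimate of total employees
slope = 4_200        # fitted payroll per employee (USD)

correction = (t_x - t_x_ht) * slope
greg = t_y_ht + correction
print(correction, greg)   # 336000 4886000
```

The correction is about 7 percent of the Horvitz-Thompson estimate, which is the kind of shift that matters in a published total.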
The intuition is simple: the sample appears to miss employees, and payroll is strongly related to employee count, so the total payroll estimate should move upward.
When GREG Helps
GREG is most useful when three conditions line up:
- the auxiliary totals are known and trusted
- the auxiliary variables are strongly related to the study variable
- the weighted sample misses those auxiliary totals by enough to matter
If the sample already matches the auxiliary totals well, the correction is small. If the auxiliary variables are weak predictors, the correction adds noise with little payoff.
Common Confusions
GREG is not just weighted least squares
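Weighted regression produces $\hat{\mathbf{B}}$, but GREG is the full estimator $\hat{t}_{y,\mathrm{HT}} + (\mathbf{t}_x - \hat{\mathbf{t}}_{x,\mathrm{HT}})^\top \hat{\mathbf{B}}$. The known population total of $\mathbf{x}$ is essential.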
A good predictive model does not automatically imply a good GREG estimator
If the auxiliary total is wrong, the sample design is highly irregular, or the weighted regression is unstable, the correction can hurt. Prediction and finite-population estimation are related, but they are not identical problems.
GREG is not fully model-based
The working regression model improves efficiency, but the estimator is still judged under the sampling design. That is why GREG belongs to the model-assisted, not purely model-based, camp.
Summary
- GREG is the standard model-assisted survey estimator
- It equals Horvitz-Thompson plus a regression correction using known auxiliary totals
- Under quadratic calibration, the calibrated estimator is exactly GREG
- GREG is attractive because it keeps the design-based backbone while gaining efficiency from auxiliary information
- It helps most when the auxiliary totals are trusted and the auxiliary variables predict the study variable well
Exercises
Problem
The Horvitz-Thompson estimate of a study total is 1200. The known population total of auxiliary variable is , while the Horvitz-Thompson estimate of that total is 470. The fitted regression slope is . Compute the GREG estimate.
Problem
Why can GREG reduce variance even when the weighted sample regression model is not exactly correct?
References
Canonical:
- Särndal, Swensson, and Wretman, Model Assisted Survey Sampling (1992), Chapters 5-7.
- Deville and Särndal, "Calibration Estimators in Survey Sampling," Journal of the American Statistical Association 87(418), 376-382 (1992).
- Cochran, Sampling Techniques, 3rd ed. (1977), Chapter 7 on ratio and regression estimators.
Interpretive and operational:
- Statistics Canada, "Optimal calibration weights under unit nonresponse in survey sampling," Section 2.1 on calibration estimation (2019).
- Lumley, Complex Surveys: A Guide to Analysis Using R (2010), Chapter 7 on calibration and regression estimators.
- Valliant, Dever, and Kreuter, Practical Tools for Designing and Weighting Survey Samples (2018), Chapters 9-10.
Next Topics
- Survey weight calibration and raking: the broader calibration family in which GREG is the quadratic-distance case
- Small area estimation: model-assisted thinking pushed into thin domains and hierarchical borrowing
- Official statistics and national surveys: the production setting where GREG-style estimators matter
Last reviewed: April 18, 2026
Prerequisites
Foundations this topic depends on.
- Linear Regression (Layer 1)
- Matrix Operations and Properties (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Maximum Likelihood Estimation (Layer 0B)
- Common Probability Distributions (Layer 0A)
- Differentiation in $\mathbb{R}^n$ (Layer 0A)
- Survey Sampling Methods (Layer 2)
- Expectation, Variance, Covariance, and Moments (Layer 0A)
- Design-Based vs. Model-Based Inference (Layer 2)