GREG Estimator

Sneiderman, Robby

The generalized regression estimator is the standard model-assisted survey estimator: Horvitz-Thompson plus a regression correction using known auxiliary totals.

Advanced · Tier 2 · Stable · Supporting · ~45 min

Prerequisites

Linear Regression
Survey Sampling Methods
Design-Based vs. Model-Based Inference

Why This Matters

The Horvitz-Thompson estimator is design-unbiased, but it can be noisy. A pure model-based regression estimator can be efficient, but it inherits model risk. The generalized regression estimator, usually shortened to GREG, is the standard compromise in survey statistics: use a regression model to gain efficiency, but keep the design-based backbone visible.

This is the canonical model-assisted estimator in official statistics. It also matters conceptually because it is the cleanest bridge between three ideas that often get taught separately:

Horvitz-Thompson design weighting
regression adjustment
calibration weighting

For anyone building a solid grounding in survey methodology, this page is one of the load-bearing links.

Mental Model

Start with the Horvitz-Thompson estimate of a population total. Then ask a simple question:

We know the population total of some auxiliary variable $x$. How much does the weighted sample miss that total, and how should that discrepancy change the estimate of $y$?

GREG answers by fitting a regression of $y$ on $x$ in the sample, then using the known population total of $x$ to correct the Horvitz-Thompson estimate of $y$.

If the sample underrepresents units with large $x$ and $x$ is positively related to $y$, GREG pushes the estimate upward. If the sample already matches the auxiliary totals well, the correction is small.

Formal Setup

Definition

Horvitz-Thompson Baseline

Let $U = \{1,\ldots,N\}$ be a finite population, $s$ a probability sample, $\pi_i > 0$ the first-order inclusion probability of unit $i$, and $d_i = 1 / \pi_i$ the design weight for sampled unit $i$. The Horvitz-Thompson estimator of a population total $T_y = \sum_{i \in U} y_i$ is

$\hat{T}_{y,\mathrm{HT}} = \sum_{i \in s} d_i y_i.$

For an auxiliary vector $x_i \in \mathbb{R}^p$ whose population total $X = \sum_{i \in U} x_i$ is known, the Horvitz-Thompson estimate of that total is

$\hat{T}_{x,\mathrm{HT}} = \sum_{i \in s} d_i x_i.$
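The two Horvitz-Thompson totals above can be sketched in a few lines of Python. This is a minimal illustration, not a production estimator: the function name `horvitz_thompson_total` and the three-unit sample are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Horvitz-Thompson total: sum of value_i / pi_i over the sampled units."""
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have the same length")
    if any(not 0.0 < p <= 1.0 for p in inclusion_probs):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    # Each sampled unit is weighted by d_i = 1 / pi_i.
    return sum(v / p for v, p in zip(values, inclusion_probs))


# Hypothetical sample of three units (illustrative numbers only).
y = [10.0, 20.0, 30.0]   # study variable
x = [1.0, 2.0, 4.0]      # scalar auxiliary variable
pi = [0.5, 0.25, 0.5]    # first-order inclusion probabilities

t_y_ht = horvitz_thompson_total(y, pi)  # 10/0.5 + 20/0.25 + 30/0.5
t_x_ht = horvitz_thompson_total(x, pi)  # 1/0.5 + 2/0.25 + 4/0.5

print(t_y_ht, t_x_ht)
```

The same function serves both $y$ and $x$ because the estimator only depends on the design weights, not on which variable is being totalled.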

Definition

GREG Estimator

Let $\hat{\beta}$ be a weighted sample regression coefficient estimating the relationship between $y$ and $x$. The generalized regression estimator of $T_y$ is

$\hat{T}_{y,\mathrm{GREG}} = \hat{T}_{y,\mathrm{HT}} + \left(X - \hat{T}_{x,\mathrm{HT}}\right)^\top \hat{\beta}.$

This is the Horvitz-Thompson estimator plus a regression correction using the known population discrepancy in the auxiliary totals.
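As a sketch of the formula, the function below applies the correction for a $p$-dimensional auxiliary vector once $\hat{\beta}$ is available. The name `greg_total` and all input numbers are hypothetical; vectors are plain Python lists so the block has no dependencies.

```python
def greg_total(t_y_ht, t_x_ht, x_known, beta):
    """GREG total: HT estimate plus (X - T_x_HT)' beta for vector-valued x."""
    if not (len(t_x_ht) == len(x_known) == len(beta)):
        raise ValueError("t_x_ht, x_known and beta must have the same length")
    # Inner product of the auxiliary discrepancy with the fitted coefficients.
    correction = sum(
        (known - est) * b for known, est, b in zip(x_known, t_x_ht, beta)
    )
    return t_y_ht + correction


# Hypothetical inputs: two auxiliary variables.
estimate = greg_total(
    t_y_ht=1000.0,
    t_x_ht=[95.0, 480.0],    # HT estimates of the auxiliary totals
    x_known=[100.0, 500.0],  # known population totals X
    beta=[2.0, 1.5],         # fitted regression coefficients
)

# When the sample already matches the known totals, GREG reduces to HT.
unchanged = greg_total(1000.0, [100.0, 500.0], [100.0, 500.0], [2.0, 1.5])

print(estimate, unchanged)
```

The second call illustrates the point made in the mental model: with zero auxiliary discrepancy the correction vanishes regardless of $\hat{\beta}$.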

Definition

Weighted Sample Regression

A standard GREG choice is

$\hat{\beta} = \left(\sum_{i \in s} d_i q_i x_i x_i^\top\right)^{-1} \left(\sum_{i \in s} d_i q_i x_i y_i\right),$

where $q_i > 0$ are optional tuning constants, often set to $q_i = 1$. This is the weighted least-squares fit of $y$ on $x$ in the sample, with weight $d_i q_i$ on unit $i$; $x_i$ may include an intercept.
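A minimal sketch of this coefficient for the special case $x_i = (1, z_i)^\top$, an intercept plus one scalar auxiliary variable $z_i$, solves the $2 \times 2$ normal equations directly. The function name, the singularity threshold, and the data are assumptions made for illustration; a real implementation would use a linear algebra library and handle general $p$.

```python
def weighted_beta_with_intercept(z, y, d, q=None):
    """beta-hat for x_i = (1, z_i) with weights d_i * q_i (2x2 normal equations)."""
    if q is None:
        q = [1.0] * len(y)  # default tuning constants q_i = 1
    w = [di * qi for di, qi in zip(d, q)]
    # Entries of the weighted moment matrix [[s_w, s_z], [s_z, s_zz]].
    s_w = sum(w)
    s_z = sum(wi * zi for wi, zi in zip(w, z))
    s_zz = sum(wi * zi * zi for wi, zi in zip(w, z))
    # Right-hand side: weighted cross-moments with y.
    s_y = sum(wi * yi for wi, yi in zip(w, y))
    s_zy = sum(wi * zi * yi for wi, zi, yi in zip(w, z, y))
    det = s_w * s_zz - s_z * s_z
    if abs(det) < 1e-12:  # crude absolute threshold, an assumption of this sketch
        raise ValueError("weighted moment matrix is singular")
    intercept = (s_zz * s_y - s_z * s_zy) / det
    slope = (s_w * s_zy - s_z * s_y) / det
    return intercept, slope


# Hypothetical data lying exactly on y = 3 + 2 z, so the fit should recover (3, 2).
z = [1.0, 2.0, 4.0]
y = [5.0, 7.0, 11.0]
d = [2.0, 4.0, 2.0]  # design weights

intercept, slope = weighted_beta_with_intercept(z, y, d)
print(intercept, slope)
```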

Main Theorem

Theorem

GREG as Quadratic Calibration Estimator

Statement

Consider calibration weights $w_i$ obtained by minimizing

$\sum_{i \in s} \frac{(w_i - d_i)^2}{2 d_i q_i}$

subject to the calibration constraint

$\sum_{i \in s} w_i x_i = X.$

Then the solution is

$w_i = d_i\left(1 + q_i x_i^\top \lambda\right),$

where

$\lambda = \left(\sum_{i \in s} d_i q_i x_i x_i^\top\right)^{-1} \left(X - \hat{T}_{x,\mathrm{HT}}\right).$

The resulting calibrated estimator of the total,

$\hat{T}_{y,\mathrm{cal}} = \sum_{i \in s} w_i y_i,$

is exactly the GREG estimator:

$\hat{T}_{y,\mathrm{cal}} = \hat{T}_{y,\mathrm{HT}} + \left(X - \hat{T}_{x,\mathrm{HT}}\right)^\top \hat{\beta}.$

Intuition

Quadratic calibration says: change the design weights as little as possible, measured in squared distance, while forcing the weighted sample to match the known auxiliary totals. The induced estimator is not merely similar to GREG. It is GREG.
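The identity can be checked numerically. The sketch below assumes a scalar auxiliary variable and $q_i = 1$, builds the calibration weights from the closed form in the theorem, and compares the calibrated total with the GREG formula. The function name and the data are hypothetical.

```python
def calibration_weights_scalar(x, d, x_known, q=None):
    """Quadratic-calibration weights w_i = d_i (1 + q_i x_i lambda) for scalar x."""
    if q is None:
        q = [1.0] * len(x)
    moment = sum(di * qi * xi * xi for di, qi, xi in zip(d, q, x))
    t_x_ht = sum(di * xi for di, xi in zip(d, x))
    lam = (x_known - t_x_ht) / moment  # scalar version of lambda
    return [di * (1.0 + qi * xi * lam) for di, qi, xi in zip(d, q, x)]


# Hypothetical sample and known auxiliary total.
x = [1.0, 2.0, 4.0]
y = [10.0, 20.0, 30.0]
d = [2.0, 4.0, 2.0]
X_known = 20.0

w = calibration_weights_scalar(x, d, X_known)

# Calibrated quantities.
calibrated_x = sum(wi * xi for wi, xi in zip(w, x))
calibrated_y = sum(wi * yi for wi, yi in zip(w, y))

# GREG computed from its definition, with q_i = 1.
t_y_ht = sum(di * yi for di, yi in zip(d, y))
t_x_ht = sum(di * xi for di, xi in zip(d, x))
beta = sum(di * xi * yi for di, xi, yi in zip(d, x, y)) / sum(
    di * xi * xi for di, xi in zip(d, x)
)
greg = t_y_ht + (X_known - t_x_ht) * beta

print(calibrated_x, calibrated_y, greg)
```

The calibrated weights reproduce the known total of $x$, and the calibrated total of $y$ agrees with GREG up to floating-point rounding.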

Proof Sketch

Write the Lagrangian for the quadratic objective plus the calibration constraint. Setting its derivative with respect to $w_i$ to zero gives the affine form $w_i = d_i(1 + q_i x_i^\top \lambda)$. Substituting into the constraint yields the closed-form expression for $\lambda$. Expanding $\sum_{i \in s} w_i y_i$ then gives the Horvitz-Thompson term plus the regression correction, with $\hat{\beta} = \left(\sum_{i \in s} d_i q_i x_i x_i^\top\right)^{-1}\sum_{i \in s} d_i q_i x_i y_i$.
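Spelled out, the steps of the sketch look as follows. The symbol $\mathcal{L}$ for the Lagrangian is introduced here only for this derivation; everything else uses the notation of the theorem.

$\mathcal{L}(w, \lambda) = \sum_{i \in s} \frac{(w_i - d_i)^2}{2 d_i q_i} - \lambda^\top \left(\sum_{i \in s} w_i x_i - X\right)$

Stationarity in $w_i$:

$\frac{w_i - d_i}{d_i q_i} - x_i^\top \lambda = 0 \quad\Longrightarrow\quad w_i = d_i\left(1 + q_i x_i^\top \lambda\right).$

Substituting into the constraint:

$\hat{T}_{x,\mathrm{HT}} + \left(\sum_{i \in s} d_i q_i x_i x_i^\top\right)\lambda = X \quad\Longrightarrow\quad \lambda = \left(\sum_{i \in s} d_i q_i x_i x_i^\top\right)^{-1}\left(X - \hat{T}_{x,\mathrm{HT}}\right).$

Expanding the calibrated total, and using the symmetry of the weighted moment matrix:

$\sum_{i \in s} w_i y_i = \hat{T}_{y,\mathrm{HT}} + \lambda^\top \sum_{i \in s} d_i q_i x_i y_i = \hat{T}_{y,\mathrm{HT}} + \left(X - \hat{T}_{x,\mathrm{HT}}\right)^\top \hat{\beta}.$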

Why It Matters

This theorem unifies the model-assisted and calibration views. GREG is not an ad hoc weighted regression trick; it is the quadratic-distance member of the calibration family. That is why it sits naturally between Horvitz-Thompson and general calibration or raking.

Failure Mode

If the weighted moment matrix is singular or ill-conditioned, the regression correction is unstable. If the auxiliary variables are weakly related to $y$, the correction adds little and can even increase variance. If the known total $X$ is itself wrong or poorly aligned with the target population, the calibration correction moves the estimate in the wrong direction.
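The first failure mode is easy to screen for before fitting. The sketch below is a crude diagnostic, not a standard named procedure: for $x_i = (1, z_i)^\top$ it computes the determinant of the weighted moment matrix relative to the product of its diagonal entries, which lies in $[0, 1]$ and is zero when $z$ is constant in the sample. The function name and data are hypothetical.

```python
def relative_determinant(z, d, q=None):
    """Scaled determinant of the 2x2 weighted moment matrix for x_i = (1, z_i).

    Returns a value in [0, 1]; values near 0 signal near-collinearity
    between the intercept and z, so beta-hat would be unstable.
    """
    if q is None:
        q = [1.0] * len(z)
    w = [di * qi for di, qi in zip(d, q)]
    a = sum(w)                                         # sum of d_i q_i
    b = sum(wi * zi for wi, zi in zip(w, z))           # sum of d_i q_i z_i
    c = sum(wi * zi * zi for wi, zi in zip(w, z))      # sum of d_i q_i z_i^2
    if a * c == 0.0:
        return 0.0
    return (a * c - b * b) / (a * c)


d = [2.0, 4.0, 2.0]

# z varies across sampled units: the matrix is invertible.
varied = relative_determinant([1.0, 2.0, 4.0], d)

# z is constant in the sample: intercept and z are collinear, matrix is singular.
constant = relative_determinant([3.0, 3.0, 3.0], d)

print(varied, constant)
```

For general $p$, a condition number from a linear algebra library would play the same role.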

Design Properties

The exact theorem above is algebraic. The inferential appeal of GREG comes from its design properties.

Under standard regularity conditions, GREG is design-consistent for the finite-population total.
If the working linear model for $y$ on $x$ is good, GREG is typically more efficient than Horvitz-Thompson because the correction absorbs predictable structure.
If the model is wrong, GREG still keeps the design-based Horvitz-Thompson core and remains approximately design-unbiased in large samples. That is the point of model-assisted inference: use the model for efficiency, not for full justification.
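A small Monte Carlo sketch can make the efficiency claim concrete. Everything here is an assumption chosen for illustration: a synthetic population with $y_i = 5 x_i + \text{noise}$, simple random sampling without replacement (so $d_i = N/n$), a scalar auxiliary variable with no intercept, and $q_i = 1$. It is not evidence about any real survey.

```python
import random
import statistics


def simulate(n_reps=2000, N=500, n=50, seed=0):
    """Repeated SRS draws from a fixed synthetic population; returns HT and GREG totals."""
    rng = random.Random(seed)
    x_pop = [rng.uniform(1.0, 10.0) for _ in range(N)]
    y_pop = [5.0 * x + rng.gauss(0.0, 1.0) for x in x_pop]  # strong linear relation
    X_known = sum(x_pop)   # auxiliary total, known by construction
    true_total = sum(y_pop)
    d = N / n              # design weight under simple random sampling
    ht, greg = [], []
    for _ in range(n_reps):
        idx = rng.sample(range(N), n)
        xs = [x_pop[i] for i in idx]
        ys = [y_pop[i] for i in idx]
        t_y_ht = d * sum(ys)
        t_x_ht = d * sum(xs)
        # Weighted slope through the origin; the constant weight d cancels.
        beta = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
        ht.append(t_y_ht)
        greg.append(t_y_ht + (X_known - t_x_ht) * beta)
    return true_total, ht, greg


true_total, ht_estimates, greg_estimates = simulate()

var_ht = statistics.pvariance(ht_estimates)
var_greg = statistics.pvariance(greg_estimates)
mean_greg = statistics.mean(greg_estimates)

print("variance ratio GREG/HT:", var_greg / var_ht)
print("relative bias of GREG:", (mean_greg - true_total) / true_total)
```

Weakening the relation between $x$ and $y$ in `y_pop` is a simple way to watch the variance advantage shrink.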

This is why GREG is often the first serious answer when someone asks, "Can I use a model without giving up design-based validity?"

GREG vs. HT vs. Pure Regression

The Horvitz-Thompson estimator uses only the design. It is robust, but often noisy.
A pure regression estimator predicts the population from a model. It can be efficient, but it leans heavily on model correctness.
GREG starts from Horvitz-Thompson, then applies a model-based correction only to the discrepancy in known auxiliary totals.

That structure matters. GREG is not "just run weighted least squares." The known population total $X$ is part of the estimator itself. Without that known benchmark, you do not have GREG.

Canonical Example

Example

Correcting a payroll total with known employee counts

Suppose a business survey estimates total payroll $T_y$. For each sampled firm, you also know employee count $x_i$, and the total number of employees in the population is known from a register: $X = 1000$.

The Horvitz-Thompson estimate of total payroll is 4.55 million USD. But the Horvitz-Thompson estimate of total employees from the same sample is only 920, so the weighted sample falls 80 short of the known employee total. A weighted sample regression gives an estimated slope of 4,200 USD of payroll per employee.

The GREG correction is

$\left(1000 - 920\right) \times 4200 = 336{,}000.$

So the GREG estimate becomes

$4.55 \text{ million USD} + 0.336 \text{ million USD} = 4.886 \text{ million USD}.$

The intuition is simple: the sample appears to miss employees, and payroll is strongly related to employee count, so the total payroll estimate should move upward.
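The arithmetic of this example can be reproduced directly. The variable names are hypothetical; the numbers are the ones given above, kept as integers (in USD and head counts) so the result is exact.

```python
# Figures from the payroll example, in USD and employees.
payroll_ht = 4_550_000     # Horvitz-Thompson estimate of total payroll
employees_ht = 920         # Horvitz-Thompson estimate of total employees
employees_known = 1_000    # register total X
slope = 4_200              # fitted USD of payroll per employee

# Regression correction: auxiliary discrepancy times the fitted slope.
correction = (employees_known - employees_ht) * slope

# GREG estimate: HT estimate plus the correction.
payroll_greg = payroll_ht + correction

print(correction, payroll_greg)
```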

When GREG Helps

GREG is most useful when three conditions line up:

the auxiliary totals are known and trusted
the auxiliary variables are strongly related to the study variable
the weighted sample misses those auxiliary totals by enough to matter

If the sample already matches the auxiliary totals well, the correction is small. If the auxiliary variables are weak predictors, the correction adds noise with little payoff.

Common Confusions

Watch Out

GREG is not just weighted least squares

Weighted regression produces $\hat{\beta}$, but GREG is the full estimator $\hat{T}_{y,\mathrm{HT}} + (X - \hat{T}_{x,\mathrm{HT}})^\top \hat{\beta}$. The known population total of $x$ is essential.

Watch Out

A good predictive model does not automatically imply a good GREG estimator

If the auxiliary total $X$ is wrong, the sample design is highly irregular, or the weighted regression is unstable, the correction can hurt. Prediction and finite-population estimation are related, but they are not identical problems.

Watch Out

GREG is not fully model-based

The working regression model improves efficiency, but the estimator is still judged under the sampling design. That is why GREG belongs to the model-assisted, not purely model-based, camp.

Summary

GREG is the standard model-assisted survey estimator
It equals Horvitz-Thompson plus a regression correction using known auxiliary totals
Under quadratic calibration, the calibrated estimator is exactly GREG
GREG is attractive because it keeps the design-based backbone while gaining efficiency from auxiliary information
It helps most when the auxiliary totals are trusted and the auxiliary variables predict the study variable well

Exercises

Exercise (Core)

Problem

The Horvitz-Thompson estimate of a study total is 1200. The known population total of auxiliary variable $x$ is $X = 500$, while the Horvitz-Thompson estimate of that total is 470. The fitted regression slope is $\hat{\beta} = 3$. Compute the GREG estimate.

Exercise (Advanced)

Problem

Why can GREG reduce variance even when the weighted sample regression model is not exactly correct?

References

Canonical:

Särndal, Swensson, and Wretman, Model Assisted Survey Sampling (1992), Chapters 5-7.
Deville and Särndal, "Calibration Estimators in Survey Sampling," Journal of the American Statistical Association 87(418), 376-382 (1992).
Cochran, Sampling Techniques, 3rd ed. (1977), Chapter 7 on ratio and regression estimators.

Interpretive and operational:

Statistics Canada, "Optimal calibration weights under unit nonresponse in survey sampling," Section 2.1 on calibration estimation (2019).
Lumley, Complex Surveys: A Guide to Analysis Using R (2010), Chapter 7 on calibration and regression estimators.
Valliant, Dever, and Kreuter, Practical Tools for Designing and Weighting Survey Samples (2018), Chapters 9-10.

Next Topics

Survey weight calibration and raking: the broader calibration family in which GREG is the quadratic-distance case
Small area estimation: model-assisted thinking pushed into thin domains and hierarchical borrowing
Official statistics and national surveys: the production setting where GREG-style estimators matter

Last reviewed: April 18, 2026
