Weighted Conformal Prediction Under Covariate Shift
The extension of split conformal prediction that restores finite-sample coverage when the test feature distribution differs from the training feature distribution (covariate shift), at the cost of knowing or estimating the likelihood ratio between them.
Why This Matters
Split conformal prediction rests on exchangeability. Exchangeability fails the moment the test distribution differs from the training distribution, which happens automatically when models are trained six months before deployment or evaluated on a different population than they were fitted on. Under covariate shift the nominal coverage silently degrades, often substantially, and the usual diagnostic (hold out a clean test set) is unavailable because the clean test set is itself drawn from the training distribution.
Weighted conformal prediction recovers validity by reweighting the calibration points using the likelihood ratio between the two distributions. The construction is due to Tibshirani, Barber, Candès, and Ramdas (2019) and is among the most widely used extensions of split conformal. The same machinery turns out to give prediction intervals for individual treatment effects, connecting conformal prediction to the causal-inference literature through propensity scores.
Formal Setup
Training and calibration data $(X_i, Y_i)$, $i = 1, \dots, n$, are drawn i.i.d.\ from a joint distribution $P = P_X \times P_{Y \mid X}$ with feature marginal $P_X$. The test point $(X_{n+1}, Y_{n+1})$ has feature marginal $\tilde P_X$, possibly different from $P_X$, but the conditional law $P_{Y \mid X}$ is assumed the same under both distributions. This is the covariate shift setting; label shift and concept drift are handled separately and are strictly harder.
Define the likelihood ratio
$$ w(x) \;=\; \frac{d\tilde P_X}{d P_X}(x), $$
the Radon-Nikodym derivative of $\tilde P_X$ with respect to $P_X$. We assume $\tilde P_X \ll P_X$ (absolute continuity) so the ratio exists; where this fails the problem is ill-posed and no distribution-free correction is possible.
Why Split Conformal Fails
Under exchangeability the test score's rank among the $n + 1$ scores is uniform on $\{1, \dots, n+1\}$. Under covariate shift this uniformity breaks: test points are drawn disproportionately from regions that are more likely under $\tilde P_X$ than under $P_X$, so the test score tends to land at ranks the unweighted calibration quantile does not account for. Split conformal, which assumes a uniform rank, undercovers when the shift concentrates test points where the scores are large.
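As a quick sanity check, here is a minimal simulation sketch, entirely a toy example of my own rather than anything from the references: the fitted model and score are held fixed while only the feature marginal shifts, and the unweighted split conformal cutoff empirically undercovers.

```python
# Toy illustration (assumed setup, not from the source): split conformal
# undercoverage under covariate shift. Calibration X ~ N(0,1), test X ~ N(2,1),
# and the noise scale grows with |x|, so shifted test points have larger scores.
import numpy as np

rng = np.random.default_rng(0)

def sample(mu, n):
    x = rng.normal(mu, 1.0, n)
    y = x**2 + rng.normal(0.0, 1.0, n) * (1.0 + np.abs(x))  # heteroscedastic noise
    return x, y

predict = lambda x: x**2   # stand-in "fitted model": the true conditional mean

alpha = 0.1
x_cal, y_cal = sample(0.0, 2000)                        # calibration from P_X
scores = np.abs(y_cal - predict(x_cal))                 # absolute-residual scores
n = len(scores)
q = np.sort(scores)[int(np.ceil((n + 1) * (1 - alpha))) - 1]  # split conformal cutoff

x_test, y_test = sample(2.0, 20000)                     # test from the shifted marginal
covered = np.abs(y_test - predict(x_test)) <= q
print(f"nominal {1 - alpha:.2f}, empirical coverage {covered.mean():.3f}")  # typically well below 0.90
```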
Weighted Exchangeability
Random variables $Z_1, \dots, Z_{n+1}$ are weighted exchangeable with weight function $w$ if the joint density factors as
$$ f(z_1, \dots, z_{n+1}) \;=\; \frac{w(z_{n+1})}{C_w} \prod_{i=1}^{n+1} p(z_i) $$
for some base density $p$ and normalizing constant $C_w = \int w(z)\, p(z)\, dz$. Equivalently, the first $n$ points are i.i.d.\ from $P$ (the distribution with density $p$) and the last point is drawn from the tilted distribution with density $w(z)\, p(z) / C_w$.
The key observation is that weighted exchangeability preserves a weighted version of rank uniformity, which is exactly what we need for a coverage argument.
The Weighted Quantile Construction
Compute calibration scores $s_i = s(X_i, Y_i)$, $i = 1, \dots, n$, as before. Define normalized weights on the calibration and test points:
$$ p_i^{w}(x) \;=\; \frac{w(X_i)}{\sum_{j=1}^{n} w(X_j) + w(x)}, \quad i = 1, \dots, n, \qquad p_{n+1}^{w}(x) \;=\; \frac{w(x)}{\sum_{j=1}^{n} w(X_j) + w(x)}. $$
The test-point weight $w(x)$ appearing in the denominator is the subtle point most implementations get wrong. The prediction set is
$$ \hat C(x) \;=\; \bigl\{\, y : s(x, y) \le \hat q^{w}_{1-\alpha}(x) \,\bigr\}, $$
where $\hat q^{w}_{1-\alpha}(x)$ is the weighted $(1-\alpha)$-quantile of the distribution that puts mass $p_i^{w}(x)$ on score $s_i$ and mass $p_{n+1}^{w}(x)$ on $+\infty$ (so the test point always has a chance to fail).
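A minimal sketch of the construction in code, under the assumptions that the score is the absolute residual, features are scalar, and the likelihood ratio $w$ is available as a callable; the function names are illustrative, not an established API.

```python
# Weighted split conformal (sketch). Assumes: absolute-residual scores,
# scalar features, a vectorized point predictor `predict`, and a known
# likelihood ratio w(x).
import numpy as np

def weighted_quantile(values, weights, level):
    """Smallest value whose weighted CDF reaches `level`; weights sum to 1."""
    order = np.argsort(values)
    cdf = np.cumsum(weights[order])
    idx = np.searchsorted(cdf, level)
    return np.inf if idx >= len(values) else values[order][idx]

def weighted_conformal_interval(x, predict, x_cal, y_cal, w, alpha=0.1):
    scores = np.abs(y_cal - predict(x_cal))       # calibration scores s_i
    w_cal, w_test = w(x_cal), float(w(x))
    denom = w_cal.sum() + w_test                  # the test weight in the denominator
    p = np.append(w_cal, w_test) / denom          # p_1^w, ..., p_n^w, p_{n+1}^w
    s = np.append(scores, np.inf)                 # mass p_{n+1}^w sits on +infinity
    q = weighted_quantile(s, p, 1 - alpha)
    mu = float(predict(np.atleast_1d(x))[0])
    return mu - q, mu + q                         # interval for absolute-residual scores
```

With $w \equiv 1$ this collapses to ordinary split conformal, which is the content of the second exercise below.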
Main Theorem
Weighted Conformal Coverage
Statement
Let $\hat C(X_{n+1})$ be the weighted split conformal prediction set constructed above. Then under the joint law in which the calibration points are drawn i.i.d.\ from $P_X \times P_{Y \mid X}$ and the test point from $\tilde P_X \times P_{Y \mid X}$,
$$ \mathbb{P}\bigl\{\, Y_{n+1} \in \hat C(X_{n+1}) \,\bigr\} \;\ge\; 1 - \alpha. $$
Intuition
Reweighting by $w$ converts the non-exchangeable setup into a weighted-exchangeable one. The weighted-exchangeable rank of the test score is uniform on a probability-weighted index set, and the weighted quantile catches it at the usual level. The proof is a reduction to ordinary exchangeability on an enlarged probability space.
Proof Sketch
Condition on the unordered collection of scores $\{s_1, \dots, s_{n+1}\}$, where $s_{n+1} = s(X_{n+1}, Y_{n+1})$. Under weighted exchangeability the assignment of score values to indices has probability proportional to the product of weights along each permutation. The rank of $s_{n+1}$ is then distributed according to the weights, and the weighted quantile cutoff produces the stated coverage. Full details in Tibshirani, Barber, Candès, Ramdas (2019), Section 3.
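In the covariate-shift special case the key identity can be written in one line; this is a paraphrase in the notation above (ties ignored), not the paper's exact lemma.

```latex
% Conditional on the unordered score multiset, the test score coincides with
% the value attached to X_i with probability proportional to its weight:
\[
  \mathbb{P}\bigl\{\, s_{n+1} = s_i \;\big|\; \{s_1, \dots, s_{n+1}\} \,\bigr\}
  \;=\; \frac{w(X_i)}{\sum_{j=1}^{n+1} w(X_j)}
  \;=\; p_i^{w}(X_{n+1}),
\]
% so the weighted (1-alpha)-quantile of the scores, with the extra mass
% p_{n+1}^w placed at +infinity, covers s_{n+1} with probability >= 1 - alpha.
```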
Why It Matters
The guarantee is finite-sample, distribution-free over the conditional $P_{Y \mid X}$, and requires no assumption on the function class of the predictor or the dimension of the features. The only assumption beyond split conformal is access to $w$, which in causal settings is a function of the propensity score and in deployment settings can be estimated by density-ratio methods.
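Where $w$ must be estimated, the discriminative-classifier route mentioned in the frontier section (and in Sugiyama et al.) is a common practical choice. A minimal sketch, assuming unlabeled feature samples from both distributions are available; the function names are illustrative.

```python
# Classifier-based density-ratio estimation (sketch). Assumes feature arrays of
# shape (n_samples, n_features) from the training and (unlabeled) test
# distributions. Uses the standard odds conversion:
#   w(x) ~= (n_train / n_test) * P(test | x) / P(train | x).
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_likelihood_ratio(x_train, x_test_unlabeled):
    X = np.vstack([x_train, x_test_unlabeled])
    z = np.concatenate([np.zeros(len(x_train)), np.ones(len(x_test_unlabeled))])
    clf = LogisticRegression(max_iter=1000).fit(X, z)
    prior = len(x_train) / len(x_test_unlabeled)        # corrects for unequal sample sizes
    def w_hat(x):
        p = clf.predict_proba(np.atleast_2d(x))[:, 1]   # P(test | x)
        p = np.clip(p, 1e-6, 1 - 1e-6)                  # guard against degenerate odds
        return prior * p / (1.0 - p)
    return w_hat
```

Plugging such a $\hat w$ directly into the weighted quantile is exactly the plug-in procedure whose failure mode is described next.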
Failure Mode
If $w$ is unknown and replaced by an estimate $\hat w$, coverage bias is first-order in the estimation error of $\hat w$. Standard plug-in is unsafe; doubly robust variants (next section) reduce this to a product of errors.
Doubly Robust Weighted Conformal
When $w$ must be estimated, a plug-in procedure inherits first-order bias from the weight estimator. Treating covariate shift as a missing-data problem gives a cleaner path: fit a quantile regression for $Y \mid X$ on the training distribution, fit a weight model $\hat w$, and combine them in an AIPW-style construction of the conformal score. Coverage bias then becomes the product of the two estimation errors, analogous to the double-robustness property in double/debiased machine learning.
Connection to Causal Inference
Individual treatment effect (ITE) prediction is covariate shift in disguise. Under unconfoundedness, the conditional distribution of $Y(1)$ given $X$ in the treated subpopulation equals its conditional distribution under the counterfactual "treat everyone" population. The likelihood ratio between the corresponding feature marginals is the inverse-propensity weight, proportional to $1/e(x)$ with $e(x) = \mathbb{P}(T = 1 \mid X = x)$ the propensity score. Weighted conformal prediction with this weight produces finite-sample valid prediction intervals for $Y(1)$ (and symmetrically for $Y(0)$), provided unconfoundedness holds. This links the conformal literature directly to semiparametric causal inference.
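A minimal sketch of how the pieces combine for the counterfactual $Y(1)$; this is an illustration in the spirit of Lei-Candès rather than their exact estimator, and `fit_outcome_model` / `fit_propensity_model` are placeholders, not a library API.

```python
# Counterfactual interval for Y(1) (sketch). Assumes unconfoundedness and reuses
# weighted_conformal_interval from the construction above. A proper split between
# the data used to fit the models and the data used to calibrate is elided here.
import numpy as np

def counterfactual_interval_y1(x_new, X, Y, T, fit_outcome_model,
                               fit_propensity_model, alpha=0.1):
    mu1 = fit_outcome_model(X[T == 1], Y[T == 1])      # predictor for E[Y | X, T=1]
    e_hat = fit_propensity_model(X, T)                 # estimated propensity e(x) = P(T=1 | x)
    w = lambda x: 1.0 / np.clip(e_hat(x), 1e-3, 1.0)   # inverse-propensity weight (overlap clip)
    return weighted_conformal_interval(x_new, mu1, X[T == 1], Y[T == 1], w, alpha=alpha)
```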
Sensitivity Analysis Under Unmeasured Confounding
Unconfoundedness is a strong assumption. The marginal sensitivity model (a variant of Rosenbaum's sensitivity analysis) parameterizes hidden confounding by $\Gamma \ge 1$, interpreted as the worst-case ratio between the true and observed propensity odds. Under a $\Gamma$-bounded violation, Jin, Ren, Candès (2023) give robust weighted conformal intervals with coverage guarantees that degrade gracefully with $\Gamma$. This provides an honest answer to "what if unconfoundedness fails by a factor of $\Gamma$?"
Beyond Covariate Shift
The nonexchangeable framework of Barber, Candès, Ramdas, Tibshirani (2023) generalizes weighted conformal to arbitrary user-chosen weights, with coverage guarantees whose slack is a weighted sum of total variation distances measuring how far the data actually are from the assumed weighting, as sketched below. This opens the door to time series, spatial data, and label shift, each at the cost of a coverage gap that reflects the modelling mismatch.
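As I recall it (a paraphrase; consult the paper for the exact statement and normalization), the headline bound takes the form:

```latex
% Nonexchangeable conformal with fixed weights w_1, ..., w_n in [0, 1]:
% the coverage slack is a weighted sum of total variation distances between
% the data sequence Z and the sequence Z^i with points i and n+1 swapped.
\[
  \mathbb{P}\bigl\{\, Y_{n+1} \in \hat C(X_{n+1}) \,\bigr\}
  \;\ge\; 1 - \alpha - \sum_{i=1}^{n} \tilde w_i \, d_{\mathrm{TV}}\bigl(Z, Z^{i}\bigr),
  \qquad \tilde w_i = \frac{w_i}{1 + \sum_{j=1}^{n} w_j}.
\]
% Under exchangeability every d_TV term vanishes and the usual guarantee returns.
```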
Exercises
Problem
Three calibration points have features $x_1, x_2, x_3$, scores $s_1 \le s_2 \le s_3$, and known likelihood ratios $w(x_1), w(x_2), w(x_3)$. A test point has feature $x_4$ with likelihood ratio $w(x_4)$. Compute the weighted $(1-\alpha)$-quantile used by weighted conformal, expressing the answer in terms of the scores and weights.
Problem
Show that when $w \equiv 1$ (no shift) the weighted conformal construction reduces exactly to the split conformal construction of the previous page, including the ceiling quantile level. Identify where the $n + 1$ in the denominator of split conformal comes from in the weighted derivation.
Problem
State the Lei-Candès (2021) construction for ITE prediction intervals using weighted conformal with estimated propensity. Identify what happens to coverage when the propensity estimator converges at a slower-than-parametric rate.
Open Problems and Frontier
Weighted conformal under label shift rather than covariate shift is harder because the required likelihood ratio is a function of the label, which is unobserved at test time; current work uses label-conditional scores or EM-style re-estimation.
Combining weighted conformal with anytime-valid inference (Koning, van Meer 2025 style) would give prediction sets that remain valid under sequential peeking and distribution drift simultaneously; the distribution-shift version is not yet fully written down.
Tight lower bounds on coverage loss as a function of density-ratio estimation error. Current guarantees are loose and there is room for sharp finite-sample rates.
Conditional coverage under covariate shift inherits the impossibility result from the split conformal page (Barber, Candès, Ramdas, Tibshirani 2021). Partial guarantees under smoothness or localizability are the current line of attack.
High-dimensional density-ratio estimation is the practical bottleneck. Methods based on discriminative classifiers, score matching, and kernel-based estimators trade off sample complexity against bias in ways that matter for the coverage product-rate.
References
Canonical:
- Tibshirani, Barber, Candès, Ramdas, "Conformal Prediction Under Covariate Shift." NeurIPS 2019. The founding paper.
- Lei, Candès, "Conformal Inference of Counterfactuals and Individual Treatment Effects." Journal of the Royal Statistical Society B 83(5) (2021), 911-938.
- Barber, Candès, Ramdas, Tibshirani, "Conformal Prediction Beyond Exchangeability." Annals of Statistics 51(2) (2023), 816-845.
Robustness and sensitivity:
- Jin, Ren, Candès, "Sensitivity Analysis of Individual Treatment Effects: A Robust Conformal Inference Approach." Proceedings of the National Academy of Sciences 120(6) (2023).
- Yang, Kim, Tchetgen Tchetgen, "Doubly Robust Calibration of Prediction Sets Under Covariate Shift." Journal of the Royal Statistical Society B (to appear; arXiv:2203.01761).
Density-ratio estimation:
- Sugiyama, Suzuki, Kanamori, Density Ratio Estimation in Machine Learning (Cambridge University Press, 2012). Chapters 3-5.
Background:
- Angelopoulos, Bates, "A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification." Foundations and Trends in Machine Learning 16(4) (2023). Chapter 5.
Next Topics
- Double/debiased machine learning: the causal-inference companion; shares nuisance estimation problems.
- E-values and anytime-valid inference: sequential validity under peeking.
- Calibration and uncertainty: Platt, isotonic, temperature scaling and why conformal is distinct.
Last reviewed: April 24, 2026
Prerequisites
Foundations this topic depends on.
- Split Conformal Prediction (Layer 2)
- Order Statistics (Layer 1)
- Common Probability Distributions (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Hypothesis Testing for ML (Layer 2)
- Cross-Validation Theory (Layer 2)
- Empirical Risk Minimization (Layer 2)
- Concentration Inequalities (Layer 1)
- Expectation, Variance, Covariance, and Moments (Layer 0A)
- Random Variables (Layer 0A)
- Kolmogorov Probability Axioms (Layer 0A)
- Bias-Variance Tradeoff (Layer 2)
- Radon-Nikodym and Conditional Expectation (Layer 0B)
- Measure-Theoretic Probability (Layer 0B)
- Importance Sampling (Layer 2)
- Causal Inference Basics (Layer 3)