Sequential Inference

e-processes

The sequential version of an e-value: a nonnegative process whose value at every stopping time is an e-value for the null. Typically constructed as the running product of conditional e-values adapted to a filtration, which is a nonnegative supermartingale under the null. The object that powers anytime-valid inference, sequential testing, and confidence sequences.

Advanced · Research · Tier 1 · Current · Core spine · ~55 min
For: ML, Stats, Research

Why This Matters

An e-value is a single nonnegative statistic with $\mathbb{E}_{H_0}[E] \leq 1$. An e-process is the sequential version: a collection $(E_t)_{t \geq 0}$ where each $E_t$ is an e-value with respect to the information available at time $t$, and the construction is consistent across times so that the value at any stopping time $\tau$ is still an e-value. The construction lets a tester accumulate evidence, peek as often as they like, stop whenever the evidence is convincing, and retain the same Type I error guarantee they would have had at a single fixed sample size.

The technical bridge is one of the cleanest results in modern statistics. If $E_t$ is a conditional e-value at time $t$ given the past, the running product $\prod_{s=1}^t E_s$ is a nonnegative supermartingale under the null. Ville's inequality bounds the maximum of a nonnegative supermartingale: $\Pr_{H_0}(\sup_t \prod_{s=1}^t E_s \geq 1/\alpha) \leq \alpha$. The supremum is over all times, including any stopping time the analyst might choose adaptively. Optional stopping, optional continuation, and data-dependent decision rules all preserve the bound.

This is the technical content of "anytime-valid inference." Every confidence sequence, every always-valid p-value, every safe sequential test in the modern literature is a special case of the e-process construction, dating back to Wald's 1945 SPRT (likelihood-ratio e-process between two simple hypotheses) and reaching forward to LLM evaluation pipelines and continuous-monitoring clinical trials.

Formal Setup

Definition

Filtration

A nondecreasing family of $\sigma$-algebras $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots$ on a probability space, with $\mathcal{F}_t$ representing the information available at time $t$. In practice, $\mathcal{F}_t = \sigma(X_1, \ldots, X_t)$ for an observed sequence.

Definition

Adapted process

A stochastic process $(Y_t)$ is adapted to $(\mathcal{F}_t)$ if $Y_t$ is $\mathcal{F}_t$-measurable for every $t$. When $\mathcal{F}_t = \sigma(X_1, \ldots, X_t)$, this means $Y_t$ is a function of the observations through time $t$ only.

Definition

e-process

A nonnegative process $(E_t)_{t \geq 0}$ adapted to the filtration $(\mathcal{F}_t)$, with $E_0 = 1$, such that for every stopping time $\tau$ (possibly infinite), $$\sup_{P \in H_0} \mathbb{E}_P[E_\tau \cdot \mathbf{1}\{\tau < \infty\}] \leq 1.$$ A test that rejects $H_0$ when $E_\tau \geq 1/\alpha$ has Type I error at most $\alpha$, by Markov's inequality applied to $E_\tau$.

Definition

Stopping time

A random variable $\tau : \Omega \to \{0, 1, \ldots\} \cup \{\infty\}$ such that the event $\{\tau \leq t\}$ is $\mathcal{F}_t$-measurable for every $t$. Intuitively, the decision to stop at time $t$ depends only on the information available by time $t$.

Construction From Conditional e-values

The standard recipe builds an e-process from a sequence of conditional e-values. The constraint is that each new term has conditional expectation at most one given the past.

Theorem

Running Product of Conditional e-values Is a Supermartingale

Statement

Let $(B_t)_{t \geq 1}$ be a sequence of nonnegative random variables adapted to $(\mathcal{F}_t)$ with $\mathbb{E}_{H_0}[B_t \mid \mathcal{F}_{t-1}] \leq 1$ for every $t$. Define $E_0 = 1$ and $E_t = E_{t-1} \cdot B_t$ for $t \geq 1$. Then $(E_t)$ is a nonnegative supermartingale under $H_0$ adapted to $(\mathcal{F}_t)$, and $\mathbb{E}_{H_0}[E_t] \leq 1$ for every $t$.

Consequently, $(E_t)$ is an e-process, and for every stopping time $\tau$, $$\mathbb{E}_{H_0}[E_\tau \cdot \mathbf{1}\{\tau < \infty\}] \leq 1.$$

Intuition

Each $B_t$ is a conditional bet: a payoff function with expected value at most one given everything seen so far. Wealth multiplies across rounds. The conditional expectation bound is the martingale-like constraint that survives compounding. Equality at every step gives a martingale; an inequality gives a supermartingale, which is fine because we only need an upper bound on expected wealth.

Proof Sketch

Compute the conditional expectation of $E_t$ given $\mathcal{F}_{t-1}$: $$\mathbb{E}_{H_0}[E_t \mid \mathcal{F}_{t-1}] = \mathbb{E}_{H_0}[E_{t-1} B_t \mid \mathcal{F}_{t-1}] = E_{t-1} \cdot \mathbb{E}_{H_0}[B_t \mid \mathcal{F}_{t-1}] \leq E_{t-1}.$$ The factoring of $E_{t-1}$ out of the conditional expectation uses $\mathcal{F}_{t-1}$-measurability. The supermartingale property is the displayed inequality. The unconditional bound follows from the tower property: $\mathbb{E}_{H_0}[E_t] = \mathbb{E}_{H_0}[\mathbb{E}_{H_0}[E_t \mid \mathcal{F}_{t-1}]] \leq \mathbb{E}_{H_0}[E_{t-1}] \leq \cdots \leq 1$.

For the stopping-time statement, the optional stopping theorem (Doob) for nonnegative supermartingales gives $\mathbb{E}_{H_0}[E_\tau \mathbf{1}\{\tau < \infty\}] \leq \mathbb{E}_{H_0}[E_0] = 1$ for any stopping time $\tau$.

Why It Matters

This construction is the engine of every modern sequential test. Wald's SPRT is the special case where $B_t = p_1(X_t)/p_0(X_t)$ (the likelihood ratio of one observation), and the running product is the joint likelihood ratio. Confidence sequences are built by inverting a family of e-processes, one per candidate parameter value: the sequence at time $t$ keeps the values whose e-process has not yet reached $1/\alpha$. The Sequential Hypothesis Testing pipeline at large-scale A/B platforms (Optimizely, Statsig) reduces to this theorem applied to bounded-mean estimators.

Failure Mode

The bound requires $\mathbb{E}_{H_0}[B_t \mid \mathcal{F}_{t-1}] \leq 1$ under the null. The bet $B_t$ may depend on $X_t$, but the rule that maps $X_t$ to $B_t$ (the betting strategy) must be fixed using only $\mathcal{F}_{t-1}$. If the analyst chooses that rule using data not in $\mathcal{F}_{t-1}$ (e.g. by looking ahead, or by retrospectively reordering observations), the construction fails. This predictability requirement is non-negotiable. Reusing the same observations across multiple $B_t$ definitions also breaks the conditional expectation bound, even though the resulting process is still adapted.
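The theorem can be checked numerically. The sketch below is illustrative only: it assumes a toy null in which $X_t = \pm 1$ with equal probability and a fixed-stake bet $B_t = 1 + \lambda X_t$ (so $\mathbb{E}[B_t \mid \mathcal{F}_{t-1}] = 1$). The names `running_product` and `fair_coin_bets` are hypothetical helpers, not part of any library.

```python
import random

def running_product(bets):
    """Return the path E_0, E_1, ..., E_T with E_t = E_{t-1} * B_t."""
    path = [1.0]
    for b in bets:
        path.append(path[-1] * b)
    return path

def fair_coin_bets(T, lam, rng):
    """Bets B_t = 1 + lam * X_t with X_t = +/-1 equally likely (toy null)."""
    return [1.0 + lam * rng.choice((-1.0, 1.0)) for _ in range(T)]

rng = random.Random(0)
T, lam, n_paths = 20, 0.3, 20000
# Final wealth E_T on many independent null paths.
final = [running_product(fair_coin_bets(T, lam, rng))[-1] for _ in range(n_paths)]
mean_final = sum(final) / n_paths
print(f"Monte Carlo estimate of E[E_T] under H0: {mean_final:.3f}")
```

The estimate should sit near 1 up to simulation noise, matching $\mathbb{E}_{H_0}[E_t] \leq 1$ with equality in this martingale case.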

Optional Proof: Ville's inequality for nonnegative supermartingales

The single-shot Markov inequality for an e-value generalizes to a maximal inequality for the entire e-process trajectory.

Theorem

Ville's Inequality

Statement

Let $(E_t)_{t \geq 0}$ be a nonnegative supermartingale under $P$ with $E_0 \leq 1$. Then for every $c > 0$, $$P\!\left[\sup_{t \geq 0} E_t \geq c\right] \leq \frac{1}{c}.$$ Setting $c = 1/\alpha$ gives the anytime-valid Type I error guarantee: under the null, the probability that the e-process ever exceeds $1/\alpha$ is at most $\alpha$.

Proof Sketch

Define $\tau = \inf\{t \geq 0 : E_t \geq c\}$, a stopping time. Then $\{\tau < \infty\}$ is the event that the process ever reaches $c$, and $E_\tau \geq c$ on that event. By the optional stopping theorem, $$\mathbb{E}[E_\tau \mathbf{1}\{\tau < \infty\}] \leq \mathbb{E}[E_0] \leq 1.$$ Apply Markov: $$P(\tau < \infty) \cdot c \leq \mathbb{E}[E_\tau \mathbf{1}\{\tau < \infty\}] \leq 1,$$ so $P(\exists t : E_t \geq c) = P(\tau < \infty) \leq 1/c$. The bound for $\{\sup_t E_t \geq c\}$ follows by applying this to each $c' < c$ and letting $c' \uparrow c$.

Why It Matters

The supremum is over all times. Ville's inequality says optional stopping at any time, including the time that maximizes the e-process value, cannot inflate Type I error beyond $\alpha$. This is the technical content of "anytime-valid."

Failure Mode

Ville's inequality requires the uniform supermartingale property: $\mathbb{E}[E_t \mid \mathcal{F}_{t-1}] \leq E_{t-1}$ for every $t$, not just $\mathbb{E}[E_t] \leq 1$ unconditionally. Mixing e-values from different filtrations or recomputing the conditional expectation only at certain times breaks the bound.
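A quick Monte Carlo check of the maximal inequality, offered as a sketch rather than a proof. It assumes the same kind of toy null martingale (fair $\pm 1$ outcomes, fixed stake `lam`) and a finite horizon `T`, so it only approximates the supremum over all times; `ever_crosses` is a hypothetical helper name.

```python
import random

def ever_crosses(T, lam, threshold, rng):
    """Simulate one null path of E_t = prod_s (1 + lam * X_s), X_s = +/-1 fair.

    Returns True if the path reaches `threshold` at any time t <= T.
    """
    e = 1.0
    for _ in range(T):
        e *= 1.0 + lam * rng.choice((-1.0, 1.0))
        if e >= threshold:
            return True
    return False

alpha = 0.05
rng = random.Random(1)
n_paths, T, lam = 4000, 300, 0.3
crossings = sum(ever_crosses(T, lam, 1 / alpha, rng) for _ in range(n_paths))
crossing_rate = crossings / n_paths
print(f"fraction of null paths ever reaching 1/alpha: {crossing_rate:.4f}")
print(f"Ville bound: {alpha}")
```

The observed fraction should not exceed $\alpha$ by more than simulation noise, even though every path is checked at every step.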

Canonical Example: Wald's SPRT

The sequential probability ratio test (Wald 1945) is the prototype e-process. Test $H_0: P_0$ against $H_1: P_1$, both simple, observe $X_1, X_2, \ldots$ iid. Define $$E_t = \prod_{i=1}^t \frac{p_1(X_i)}{p_0(X_i)}.$$ Each likelihood ratio is a conditional (in fact unconditional, by independence) e-value: $\mathbb{E}_{P_0}[p_1(X_i)/p_0(X_i)] = 1$. The running product is a nonnegative supermartingale under $H_0$ (actually a martingale, since the inequality is an equality).

Wald's original test stops the first time $E_t \geq A$ (reject $H_0$) or $E_t \leq B$ (accept $H_0$), with $A = 1/\alpha$ and $B = \beta$ for desired error rates. The Type I error guarantee follows from Ville: under $H_0$, $\Pr(\sup_t E_t \geq A) \leq 1/A = \alpha$. The expected sample size to reject under $H_1$ scales like $\log(1/\alpha)/\mathrm{KL}(P_1 \| P_0)$, the Kullback-Leibler rate.
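The stopping rule above can be sketched in a few lines. This is a minimal illustration, assuming Bernoulli observations with $p_0 = 0.5$ and $p_1 = 0.6$ (values chosen only for the demo) and the thresholds $A = 1/\alpha$, $B = \beta$ from the text; `sprt` and `bernoulli_log_lr` are hypothetical helper names.

```python
import math
import random

def sprt(stream, log_lr, alpha=0.05, beta=0.2):
    """Stop when E_t >= 1/alpha (reject H_0) or E_t <= beta (accept H_0)."""
    log_A, log_B = math.log(1 / alpha), math.log(beta)
    log_E, t = 0.0, 0
    for t, x in enumerate(stream, start=1):
        log_E += log_lr(x)          # log of p_1(x) / p_0(x)
        if log_E >= log_A:
            return "reject H_0", t
        if log_E <= log_B:
            return "accept H_0", t
    return "undecided", t           # stream ended before a boundary was hit

def bernoulli_log_lr(p0, p1):
    """Per-observation log likelihood ratio for 0/1 data."""
    def log_lr(x):
        return math.log(p1 / p0) if x == 1 else math.log((1 - p1) / (1 - p0))
    return log_lr

rng = random.Random(2)
log_lr = bernoulli_log_lr(p0=0.5, p1=0.6)
data_h1 = (1 if rng.random() < 0.6 else 0 for _ in range(10_000))
decision, stop_time = sprt(data_h1, log_lr)
print(decision, stop_time)
```

Working in log space avoids overflow or underflow in the running product when the test runs for many observations.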

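The same idea extends to composite nulls via universal inference (Wasserman-Ramdas-Balakrishnan 2020). In the split version, fit any estimator of the alternative density on one half of the data, then on the other half evaluate the likelihood ratio of that fitted density against the maximum-likelihood fit under the null. The result is an e-value for the composite null; the sequential variant, with a predictable plug-in numerator and the running null maximum likelihood in the denominator, gives an e-process.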

Worked Exercise

Exercise (Advanced)

Problem

Let $X_1, X_2, \ldots$ be iid $\mathcal{N}(\theta, 1)$ and consider testing $H_0: \theta = 0$ against $H_1: \theta = \mu$ for fixed $\mu > 0$. Construct the likelihood-ratio e-process explicitly. Compute its value at $t = 100$ when $\bar X_{100} = 0.21$ and $\mu = 0.2$. At what time does the e-process exceed $1/\alpha = 20$ if the true mean is exactly $\mu = 0.2$ and $\bar X_t \approx 0.2$ for all $t$?
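One way to check an answer numerically. For unit-variance Gaussians the per-observation log likelihood ratio is $\mu X_i - \mu^2/2$, so $\log E_t = t\,\mu\,(\bar X_t - \mu/2)$. The sketch below just evaluates that formula; `gaussian_lr_log_e` is a hypothetical helper name.

```python
import math

def gaussian_lr_log_e(t, xbar, mu):
    """log E_t for H_0: theta = 0 vs H_1: theta = mu, unit variance."""
    return t * mu * (xbar - mu / 2)

# Value at t = 100 with sample mean 0.21 and mu = 0.2.
log_e_100 = gaussian_lr_log_e(100, 0.21, 0.2)   # 100 * 0.2 * 0.11 = 2.2
e_100 = math.exp(log_e_100)                     # roughly 9.03, below 20

# If xbar_t stays at mu = 0.2, then log E_t = t * mu**2 / 2 = 0.02 * t.
mu, alpha = 0.2, 0.05
t_cross = math.ceil(math.log(1 / alpha) / (mu ** 2 / 2))  # first t with E_t >= 20
print(f"E_100 = {e_100:.2f}, first crossing of 1/alpha at t = {t_cross}")
```

The crossing time agrees with the $\log(1/\alpha)/\mathrm{KL}(P_1 \| P_0)$ scaling, since $\mathrm{KL}(P_1 \| P_0) = \mu^2/2 = 0.02$ here.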

Practical Example: Hourly A/B Test Monitoring

A standard A/B test compares conversion rates of two website variants over a planned $N$ days of traffic. The product team wants to peek hourly and stop early if the result is convincing. With classical $z$-test $p$-values, peeking inflates Type I error. With an e-process:

  1. Define $H_0: \text{conversion rate of variant A} \leq \text{conversion rate of variant B}$.
  2. For each hour $t$, the streaming users contribute conditional bets $B_t$ based on the observed click-rate differential and the running variance estimate. Empirical-Bernstein constructions (Howard-Ramdas-McAuliffe-Sekhon 2021) give explicit formulas for $B_t$ with bounded outcomes.
  3. The running product $E_t = \prod_{s \leq t} B_s$ is the e-process. Stop the test at the first $t$ where $E_t \geq 20$ (for $\alpha = 0.05$) or at the budget $T_{\max}$, whichever comes first.

Ville's inequality guarantees that under $H_0$, the probability of falsely rejecting at any time is at most $5\%$, regardless of how many hourly looks the analyst performed. Confidence sequences (the next topic) provide the matching interval estimates.
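The three steps above can be sketched with a much simpler bet than the empirical-Bernstein construction. The version below is an assumption-laden illustration, not the method used by any platform: it assumes users arrive in pairs (one per variant) with 0/1 conversion outcomes, and uses a fixed-stake bet $B_t = 1 + \lambda (a_t - b_t)$ with $0 < \lambda < 1$. Under $H_0$ the difference has conditional mean at most zero, so $\mathbb{E}[B_t \mid \mathcal{F}_{t-1}] \leq 1$. The function name and the simulated rates are hypothetical.

```python
import math
import random

def ab_test_e_process(pairs, alpha=0.05, lam=0.1):
    """Monitor H_0: rate_A <= rate_B from paired 0/1 outcomes (a_t, b_t).

    Returns (rejected, stopping_time, final_e_value).
    """
    log_threshold = math.log(1 / alpha)
    log_E, t = 0.0, 0
    for t, (a, b) in enumerate(pairs, start=1):
        d = a - b                      # in {-1, 0, 1}; mean <= 0 under H_0
        log_E += math.log1p(lam * d)   # B_t = 1 + lam * d > 0 because lam < 1
        if log_E >= log_threshold:
            return True, t, math.exp(log_E)
    return False, t, math.exp(log_E)

# Simulated traffic in which variant A really is better (rates are made up).
rng = random.Random(4)
traffic = ((int(rng.random() < 0.15), int(rng.random() < 0.10))
           for _ in range(20_000))
rejected, stop_time, e_value = ab_test_e_process(traffic)
print(rejected, stop_time, round(e_value, 2))
```

A fixed stake is valid but not tuned; predictable, data-driven stakes such as those in the betting literature generally grow the e-process faster.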

Connection to Classical Tests

Every classical sequential test maps onto an e-process construction:

  • Wald's SPRT: likelihood-ratio e-process with two-sided thresholds.
  • Group-sequential tests (Pocock, O'Brien-Fleming): e-process is implicit; the boundaries are tuned to the fixed schedule of looks. Adaptive peeking outside the schedule still breaks them; the e-process framing handles arbitrary stopping.
  • Always-valid p-values: defined as $1/\sup_{s \leq t} E_s$, which inherits Ville's inequality.
  • Bayes factors: under a prior, the Bayes factor is a likelihood ratio averaged over the prior, which is again an e-value. The product form across observations gives an e-process.

The modern view (Ramdas, Grünwald, Vovk, Shafer 2023) is that all sequential tests are e-processes under different choices of betting strategy.
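As a small illustration of the always-valid p-value in the list above, the sketch below converts an e-process path into $p_t = 1/\sup_{s \leq t} E_s$, with the supremum including $E_0 = 1$ so that $p_t \leq 1$. The function name and the example path are made up for the demo.

```python
def always_valid_p_values(e_path):
    """Map E_1, E_2, ... to p_t = 1 / max(E_0, E_1, ..., E_t), with E_0 = 1."""
    running_max = 1.0                 # E_0 = 1 starts the running maximum
    p_values = []
    for e in e_path:
        running_max = max(running_max, e)
        p_values.append(1.0 / running_max)
    return p_values

p_path = always_valid_p_values([0.5, 2.0, 1.0, 8.0, 4.0, 25.0])
print(p_path)
```

The resulting sequence is nonincreasing in $t$, so once it falls below $\alpha$ it stays there regardless of later data.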

Implementation Note

The standard implementation pattern is to maintain the log-e-process for numerical stability:

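import math

def run_sequential_test(stream, compute_log_conditional_bet, alpha=0.05):
    log_E, history = 0.0, []  # log E_0 = 0; history holds X_1, ..., X_{t-1}
    log_threshold = math.log(1 / alpha)
    for t, x_t in enumerate(stream, start=1):
        # log B_t: evaluated at x_t, with the strategy chosen from history only
        log_E += compute_log_conditional_bet(x_t, history)
        if log_E >= log_threshold:
            return "reject H_0", t
        history.append(x_t)
    return "no rejection", len(history)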

The confseq Python package (Howard et al. 2021) provides ready-made empirical-Bernstein e-processes for bounded outcomes. For Gaussian observations with unknown variance, the Robbins (1970) mixture e-process is standard; for two-sample binomial tests, the betting construction from Waudby-Smith and Ramdas (2024) gives the tightest confidence sequences in the bounded-outcome regime.

Care must be taken with the filtration. The bet $B_t$ is evaluated at the current observation $X_t$ (as in the likelihood ratio $p_1(X_t)/p_0(X_t)$), but the betting strategy that defines it, such as the stake, the plug-in alternative, or the variance estimate, must depend only on $X_1, \ldots, X_{t-1}$ and any external randomness independent of $X_t$. Tuning the strategy with the current observation $X_t$ makes it non-predictable and breaks the supermartingale property.
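A small simulation of the filtration requirement, under toy assumptions: fair $\pm 1$ outcomes under the null and bets of the form $B_t = 1 + \lambda_t X_t$. One strategy picks the stake $\lambda_t$ from past outcomes only; the other looks at the current outcome before staking. All names here are hypothetical.

```python
import random

def final_wealth(xs, choose_lambda):
    """Running product of bets B_t = 1 + lambda_t * x_t over outcomes x_t = +/-1."""
    e, past = 1.0, []
    for x in xs:
        e *= 1.0 + choose_lambda(past, x) * x
        past.append(x)
    return e

def predictable(past, x_now):
    # Valid: the stake uses only past outcomes (follow the majority sign so far).
    return 0.5 if sum(past) >= 0 else -0.5

def peeking(past, x_now):
    # Invalid: the stake looks at the current outcome before betting on it.
    return 0.5 * x_now

rng = random.Random(3)
T, n_paths = 10, 20000
paths = [[rng.choice((-1, 1)) for _ in range(T)] for _ in range(n_paths)]
mean_valid = sum(final_wealth(xs, predictable) for xs in paths) / n_paths
mean_peek = sum(final_wealth(xs, peeking) for xs in paths) / n_paths
print(f"predictable stake: mean final wealth {mean_valid:.3f}")
print(f"peeking stake:     mean final wealth {mean_peek:.3f}")
```

With the predictable stake the average final wealth stays near 1. With the peeking stake every bet equals 1.5, so wealth is $1.5^{T}$ on every null path and the Type I error guarantee is gone.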

References

Canonical:

  • Ramdas, A., Grünwald, P., Vovk, V., and Shafer, G. (2023). "Game-theoretic statistics and safe anytime-valid inference." Statistical Science 38(4), pp. 576-601. The modern survey, with Section 3 on e-processes and Section 4 on construction methods.
  • Shafer, G. and Vovk, V. (2019). Game-Theoretic Foundations for Probability and Finance (Wiley). Chapters 3-7 develop the betting/martingale framework that e-processes formalize.
  • Howard, S. R., Ramdas, A., McAuliffe, J., and Sekhon, J. (2021). "Time-uniform Chernoff bounds via nonnegative supermartingales." Probability Surveys 18, pp. 257-317. Quantitative time-uniform concentration inequalities derived from e-processes; the reference for empirical-Bernstein constructions.

Historical:

  • Wald, A. (1945). "Sequential tests of statistical hypotheses." Annals of Mathematical Statistics 16(2), pp. 117-186. The SPRT.
  • Wald, A. (1947). Sequential Analysis (Wiley). Book-length treatment of the SPRT and its operating characteristics.
  • Ville, J. (1939). Étude critique de la notion de collectif (Gauthier-Villars). The original supermartingale maximal inequality, in a foundations-of-probability context.

Current:

  • Waudby-Smith, I. and Ramdas, A. (2024). "Estimating means of bounded random variables by betting." Journal of the Royal Statistical Society, Series B 86(1), pp. 1-27. The state-of-the-art betting construction for confidence sequences.
  • Grünwald, P., de Heide, R., and Koolen, W. (2024). "Safe testing." Journal of the Royal Statistical Society, Series B. e-process construction via reverse information projection for composite nulls.

Last reviewed: May 13, 2026
