
Comparison

Martingale CLT vs. Classical CLT

The classical CLT requires iid random variables. The martingale CLT extends to dependent sequences with mean-zero increments. The martingale version is needed for stochastic approximation and online learning.

What Each States

Both theorems say that a normalized sum converges in distribution to a Gaussian. They differ in what conditions the summands must satisfy.

Classical CLT: the summands are independent and identically distributed. No dependence is allowed.

Martingale CLT: the summands form a martingale difference sequence. They can be dependent, but each increment must have conditional mean zero given the past.

Side-by-Side Statement

Definition

Classical CLT (Lindeberg-Lévy)

Let $X_1, X_2, \ldots$ be i.i.d. with $\mathbb{E}[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2 < \infty$. Then:

$$\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} \mathcal{N}(0, 1)$$

where $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$.
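As a quick sanity check, the statement can be simulated: normalized sample means of i.i.d. Exponential(1) draws (so $\mu = \sigma = 1$) should be approximately standard normal. The distribution, sample size, and seed below are illustrative choices, not part of the theorem.

```python
import numpy as np

# Simulate the classical CLT with i.i.d. Exponential(1) draws,
# which have mu = 1 and sigma = 1 (illustrative choices).
rng = np.random.default_rng(0)
n, reps = 1_000, 4_000
samples = rng.exponential(scale=1.0, size=(reps, n))

# sqrt(n) * (X_bar - mu) / sigma for each replication
z = np.sqrt(n) * (samples.mean(axis=1) - 1.0) / 1.0
print(f"mean ~ {z.mean():.3f}, std ~ {z.std():.3f}")  # mean near 0, std near 1
```

A histogram of `z` would hug the standard normal density; the printed moments are the cheapest check.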

Definition

Martingale CLT

Let $\{D_n, \mathcal{F}_n\}$ be a martingale difference sequence: $\mathbb{E}[D_n \mid \mathcal{F}_{n-1}] = 0$ for all $n$. Let $\sigma_n^2 = \mathbb{E}[D_n^2 \mid \mathcal{F}_{n-1}]$ be the conditional variance. Define $V_n^2 = \sum_{i=1}^n \sigma_i^2$. If:

  1. $V_n^2 / v_n^2 \xrightarrow{p} 1$ for some deterministic sequence $v_n^2 \to \infty$
  2. The Lindeberg condition holds: for all $\epsilon > 0$, $\frac{1}{v_n^2}\sum_{i=1}^n \mathbb{E}[D_i^2 \mathbf{1}(|D_i| > \epsilon v_n) \mid \mathcal{F}_{i-1}] \xrightarrow{p} 0$

Then:

$$\frac{1}{v_n}\sum_{i=1}^n D_i \xrightarrow{d} \mathcal{N}(0, 1)$$
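The theorem can also be exercised numerically on a dependent sequence. In the sketch below (an illustrative construction, not from the original text), $D_t = \epsilon_t s_t$ where the $\epsilon_t$ are independent Rademacher signs and the scale $s_t$ is 1 or 2 depending on the previous sign. Each increment has conditional mean zero, $\mathbb{E}[s_t^2] = 2.5$, so condition 1 holds with $v_n^2 = 2.5n$, and boundedness makes the Lindeberg condition automatic.

```python
import numpy as np

# A dependent martingale difference sequence: D_t = eps_t * s_t, where
# eps_t are independent Rademacher signs and the scale s_t depends on
# the PREVIOUS sign.  E[D_t | past] = 0 holds, the D_t are dependent
# (|D_t| is determined by D_{t-1}), and E[s_t^2] = 2.5 gives v_n^2 = 2.5 n.
rng = np.random.default_rng(1)
n, reps = 1_000, 4_000
eps = rng.choice([-1.0, 1.0], size=(reps, n))
scale = np.ones((reps, n))
scale[:, 1:] = np.where(eps[:, :-1] > 0, 1.0, 2.0)  # scale set by the past
d = eps * scale                                      # martingale differences

z = d.sum(axis=1) / np.sqrt(2.5 * n)                 # normalize by v_n
print(f"mean ~ {z.mean():.3f}, std ~ {z.std():.3f}")  # mean near 0, std near 1
```

The classical CLT says nothing here (the increments are neither independent nor identically distributed), yet the normalized sum still comes out standard normal.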

Where Each Is Stronger

Classical CLT is simpler to verify

You check two things: are the variables i.i.d.? Is the variance finite? If yes to both, the CLT holds. No filtration, no conditional variance, no Lindeberg condition. For most basic statistical applications (sample means, confidence intervals), the classical CLT suffices.

Martingale CLT handles dependent data

In any setting where the data is generated adaptively (the distribution of $X_n$ depends on $X_1, \ldots, X_{n-1}$), the classical CLT does not apply. The martingale CLT covers these cases, which include most of the interesting settings in modern ML.

Where Each Fails

Classical CLT fails for adaptive/sequential data

In stochastic gradient descent, the gradient at step $n$ depends on the parameter $\theta_n$, which depends on all previous gradients. The gradient noise terms are not independent. In multi-armed bandits, the reward at time $n$ depends on the arm selection, which depends on all previous rewards. In either case, the classical CLT does not apply.

Martingale CLT fails without mean-zero increments

The martingale difference condition $\mathbb{E}[D_n \mid \mathcal{F}_{n-1}] = 0$ is restrictive. Not all dependent sequences satisfy it. For example, if observations have a non-zero conditional mean that depends on the past (e.g., a stationary ergodic process with positive autocorrelation), you need different tools (mixing conditions, or the ergodic theorem combined with a CLT for mixing sequences).

Both fail for heavy-tailed distributions

Both CLTs require finite variance ($\sigma^2 < \infty$, or the conditional variance condition). For heavy-tailed distributions where the variance is infinite (e.g., Cauchy, stable distributions with $\alpha < 2$), the normalized sum converges to a stable distribution, not a Gaussian.
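The Cauchy case is easy to see numerically: the sample mean of $n$ standard Cauchy variables is again standard Cauchy for every $n$, so averaging never concentrates. The sketch below (sample sizes and seed are illustrative) tracks the interquartile range of the sample mean, which should stay near the Cauchy IQR of 2 instead of shrinking like $1/\sqrt{n}$.

```python
import numpy as np

# Cauchy draws have no finite variance; the sample mean of n standard
# Cauchy variables is again standard Cauchy, so its spread never shrinks
# and no Gaussian limit appears under sqrt(n)-normalization.
rng = np.random.default_rng(2)
for n in (10, 1_000, 20_000):
    means = rng.standard_cauchy(size=(500, n)).mean(axis=1)
    q25, q75 = np.percentile(means, [25, 75])
    print(f"n={n:>6}: IQR of the sample mean ~ {q75 - q25:.2f}")  # stays near 2
```

For a finite-variance distribution the same IQR would decay like $1/\sqrt{n}$; the flat line is the signature of the stable (non-Gaussian) limit.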

Key Assumptions That Differ

| Assumption | Classical CLT | Martingale CLT |
| --- | --- | --- |
| Independence | Required (i.i.d.) | Not required |
| Identical distribution | Required | Not required |
| Mean-zero condition | $\mu$ is subtracted | $\mathbb{E}[D_n \mid \mathcal{F}_{n-1}] = 0$ |
| Variance condition | $\sigma^2 < \infty$ | Conditional variance stabilizes |
| Lindeberg condition | Automatic for i.i.d. | Must be verified |
| Applies to SGD noise | No | Yes |
| Applies to bandit regret | No | Yes |

When a Researcher Would Use Each

Example

Confidence interval for a population mean

Given $n$ i.i.d. samples from a distribution with unknown mean $\mu$ and finite variance $\sigma^2$, the classical CLT gives $\bar{X}_n \approx \mathcal{N}(\mu, \sigma^2/n)$ for large $n$. This yields the standard confidence interval $\bar{X}_n \pm z_{\alpha/2} \cdot \hat{\sigma}/\sqrt{n}$. The classical CLT is the right tool here.
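The recipe is a few lines of code. The sketch below uses Exponential(2) data (true mean 2) and a sample size of 500 as illustrative choices; $z_{0.025} \approx 1.96$.

```python
import numpy as np

# Classical-CLT confidence interval for an unknown mean, following the
# recipe above.  The Exponential(2) data (true mean mu = 2) and the
# sample size are illustrative choices.
rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=500)
xbar = x.mean()
se = x.std(ddof=1) / np.sqrt(len(x))          # plug-in sigma-hat / sqrt(n)
lo, hi = xbar - 1.96 * se, xbar + 1.96 * se   # z_{0.025} ~ 1.96
print(f"95% CI for mu: ({lo:.2f}, {hi:.2f})")
```

Repeating this over many fresh samples, the interval would cover the true mean in roughly 95% of runs.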

Example

Asymptotic normality of SGD iterates

In SGD, $\theta_{n+1} = \theta_n - \eta_n g_n$ where $g_n = \nabla f(\theta_n; X_n)$ is the stochastic gradient. The noise $g_n - \nabla F(\theta_n)$ is a martingale difference (conditional mean zero given $\mathcal{F}_n$, the history up to step $n$), but it is not independent of the past (since $\theta_n$ depends on all previous noise terms). The martingale CLT is needed to establish that $\sqrt{n}(\theta_n - \theta^*)$ converges to a Gaussian, which is the foundation of statistical inference for SGD.
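A minimal simulation makes this concrete. The toy setup below (not from the original text) runs SGD on the quadratic $F(\theta) = (\theta - \theta^*)^2/2$ with step size $\eta_t = 1/t$ and Gaussian gradient noise; $\theta^* = 3$, the horizon, and the seed are illustrative.

```python
import numpy as np

# SGD on the quadratic F(theta) = (theta - theta*)^2 / 2 with step size
# eta_t = 1/t.  The gradient noise is i.i.d. N(0, 1), but theta_t depends
# on all past noise, so the classical CLT does not apply directly.
rng = np.random.default_rng(4)
theta_star, n, reps = 3.0, 2_000, 4_000
theta = np.zeros(reps)                        # one SGD run per replication
for t in range(1, n + 1):
    g = (theta - theta_star) + rng.standard_normal(reps)  # stochastic gradient
    theta -= g / t                            # eta_t = 1/t

z = np.sqrt(n) * (theta - theta_star)
print(f"mean ~ {z.mean():.3f}, std ~ {z.std():.3f}")  # approximately N(0, 1)
```

For this particular quadratic with $\eta_t = 1/t$, the recursion makes $\theta_n - \theta^*$ exactly the negative running average of the noise, so the Gaussian limit is visible already at moderate $n$.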

Example

Regret analysis in bandits

In a multi-armed bandit, the reward $R_t$ at time $t$ depends on the arm chosen, which depends on all previous rewards. The centered reward $R_t - \mu_{A_t}$ (where $A_t$ is the chosen arm and $\mu_{A_t}$ is its mean) forms a martingale difference sequence. The martingale CLT gives the asymptotic distribution of the cumulative regret, enabling construction of confidence intervals for the regret.
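The martingale difference structure can be checked in a toy bandit. The sketch below (arm means, $\epsilon$, and horizons are illustrative assumptions) runs $\epsilon$-greedy on two Bernoulli arms: rewards are heavily dependent through the arm choices, yet the centered rewards, normalized by the summed conditional variances $\mu_a(1-\mu_a)$, come out approximately standard normal.

```python
import numpy as np

# Epsilon-greedy two-armed Bernoulli bandit (illustrative parameters).
# Rewards depend on past rewards via the arm choices, but the centered
# rewards R_t - mu_{A_t} are a martingale difference sequence.
rng = np.random.default_rng(5)
mu = np.array([0.2, 0.5])                  # true Bernoulli arm means
n, reps, explore = 1_000, 500, 0.1
z = np.empty(reps)
for r in range(reps):
    counts, sums = np.ones(2), np.zeros(2)
    m, v2 = 0.0, 0.0
    for t in range(n):
        if rng.random() < explore:
            a = int(rng.integers(2))           # explore uniformly
        else:
            a = int(np.argmax(sums / counts))  # exploit empirical best arm
        reward = float(rng.random() < mu[a])
        sums[a] += reward
        counts[a] += 1
        m += reward - mu[a]                # centered reward: MDS increment
        v2 += mu[a] * (1.0 - mu[a])        # its conditional variance
    z[r] = m / np.sqrt(v2)
print(f"mean ~ {z.mean():.3f}, std ~ {z.std():.3f}")
```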

Common Confusions

Watch Out

Martingale CLT does not require stationarity

The classical CLT uses identical distributions (stationarity). The martingale CLT allows the conditional variance $\sigma_n^2$ to change over time. The key requirement is that the sum of conditional variances $V_n^2$ grows at a predictable rate. This flexibility is what makes it applicable to non-stationary settings like SGD with decaying learning rates.

Watch Out

The Lindeberg condition is not always easy to check

For the classical CLT with i.i.d. variables, the Lindeberg condition is automatic (it follows from finite variance). For martingale differences, verifying the Lindeberg condition requires bounding the conditional probability of large increments. In practice, if the martingale differences are uniformly bounded ($|D_n| \leq M$), the Lindeberg condition holds trivially. For unbounded increments, it requires more work.
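For bounded increments the claim is a one-line check: once $v_n$ exceeds $M/\epsilon$, the indicator inside the Lindeberg sum is identically zero.

```latex
% If |D_i| <= M almost surely and v_n -> infinity, then for any eps > 0
% the event {|D_i| > eps * v_n} is empty as soon as v_n > M / eps, so
\frac{1}{v_n^2}\sum_{i=1}^n
  \mathbb{E}\bigl[D_i^2\,\mathbf{1}(|D_i| > \epsilon v_n)\,\big|\,\mathcal{F}_{i-1}\bigr]
  = 0
\qquad \text{whenever } v_n > M/\epsilon,
% which converges (trivially) to 0 in probability.
```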

Watch Out

The martingale CLT generalizes the classical CLT, but not for free

Every i.i.d. mean-zero sequence with finite variance is a martingale difference sequence, and in that case the martingale CLT's conditions (stabilization of the conditional variance, Lindeberg) hold automatically with $v_n^2 = n\sigma^2$, so the classical statement is recovered as a special case. The practical point is that for i.i.d. data the classical CLT is the simpler tool: no filtration or conditional variance needs to be set up. The martingale CLT buys generality at the cost of more conditions to verify.
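Concretely, the i.i.d. specialization can be written out: taking $D_i = X_i - \mu$ with the natural filtration, independence makes every conditional variance constant.

```latex
% i.i.d. case: D_i is independent of F_{i-1}, so the conditional
% variance is the ordinary variance, and the normalizer is deterministic:
\sigma_i^2 = \mathbb{E}[D_i^2 \mid \mathcal{F}_{i-1}] = \sigma^2,
\qquad V_n^2 = n\sigma^2 = v_n^2,
% and the martingale conclusion reduces to the classical statement:
\frac{1}{\sigma\sqrt{n}}\sum_{i=1}^n D_i \xrightarrow{d} \mathcal{N}(0, 1).
```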

What to Memorize

  1. Classical CLT: i.i.d., finite variance, $\sqrt{n}(\bar{X}_n - \mu)/\sigma \xrightarrow{d} \mathcal{N}(0,1)$.
  2. Martingale CLT: martingale difference sequence, conditional variance stabilizes, Lindeberg condition, same Gaussian limit.
  3. When independence fails: adaptive algorithms (SGD, bandits, online learning) need the martingale CLT.
  4. The key structural requirement of the martingale CLT: conditional mean zero given the past. This is the "unbiased noise" condition.
  5. Practical check: if increments are uniformly bounded, the Lindeberg condition is free. Focus on verifying the conditional variance condition.