Beta. Content is under active construction and has not been peer-reviewed. Report errors on GitHub.


Ito's Lemma

The chain rule of stochastic calculus: if X_t follows an SDE, then f(X_t) follows a modified SDE with an extra second-order correction term that has no analogue in ordinary calculus.

Advanced · Tier 2 · Stable · ~50 min

Why This Matters

In ordinary calculus, if you know $dx/dt$ and want $df(x)/dt$, you apply the chain rule: $f'(x) \cdot dx/dt$. In stochastic calculus, this formula is wrong. The chain rule picks up an extra term proportional to $f''(x) \sigma^2$. This correction term is the reason stochastic calculus exists as a separate subject.

Every derivation in diffusion models, score-based generative models, and mathematical finance uses Ito's lemma. If you cannot apply it mechanically, you cannot read these papers.

Mental Model

A Brownian motion path is so rough that $(dW_t)^2$ does not vanish. It equals $dt$ in a precise sense (quadratic variation). When you Taylor expand $f(X_t)$, the second-order term $\frac{1}{2} f''(X_t)(dX_t)^2$ survives because $(dX_t)^2$ contains a $\sigma^2\,dt$ piece. In ordinary calculus, $(dx)^2 = 0$. In stochastic calculus, $(dW)^2 = dt$.
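A quick numerical illustration of this mental model, as a minimal sketch in plain Python (standard library only; the helper names `sum_sq_increments`, `brownian_path`, and `smooth_path` are illustrative, not from any library). It compares the sum of squared increments of a smooth path, which shrinks roughly like $1/n$ as the grid is refined, with the same sum for a simulated Brownian path, which stays near $T$.

```python
import math
import random


def sum_sq_increments(path):
    """Discrete quadratic variation: sum of squared increments of a sampled path."""
    return sum((b - a) ** 2 for a, b in zip(path, path[1:]))


def brownian_path(T, n, seed=0):
    """Standard Brownian motion sampled on a uniform n-step grid over [0, T]."""
    rng = random.Random(seed)
    step_sd = math.sqrt(T / n)  # each increment is N(0, T/n)
    w = [0.0]
    for _ in range(n):
        w.append(w[-1] + rng.gauss(0.0, step_sd))
    return w


def smooth_path(T, n):
    """The differentiable path x(t) = sin(t), sampled on the same grid."""
    return [math.sin(T * i / n) for i in range(n + 1)]


T = 1.0
for n in (100, 10_000, 100_000):
    print(f"n={n:>7}  smooth: {sum_sq_increments(smooth_path(T, n)):.6f}  "
          f"Brownian: {sum_sq_increments(brownian_path(T, n)):.4f}")
```

The smooth column heads to zero, which is the discrete version of $(dx)^2 = 0$; the Brownian column settles near $T = 1$, which is the discrete version of $(dW)^2 = dt$.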

Setup

Let $X_t$ be an Ito process satisfying the SDE:

$$dX_t = \mu(X_t, t)\,dt + \sigma(X_t, t)\,dW_t$$

where $W_t$ is standard Brownian motion, $\mu$ is the drift, and $\sigma$ is the diffusion coefficient.
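Numerical experiments with this SDE usually start from its simplest discretization, the Euler-Maruyama scheme, which replaces $dt$ by a step $\Delta t$ and $dW_t$ by an independent $N(0, \Delta t)$ draw. A minimal sketch, assuming a uniform grid; the function name `euler_maruyama` and its signature are illustrative choices, not a standard API.

```python
import math
import random


def euler_maruyama(mu, sigma, x0, T, n, seed=0):
    """Approximate one path of dX = mu(X, t) dt + sigma(X, t) dW on [0, T].

    mu and sigma are callables (x, t) -> float. Returns the n + 1 grid values.
    """
    rng = random.Random(seed)
    dt = T / n
    x = x0
    path = [x0]
    for i in range(n):
        t = i * dt
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over [t, t + dt]
        x = x + mu(x, t) * dt + sigma(x, t) * dw
        path.append(x)
    return path


# Example: an Ornstein-Uhlenbeck path dX = -X dt + sqrt(2) dW started at 0.
ou_path = euler_maruyama(lambda x, t: -x, lambda x, t: math.sqrt(2.0), x0=0.0, T=1.0, n=1000)
print(len(ou_path), ou_path[-1])
```

Coefficients are evaluated at the left end of each step, which matches the Ito convention used throughout this page.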

Definition

Quadratic Variation Rule

The multiplication rules for Ito calculus are:

$$dt \cdot dt = 0, \quad dt \cdot dW_t = 0, \quad dW_t \cdot dW_t = dt$$

These rules follow from the quadratic variation of Brownian motion: $\langle W \rangle_t = t$.
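The three rules can be checked on a simulated grid: summing each product over $n$ steps of size $\Delta t = T/n$, the $dt \cdot dt$ and $dt \cdot dW$ sums go to zero as $n$ grows, while the $dW \cdot dW$ sum approaches $T$. A sketch assuming a uniform grid; `product_sums` is a hypothetical helper name.

```python
import math
import random


def product_sums(T, n, seed=0):
    """Sums of dt*dt, dt*dW and dW*dW over an n-step grid on [0, T]."""
    rng = random.Random(seed)
    dt = T / n
    dws = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]
    sum_dt_dt = n * dt * dt                   # equals T^2 / n, so -> 0
    sum_dt_dw = sum(dt * dw for dw in dws)    # equals dt * W_T, so -> 0
    sum_dw_dw = sum(dw * dw for dw in dws)    # -> T (quadratic variation)
    return sum_dt_dt, sum_dt_dw, sum_dw_dw


for n in (10, 1_000, 100_000):
    print(n, product_sums(1.0, n))
```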

Main Theorems

Theorem

Ito's Lemma (One Dimension)

Statement

Let $f(x, t) \in C^{2,1}$ (twice continuously differentiable in $x$, once in $t$). Then $Y_t = f(X_t, t)$ satisfies:

$$df = \left(\frac{\partial f}{\partial t} + \mu \frac{\partial f}{\partial x} + \frac{\sigma^2}{2} \frac{\partial^2 f}{\partial x^2}\right) dt + \sigma \frac{\partial f}{\partial x}\, dW_t$$

The term $\frac{\sigma^2}{2} \frac{\partial^2 f}{\partial x^2}\,dt$ is the Ito correction. It has no analogue in ordinary calculus.
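To make the formula mechanical, here is a sketch that evaluates the drift and diffusion coefficients of $df$ at a single point $(x, t)$, with the partial derivatives of $f$ approximated by central finite differences. The helper `ito_coefficients` and the step size `h` are illustrative choices, not a standard API; a symbolic algebra package would give exact derivatives instead.

```python
import math


def ito_coefficients(f, mu, sigma, x, t, h=1e-4):
    """Drift and diffusion of df for Y_t = f(X_t, t), where dX = mu dt + sigma dW.

    Partial derivatives of f are approximated by central finite differences.
    """
    f_t = (f(x, t + h) - f(x, t - h)) / (2 * h)
    f_x = (f(x + h, t) - f(x - h, t)) / (2 * h)
    f_xx = (f(x + h, t) - 2 * f(x, t) + f(x - h, t)) / h**2
    m, s = mu(x, t), sigma(x, t)
    drift = f_t + m * f_x + 0.5 * s * s * f_xx  # last term is the Ito correction
    diffusion = s * f_x
    return drift, diffusion


# f(x) = x^2 with mu = 0, sigma = 1: drift should be about 1, diffusion about 2x = 1.4
print(ito_coefficients(lambda x, t: x * x, lambda x, t: 0.0, lambda x, t: 1.0, x=0.7, t=0.0))
# f(S) = ln S under dS = 0.1 S dt + 0.2 S dW: drift about 0.1 - 0.2**2 / 2 = 0.08, diffusion about 0.2
print(ito_coefficients(lambda x, t: math.log(x), lambda x, t: 0.1 * x, lambda x, t: 0.2 * x, x=2.0, t=0.0))
```

Both checks match the worked examples later on this page: the $W_t^2$ case and the logarithm of geometric Brownian motion.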

Intuition

Taylor expand $f$ to second order. The first-order terms give the ordinary chain rule. The second-order term $\frac{1}{2} f''(dX)^2$ would vanish in ordinary calculus, but here $(dX)^2 = \sigma^2 (dW)^2 + \text{higher order} = \sigma^2\,dt$. So $\frac{1}{2} f'' \sigma^2\,dt$ survives.

Proof Sketch

Partition $[0, t]$ into $n$ subintervals. Write the telescoping sum $f(X_t, t) - f(X_0, 0) = \sum_i [f(X_{t_{i+1}}, t_{i+1}) - f(X_{t_i}, t_i)]$. Taylor expand each increment to second order, with derivatives evaluated at the left endpoint $t_i$. The first-order terms converge to $\int_0^t \left(\frac{\partial f}{\partial t} + \mu \frac{\partial f}{\partial x}\right) ds$ plus the Ito integral $\int_0^t \sigma \frac{\partial f}{\partial x}\, dW_s$. The second-order terms converge to $\frac{1}{2} \int_0^t \frac{\partial^2 f}{\partial x^2}(X_s, s)\,\sigma^2(X_s, s)\,ds$ because of the quadratic variation of $W_t$. Cross terms and higher-order terms vanish in $L^2$.
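The proof sketch can be checked along a single simulated path. The sketch below takes $X_t = W_t$ ($\mu = 0$, $\sigma = 1$) and $f = \exp$, so $f' = f'' = \exp$, and accumulates both the discretized Ito expansion $\sum_i f'(W_{t_i})\,\Delta W_i + \frac{1}{2} f''(W_{t_i})\,\Delta t$ and the ordinary chain rule sum without the correction, then compares each with $f(W_T) - f(W_0)$. The helper name `ito_pathwise_check` is hypothetical.

```python
import math
import random


def ito_pathwise_check(f, f1, f2, T=1.0, n=100_000, seed=1):
    """Errors of the Ito sum and of the ordinary chain-rule sum along one path of W.

    f1 and f2 are the first and second derivatives of f. Returns
    (ito_sum - exact, ordinary_sum - exact) with exact = f(W_T) - f(W_0).
    """
    rng = random.Random(seed)
    dt = T / n
    w = 0.0
    ito_sum = 0.0       # sum of f'(W) dW + (1/2) f''(W) dt
    ordinary_sum = 0.0  # sum of f'(W) dW only
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))
        first_order = f1(w) * dw  # derivative taken at the left endpoint
        ito_sum += first_order + 0.5 * f2(w) * dt
        ordinary_sum += first_order
        w += dw
    exact = f(w) - f(0.0)
    return ito_sum - exact, ordinary_sum - exact


ito_err, ordinary_err = ito_pathwise_check(math.exp, math.exp, math.exp)
print(f"Ito error: {ito_err:+.4f}   ordinary chain rule error: {ordinary_err:+.4f}")
```

On a fine grid the Ito error is small, while the ordinary chain rule misses by roughly $\frac{1}{2}\int_0^T e^{W_s}\,ds$, which is exactly the discarded second-order term.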

Why It Matters

This is the computational workhorse of stochastic calculus. Every application requires you to start with a process XtX_t and compute the dynamics of some function of it. Without Ito's lemma, you cannot derive the Black-Scholes equation, compute the SDE for the score function in diffusion models, or analyze Langevin dynamics.

Failure Mode

The formula requires $f \in C^{2,1}$. If $f$ is not twice differentiable (e.g., $f(x) = |x|$), the standard Ito lemma does not apply. You need the Tanaka-Meyer formula, which introduces local time. Also, this is the Ito version. The Stratonovich chain rule has no correction term but changes the integral definition.

Canonical Examples

Example

Geometric Brownian Motion

Let $S_t$ satisfy $dS_t = \mu S_t\,dt + \sigma S_t\,dW_t$ (stock price model). Apply Ito's lemma to $f(S) = \ln S$. We have $f' = 1/S$ and $f'' = -1/S^2$.

$$d(\ln S_t) = \frac{1}{S_t}(\mu S_t\,dt + \sigma S_t\,dW_t) + \frac{1}{2}\left(-\frac{1}{S_t^2}\right)\sigma^2 S_t^2\,dt$$

$$= \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma\,dW_t$$

So $\ln S_t$ is a Brownian motion with drift $\mu - \sigma^2/2$. The correction term $-\sigma^2/2$ is why the expected log return is less than $\mu$.
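A Monte Carlo check of this result, as a sketch: simulate $dS_t = \mu S_t\,dt + \sigma S_t\,dW_t$ with Euler-Maruyama steps (no use of the closed-form solution), then average $\ln(S_T/S_0)$ and $S_T/S_0$ over paths. The parameters $\mu = 0.1$, $\sigma = 0.4$, $T = 1$ and the helper name `gbm_log_return_mc` are arbitrary illustrative choices.

```python
import math
import random


def gbm_log_return_mc(mu, sigma, T=1.0, n_steps=200, n_paths=2000, seed=0):
    """Monte Carlo estimates of E[ln(S_T/S_0)] and E[S_T/S_0] for geometric Brownian motion."""
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    log_total, gross_total = 0.0, 0.0
    for _ in range(n_paths):
        s = 1.0  # S_0 = 1, so S_T is the gross return
        for _ in range(n_steps):
            # Euler-Maruyama step for dS = mu S dt + sigma S dW
            s += mu * s * dt + sigma * s * rng.gauss(0.0, sqrt_dt)
        log_total += math.log(s)
        gross_total += s
    return log_total / n_paths, gross_total / n_paths


mu, sigma = 0.1, 0.4
mean_log, mean_gross = gbm_log_return_mc(mu, sigma)
print(f"E[ln S_T/S_0] ~ {mean_log:.3f}   Ito: {mu - sigma**2 / 2:.3f}   naive: {mu:.3f}")
print(f"E[S_T/S_0]    ~ {mean_gross:.3f}   exp(mu T): {math.exp(mu):.3f}")
```

The average log return lands near $\mu - \sigma^2/2 = 0.02$ rather than $\mu = 0.1$, while the average gross return still grows like $e^{\mu T}$.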

Example

Square of Brownian motion

Let $f(W_t) = W_t^2$. Here $f' = 2W_t$, $f'' = 2$, $\mu = 0$, $\sigma = 1$.

$$d(W_t^2) = 2W_t\,dW_t + \frac{1}{2}(2)(1)^2\,dt = 2W_t\,dW_t + dt$$

Integrating: $W_t^2 = 2\int_0^t W_s\,dW_s + t$. This gives the Ito integral identity $\int_0^t W_s\,dW_s = \frac{1}{2}(W_t^2 - t)$.
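The identity can be seen directly from the left-endpoint Riemann sum that defines the Ito integral. A sketch on one simulated path (the helper name `ito_integral_w_dw` is illustrative): the sum $\sum_i W_{t_i}\,\Delta W_i$ tracks $\frac{1}{2}(W_T^2 - T)$, not the ordinary-calculus answer $\frac{1}{2}W_T^2$.

```python
import math
import random


def ito_integral_w_dw(T, n, seed=0):
    """Left-endpoint sum for the Ito integral of W dW on [0, T]; returns (sum, W_T)."""
    rng = random.Random(seed)
    dt = T / n
    w = 0.0
    integral = 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))
        integral += w * dw  # integrand evaluated at the LEFT endpoint
        w += dw
    return integral, w


T = 1.0
integral, w_T = ito_integral_w_dw(T, 100_000)
print(f"sum: {integral:.4f}   (W_T^2 - T)/2: {0.5 * (w_T**2 - T):.4f}   W_T^2/2: {0.5 * w_T**2:.4f}")
```

Algebraically, the left-endpoint sum equals $\frac{1}{2}\left(W_T^2 - \sum_i (\Delta W_i)^2\right)$, so the gap to $\frac{1}{2}W_T^2$ is half the discrete quadratic variation, which tends to $T/2$.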

Theorem

Multidimensional Ito's Lemma

Statement

If $dX_t^i = \mu^i\,dt + \sum_j \sigma^{ij}\,dW_t^j$, where the $W^j$ are independent standard Brownian motions, then for $f \in C^{2,1}$:

$$df = \frac{\partial f}{\partial t}\,dt + \sum_i \frac{\partial f}{\partial x^i}\,dX_t^i + \frac{1}{2}\sum_{i,j} \frac{\partial^2 f}{\partial x^i \partial x^j}\,d\langle X^i, X^j \rangle_t$$

where $d\langle X^i, X^j \rangle_t = \sum_k \sigma^{ik}\sigma^{jk}\,dt$.
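The covariation rates form the matrix $\sigma\sigma^\top$, and the correction term is half the entrywise product of the Hessian with that matrix. A sketch with matrices as nested lists, where `sigma[i][k]` stands for $\sigma^{ik}$; the helper names are hypothetical. As an example it uses $f(x, y) = xy$, whose Hessian has off-diagonal entries 1, with two unit-variance components whose correlation is $\rho$, which yields the product rule $d(XY) = X\,dY + Y\,dX + \rho\,dt$.

```python
import math


def covariation_rates(sigma):
    """d<X^i, X^j>_t / dt = sum_k sigma[i][k] * sigma[j][k], i.e. the matrix sigma sigma^T.

    sigma[i][k] is the loading of component X^i on the independent Brownian motion W^k.
    """
    d = len(sigma)
    return [[sum(a * b for a, b in zip(sigma[i], sigma[j])) for j in range(d)]
            for i in range(d)]


def ito_correction(hessian, sigma):
    """The (1/2) sum_{i,j} f_ij d<X^i, X^j>_t / dt term of the multidimensional formula."""
    c = covariation_rates(sigma)
    d = len(sigma)
    return 0.5 * sum(hessian[i][j] * c[i][j] for i in range(d) for j in range(d))


# X = W^1 and Y = rho W^1 + sqrt(1 - rho^2) W^2: unit variances, correlation rho.
rho = 0.3
sigma = [[1.0, 0.0], [rho, math.sqrt(1.0 - rho**2)]]
hessian_xy = [[0.0, 1.0], [1.0, 0.0]]  # Hessian of f(x, y) = x * y
print(covariation_rates(sigma))           # approximately [[1, rho], [rho, 1]]
print(ito_correction(hessian_xy, sigma))  # approximately rho
```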

Intuition

The same idea as the 1D case, but now the quadratic covariation between different components contributes through the full Hessian matrix of ff.

Proof Sketch

Same Taylor expansion argument as the 1D case, applied componentwise. The cross terms $dX^i\,dX^j$ contribute through the covariation $\langle X^i, X^j \rangle$.

Why It Matters

Diffusion models in high dimensions (image generation) operate on multidimensional SDEs. The multivariate version is needed to derive the reverse-time SDE and the score matching objective.

Failure Mode

Same regularity requirements as the 1D case, but now applied to all partial derivatives up to second order.

Common Confusions

Watch Out

Why not just use the ordinary chain rule?

Because $(dW_t)^2 = dt \neq 0$. Brownian paths have nonzero quadratic variation (and therefore infinite total variation) on any interval, so the second-order term in the Taylor expansion does not vanish. If you apply the ordinary chain rule, you get the wrong drift. The geometric Brownian motion example shows this: the ordinary chain rule gives drift $\mu$ for $\ln S$, but the correct drift is $\mu - \sigma^2/2$.

Watch Out

Ito vs Stratonovich

In Stratonovich calculus, the chain rule has no correction term: $df = f'(X_t) \circ dX_t$. The price is that the Stratonovich integral is defined as a midpoint Riemann sum, whereas the Ito integral uses the left endpoint of each subinterval. Ito integrals are martingales (useful for probability arguments). Stratonovich integrals obey the ordinary chain rule (useful for physics). They are related by: the Ito SDE $dX = \mu\,dt + \sigma\,dW$ corresponds to the Stratonovich SDE $dX = (\mu - \frac{1}{2}\sigma \sigma')\,dt + \sigma \circ dW$, where $\sigma' = \partial \sigma / \partial x$.
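The difference between the two integrals shows up in a few lines. The sketch below builds, on one path, the left-endpoint (Ito) sum for $\int_0^T W\,dW$ and a sum that averages the integrand over the two endpoints of each step, a common discrete form of the Stratonovich integral that has the same limit as midpoint evaluation for this integrand. The averaged sum telescopes to exactly $\frac{1}{2}W_T^2$, the ordinary-calculus answer. The drift conversion from the text is included as a one-line helper. Both function names are illustrative.

```python
import math
import random


def left_and_averaged_sums(T, n, seed=0):
    """Ito (left-endpoint) and Stratonovich (endpoint-average) sums for W dW; returns (ito, strat, W_T)."""
    rng = random.Random(seed)
    dt = T / n
    w = 0.0
    ito = 0.0
    strat = 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))
        ito += w * dw                        # integrand at the left endpoint
        strat += 0.5 * (w + (w + dw)) * dw   # integrand averaged over both endpoints
        w += dw
    return ito, strat, w


def ito_to_stratonovich_drift(mu, sigma, dsigma_dx):
    """Stratonovich drift mu - (1/2) sigma sigma' for the Ito SDE dX = mu dt + sigma dW,
    with all three quantities evaluated at the same point (x, t)."""
    return mu - 0.5 * sigma * dsigma_dx


ito, strat, w_T = left_and_averaged_sums(1.0, 100_000)
print(f"Ito: {ito:.4f} vs (W_T^2 - T)/2 = {0.5 * (w_T**2 - 1.0):.4f}")
print(f"Stratonovich: {strat:.6f} vs W_T^2/2 = {0.5 * w_T**2:.6f}")
# Geometric Brownian motion at x = 2 with mu = 0.1, sigma = 0.4: Ito drift 0.2, sigma(x) = 0.8, sigma' = 0.4
print(ito_to_stratonovich_drift(0.2, 0.8, 0.4))
```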

Summary

  • The Ito correction term is $\frac{1}{2} f''(x) \sigma^2\,dt$
  • It exists because $(dW)^2 = dt$, not zero
  • For $\ln S$ of geometric Brownian motion, the correction gives drift $\mu - \sigma^2/2$
  • The Ito integral $\int_0^t W_s\,dW_s = \frac{1}{2}(W_t^2 - t)$, not $\frac{1}{2}W_t^2$ as the ordinary chain rule would give

Exercises

Exercise (Core)

Problem

Apply Ito's lemma to $f(W_t) = e^{W_t}$. What SDE does $Y_t = e^{W_t}$ satisfy?

Exercise (Advanced)

Problem

Let $X_t$ satisfy $dX_t = -X_t\,dt + \sqrt{2}\,dW_t$ (Ornstein-Uhlenbeck process). Apply Ito's lemma to $f(X_t, t) = X_t^2 e^{2t}$ and use it to compute $\mathbb{E}[X_t^2]$ given $X_0 = 0$.
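A way to check your closed-form answer numerically without giving it away: the sketch below estimates $\mathbb{E}[X_t^2]$ at a few times by Monte Carlo, using Euler-Maruyama steps of size $\Delta t = 0.01$ (which adds a small discretization bias). The helper name `ou_second_moment`, the step size, and the path count are illustrative assumptions. Plug the same times into your formula and compare.

```python
import math
import random


def ou_second_moment(times, n_paths=4000, dt=0.01, seed=0):
    """Monte Carlo estimate of E[X_t^2] for dX = -X dt + sqrt(2) dW with X_0 = 0."""
    rng = random.Random(seed)
    checkpoints = {round(t / dt): t for t in times}  # grid index -> requested time
    n_steps = max(checkpoints)
    sums = {t: 0.0 for t in times}
    sqrt_dt = math.sqrt(dt)
    for _ in range(n_paths):
        x = 0.0
        for k in range(1, n_steps + 1):
            # Euler-Maruyama step for dX = -X dt + sqrt(2) dW
            x += -x * dt + math.sqrt(2.0) * rng.gauss(0.0, sqrt_dt)
            if k in checkpoints:
                sums[checkpoints[k]] += x * x
    return {t: s / n_paths for t, s in sums.items()}


for t, m2 in ou_second_moment([0.25, 0.5, 1.0]).items():
    print(f"t = {t}: E[X_t^2] ~ {m2:.3f}")
```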

References

Canonical:

  • Oksendal, Stochastic Differential Equations, Chapter 4
  • Shreve, Stochastic Calculus for Finance II, Chapter 4

Current:

  • Song et al., "Score-Based Generative Modeling through SDEs" (2021), Appendix A
  • Folland, Real Analysis (1999), Chapters 1-7

Next Topics

  • Diffusion models: the primary ML application of Ito's lemma today
  • Flow matching: an alternative to diffusion that avoids SDEs but relates to them

Last reviewed: April 2026
