Mathematical Infrastructure
Ito's Lemma
The chain rule of stochastic calculus: if X_t follows an SDE, then f(X_t) follows a modified SDE with an extra second-order correction term that has no analogue in ordinary calculus.
Why This Matters
In ordinary calculus, if you know $dx$ and want $df(x)$, you apply the chain rule: $df = f'(x)\,dx$. In stochastic calculus, with $x$ replaced by an Ito process $X_t$, this formula is wrong. The chain rule picks up an extra term proportional to $f''(X_t)$. This correction term is the reason stochastic calculus exists as a separate subject.
Derivations in diffusion models, score-based generative models, and mathematical finance rely on Ito's lemma throughout. If you cannot apply it mechanically, you will struggle to read these papers.
Mental Model
A Brownian motion path is so rough that $(dW_t)^2$ does not vanish. It equals $dt$ in a precise sense (quadratic variation). When you Taylor expand $f(X_t)$, the second-order term survives because $(dX_t)^2$ contains a $dt$ piece. In ordinary calculus, $(dx)^2 = 0$. In stochastic calculus, $(dW_t)^2 = dt$.
Setup
Let $X_t$ be an Ito process satisfying the SDE:
$$dX_t = \mu(X_t, t)\,dt + \sigma(X_t, t)\,dW_t$$
where $W_t$ is standard Brownian motion, $\mu$ is the drift, and $\sigma$ is the diffusion coefficient.
Quadratic Variation Rule
The multiplication rules for Ito calculus are:
$$(dt)^2 = 0, \qquad dt\,dW_t = 0, \qquad (dW_t)^2 = dt$$
These rules follow from the quadratic variation of Brownian motion: $[W, W]_t = t$.
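The quadratic variation claim can be checked numerically. The following is a minimal sketch, assuming NumPy and a uniform time grid; the function name `path_variations` and its default parameters are illustrative choices, not part of the original text. It sums squared and absolute Brownian increments over $[0, t]$: the first sum should sit near $t$, while the second grows like $\sqrt{n}$ as the grid is refined.

```python
import numpy as np

def path_variations(t=1.0, n_steps=100_000, seed=0):
    """Return (sum of dW^2, sum of |dW|) for one simulated Brownian path on [0, t]."""
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    # Brownian increments over a step of length dt are N(0, dt)
    dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    quad_var = float(np.sum(dW**2))        # approximates [W, W]_t = t
    total_var = float(np.sum(np.abs(dW)))  # grows without bound as n_steps increases
    return quad_var, total_var

if __name__ == "__main__":
    qv, tv = path_variations()
    print(f"sum of dW^2 = {qv:.4f} (compare with t = 1)")
    print(f"sum of |dW| = {tv:.1f} (diverges as the grid is refined)")
```

Increasing `n_steps` tightens the first sum around $t$ and inflates the second, which is the numerical face of "finite quadratic variation, infinite total variation".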
Main Theorems
Ito's Lemma (One Dimension)
Statement
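Let $f(t, x) \in C^{1,2}$. Then $Y_t = f(t, X_t)$ satisfies:
$$dY_t = \left(\frac{\partial f}{\partial t} + \mu\,\frac{\partial f}{\partial x} + \frac{1}{2}\sigma^2\,\frac{\partial^2 f}{\partial x^2}\right)dt + \sigma\,\frac{\partial f}{\partial x}\,dW_t$$
The term $\frac{1}{2}\sigma^2\,\frac{\partial^2 f}{\partial x^2}\,dt$ is the Ito correction. It has no analogue in ordinary calculus.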
Intuition
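Taylor expand $f(X_t + dX_t)$ to second order. The first-order terms give the ordinary chain rule. The second-order term $\frac{1}{2} f''(X_t)\,(dX_t)^2$ normally vanishes, but here $(dX_t)^2 = \sigma^2\,dt$. So $\frac{1}{2}\sigma^2 f''(X_t)\,dt$ survives.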
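The expansion can be written out explicitly. The following is a worked sketch for a time-independent $f$, assuming the SDE $dX_t = \mu\,dt + \sigma\,dW_t$ and the multiplication rules $(dt)^2 = 0$, $dt\,dW_t = 0$, $(dW_t)^2 = dt$:

```latex
\begin{aligned}
(dX_t)^2 &= (\mu\,dt + \sigma\,dW_t)^2 \\
         &= \mu^2 (dt)^2 + 2\mu\sigma\,dt\,dW_t + \sigma^2 (dW_t)^2 \\
         &= \sigma^2\,dt, \\[4pt]
df(X_t)  &= f'(X_t)\,dX_t + \tfrac{1}{2} f''(X_t)\,(dX_t)^2 \\
         &= \left(\mu f'(X_t) + \tfrac{1}{2}\sigma^2 f''(X_t)\right) dt + \sigma f'(X_t)\,dW_t .
\end{aligned}
```

Third-order and higher terms contain products such as $(dW_t)^3$ and $dt\,dW_t$, all of which are of smaller order than $dt$ and drop out.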
Proof Sketch
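Take $f$ time-independent for brevity. Partition $[0, t]$ into subintervals $0 = t_0 < t_1 < \dots < t_n = t$. Write the telescoping sum $f(X_t) - f(X_0) = \sum_i \left[f(X_{t_{i+1}}) - f(X_{t_i})\right]$. Taylor expand each increment to second order. The first-order terms converge to the Ito integral $\int_0^t f'(X_s)\,dX_s$. The second-order terms converge to $\frac{1}{2}\int_0^t \sigma^2 f''(X_s)\,ds$ because of the quadratic variation of $W$. Cross terms and higher-order terms vanish in $L^2$.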
Why It Matters
This is the computational workhorse of stochastic calculus. Every application requires you to start with a process and compute the dynamics of some function of it. Without Ito's lemma, you cannot derive the Black-Scholes equation, compute the SDE for the score function in diffusion models, or analyze Langevin dynamics.
Failure Mode
The formula requires $f$ to be twice continuously differentiable in $x$ (and once in $t$ if it depends on time). If $f$ is not twice differentiable (e.g., $f(x) = |x|$), the standard Ito lemma does not apply. You need the Tanaka-Meyer formula, which introduces local time. Also, this is the Ito version. The Stratonovich chain rule has no correction term but changes the integral definition.
Canonical Examples
Geometric Brownian Motion
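Let $S_t$ satisfy $dS_t = \mu S_t\,dt + \sigma S_t\,dW_t$ (stock price model). Apply Ito's lemma to $f(S) = \ln S$. We have $f'(S) = 1/S$ and $f''(S) = -1/S^2$.
$$d(\ln S_t) = \frac{1}{S_t}\,dS_t - \frac{1}{2}\,\frac{1}{S_t^2}\,\sigma^2 S_t^2\,dt = \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma\,dW_t$$
So $\ln S_t$ is a Brownian motion with drift $\mu - \sigma^2/2$ and volatility $\sigma$. The correction term $-\sigma^2/2$ is why the expected log return is less than $\mu$.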
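The drift correction for geometric Brownian motion can be seen in simulation. The following is a minimal sketch, assuming NumPy and an Euler-Maruyama discretization of $dS_t = \mu S_t\,dt + \sigma S_t\,dW_t$ with $S_0 = 1$; the function name `simulate_gbm_terminal` and the values $\mu = 0.1$, $\sigma = 0.3$ are illustrative assumptions. The average of $\ln S_T$ should land near $(\mu - \sigma^2/2)T$, while $\ln \mathbb{E}[S_T]$ should land near $\mu T$.

```python
import numpy as np

def simulate_gbm_terminal(mu=0.1, sigma=0.3, T=1.0, n_steps=500,
                          n_paths=20_000, seed=0):
    """Euler-Maruyama for dS = mu*S dt + sigma*S dW with S_0 = 1; returns S_T per path."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    S = np.ones(n_paths)
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        S = S + mu * S * dt + sigma * S * dW
    return S

if __name__ == "__main__":
    mu, sigma, T = 0.1, 0.3, 1.0
    S_T = simulate_gbm_terminal(mu, sigma, T)
    print("mean of ln S_T :", np.mean(np.log(S_T)))
    print("Ito prediction :", (mu - 0.5 * sigma**2) * T)
    print("ln of mean S_T :", np.log(np.mean(S_T)))
    print("mu * T         :", mu * T)
```

The gap between the two printed averages is roughly $\sigma^2 T / 2$, which is the Ito correction accumulated over $[0, T]$.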
Square of Brownian motion
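Let $f(W_t) = W_t^2$. Here $\mu = 0$, $\sigma = 1$, $f'(x) = 2x$, $f''(x) = 2$.
$$d(W_t^2) = 2W_t\,dW_t + dt$$
Integrating: $W_t^2 = 2\int_0^t W_s\,dW_s + t$. This gives the Ito integral identity $\int_0^t W_s\,dW_s = \frac{1}{2}W_t^2 - \frac{1}{2}t$.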
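The identity $\int_0^t W_s\,dW_s = \frac{1}{2}W_t^2 - \frac{1}{2}t$ can be checked on a single simulated path. The following is a minimal sketch, assuming NumPy and a left-endpoint Riemann sum as the discrete stand-in for the Ito integral; the function name `ito_sum_w_dw` is an illustrative choice.

```python
import numpy as np

def ito_sum_w_dw(T=1.0, n_steps=200_000, seed=0):
    """Compare the left-endpoint sum of W dW with 0.5*W_T^2 - 0.5*T on one path.

    Returns (ito_sum, closed_form, naive), where naive = 0.5*W_T^2 is what
    the ordinary chain rule would predict.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    W = np.concatenate(([0.0], np.cumsum(dW)))  # W[0] = 0, W[-1] = W_T
    ito_sum = float(np.sum(W[:-1] * dW))        # integrand evaluated at the left endpoint
    closed_form = float(0.5 * W[-1]**2 - 0.5 * T)
    naive = float(0.5 * W[-1]**2)
    return ito_sum, closed_form, naive

if __name__ == "__main__":
    ito_sum, closed_form, naive = ito_sum_w_dw()
    print("left-endpoint sum  :", ito_sum)
    print("0.5*W_T^2 - 0.5*T  :", closed_form)
    print("0.5*W_T^2 (naive)  :", naive)
```

The discrete sum equals $\frac{1}{2}W_T^2 - \frac{1}{2}\sum_i (\Delta W_i)^2$ exactly, so its distance from the closed form is controlled by how close the summed squared increments are to $T$.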
Multidimensional Ito's Lemma
Statement
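If $dX_t = \mu\,dt + \sigma\,dW_t$ with $X_t, \mu \in \mathbb{R}^d$, $\sigma \in \mathbb{R}^{d \times m}$, and $W_t$ an $m$-dimensional Brownian motion, then for $f \in C^{1,2}$:
$$df(t, X_t) = \left(\frac{\partial f}{\partial t} + \sum_{i} \mu_i\,\frac{\partial f}{\partial x_i} + \frac{1}{2}\sum_{i,j} \Sigma_{ij}\,\frac{\partial^2 f}{\partial x_i\,\partial x_j}\right)dt + \sum_{i,k} \frac{\partial f}{\partial x_i}\,\sigma_{ik}\,dW_t^k$$
where $\Sigma = \sigma\sigma^\top$.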
Intuition
The same idea as the 1D case, but now the quadratic covariation between different components contributes through the full Hessian matrix of .
Proof Sketch
Same Taylor expansion argument as the 1D case, applied componentwise. The cross terms $dX_t^i\,dX_t^j$ contribute through the covariation $d[X^i, X^j]_t = (\sigma\sigma^\top)_{ij}\,dt$.
Why It Matters
Diffusion models in high dimensions (image generation) operate on multidimensional SDEs. The multivariate version is needed to derive the reverse-time SDE and the score matching objective.
Failure Mode
Same regularity requirements as the 1D case, but now applied to all partial derivatives up to second order.
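The multidimensional drift can be computed and checked in a simple case. The following is a minimal sketch, assuming NumPy, a constant diffusion matrix, zero drift, and $f(x) = \|x\|^2$; the helper names `ito_drift` and `mc_mean_sq_norm` and the matrix `A` are hypothetical and chosen only for illustration. With gradient $2x$ and Hessian $2I$, the drift of $f(X_t)$ is $\operatorname{tr}(\sigma\sigma^\top)$, so $\mathbb{E}\|X_T\|^2 = T\,\operatorname{tr}(\sigma\sigma^\top)$ when $X_0 = 0$.

```python
import numpy as np

def ito_drift(grad_f, hess_f, mu, sigma, dfdt=0.0):
    """Drift of f(t, X_t): df/dt + grad_f . mu + 0.5 * tr(sigma sigma^T hess_f)."""
    Sigma = sigma @ sigma.T
    return dfdt + grad_f @ mu + 0.5 * np.trace(Sigma @ hess_f)

def mc_mean_sq_norm(A, T=1.0, n_paths=200_000, seed=0):
    """Monte Carlo estimate of E||X_T||^2 for dX = A dW, X_0 = 0 (so X_T = A W_T exactly)."""
    rng = np.random.default_rng(seed)
    m = A.shape[1]
    W_T = rng.normal(0.0, np.sqrt(T), size=(n_paths, m))
    X_T = W_T @ A.T
    return float(np.mean(np.sum(X_T**2, axis=1)))

if __name__ == "__main__":
    # Illustrative 3x2 diffusion matrix: 3 state components driven by 2 Brownian motions
    A = np.array([[1.0, 0.0], [0.5, 2.0], [0.0, 1.0]])
    x = np.array([1.0, -2.0, 0.5])  # any point; the drift does not depend on it here
    drift = ito_drift(grad_f=2 * x, hess_f=2 * np.eye(3), mu=np.zeros(3), sigma=A)
    print("drift from trace formula:", drift)
    print("Monte Carlo E||X_1||^2  :", mc_mean_sq_norm(A))
```

The off-diagonal entries of $\sigma\sigma^\top$ do not matter for this particular $f$ because its Hessian is diagonal; a function with mixed partial derivatives would pick them up.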
Common Confusions
Why not just use the ordinary chain rule?
Because $(dW_t)^2 = dt \neq 0$. Brownian paths have infinite total variation and nonzero quadratic variation on any interval, so the second-order term in the Taylor expansion does not vanish. If you apply the ordinary chain rule, you get the wrong drift. The geometric Brownian motion example shows this: the ordinary chain rule gives drift $\mu$ for $\ln S_t$, but the correct drift is $\mu - \sigma^2/2$.
Ito vs Stratonovich
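In Stratonovich calculus, the chain rule has no correction term: $df(X_t) = f'(X_t) \circ dX_t$. The price is that the Stratonovich integral is defined as a midpoint Riemann sum, not a left-endpoint sum. Ito integrals are martingales under integrability conditions on the integrand (useful for probability arguments). Stratonovich integrals obey the ordinary chain rule (useful for physics). They are related by: the Ito SDE $dX_t = \mu\,dt + \sigma\,dW_t$ corresponds to the Stratonovich SDE $dX_t = \left(\mu - \frac{1}{2}\sigma\,\frac{\partial \sigma}{\partial x}\right)dt + \sigma \circ dW_t$.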
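As a worked check, the conversion can be applied to geometric Brownian motion. This sketch assumes the standard rule that an Ito drift $\mu(x)$ corresponds to the Stratonovich drift $\mu(x) - \frac{1}{2}\sigma(x)\,\sigma'(x)$, with diffusion coefficient $\sigma(S) = \sigma S$:

```latex
\begin{aligned}
\text{Ito:} \quad & dS_t = \mu S_t\,dt + \sigma S_t\,dW_t, \\
\text{Stratonovich:} \quad & dS_t = \left(\mu - \tfrac{1}{2}\sigma^2\right) S_t\,dt + \sigma S_t \circ dW_t, \\[4pt]
\text{ordinary chain rule on } \ln S_t: \quad
& d(\ln S_t) = \frac{1}{S_t} \circ dS_t
  = \left(\mu - \tfrac{1}{2}\sigma^2\right) dt + \sigma \circ dW_t .
\end{aligned}
```

Because the integrand $\sigma$ in the last line is constant, $\sigma \circ dW_t = \sigma\,dW_t$, and the result agrees with what Ito's lemma gives for $\ln S_t$. Both calculi describe the same process; they differ only in where the $-\sigma^2/2$ is booked.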
Summary
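- The Ito correction term is $\frac{1}{2}\sigma^2\,\frac{\partial^2 f}{\partial x^2}\,dt$
- It exists because $(dW_t)^2 = dt$, not zero
- For $\ln S_t$ of geometric Brownian motion, the correction gives drift $\mu - \sigma^2/2$
- The Ito integral $\int_0^t W_s\,dW_s = \frac{1}{2}W_t^2 - \frac{1}{2}t$, not $\frac{1}{2}W_t^2$ as the ordinary chain rule would give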
Exercises
Problem
Apply Ito's lemma to . What SDE does satisfy?
Problem
Let satisfy (Ornstein-Uhlenbeck process). Apply Ito's lemma to and use it to compute given .
References
Canonical:
- Oksendal, Stochastic Differential Equations, Chapter 4
- Shreve, Stochastic Calculus for Finance II, Chapter 4
Current:
- Song et al., "Score-Based Generative Modeling through SDEs" (2021), Appendix A
- Folland, Real Analysis (1999), Chapters 1-7
Next Topics
- Diffusion models: the primary ML application of Ito's lemma today
- Flow matching: an alternative to diffusion that avoids SDEs but relates to them
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Stochastic Calculus for ML (Layer 3)
- Martingale Theory (Layer 0B)
- Measure-Theoretic Probability (Layer 0B)