
Mathematical Infrastructure

Backward Stochastic Differential Equations

The Pardoux–Peng framework: an SDE with a terminal condition and an adapted solution pair (Y_t, Z_t). Linear BSDEs reduce to Feynman–Kac; nonlinear BSDEs are dual to Hamilton–Jacobi–Bellman PDEs and are the mathematical object that the deep BSDE method approximates.


Why This Matters

A linear parabolic PDE has a Feynman–Kac representation as an expectation over SDE trajectories. A nonlinear parabolic PDE — one whose lower-order term depends on the solution $u$ and its gradient $\nabla u$ themselves — does not. The clean expectation breaks because the integrand inside the expectation depends on the unknown solution, and you cannot Monte Carlo what you do not yet know. The fix is the backward stochastic differential equation of Pardoux and Peng (1990): an SDE that you solve backward from a terminal condition, with an extra adapted process $Z_t$ that enforces measurability.

BSDEs are the natural mathematical object on the bridge between Feynman–Kac (linear case) and Hamilton–Jacobi–Bellman (fully nonlinear case). The driver $f$ in a BSDE plays the role of the nonlinear lower-order term in a semilinear parabolic PDE; the pair $(Y_t, Z_t)$ encodes both the solution and its diffusion-weighted gradient along an SDE path. When $f$ is linear in $(y, z)$, the BSDE collapses to classical Feynman–Kac. When $f$ is convex in $z$, the BSDE represents a stochastic-control value function and recovers the HJB equation.

The historical path is older than 1990. Bismut (1973) wrote down a linear BSDE as the adjoint equation in stochastic control. Pardoux and Peng's contribution was the existence-uniqueness theorem in the nonlinear Lipschitz setting, which made BSDEs an autonomous object rather than a side-equation in optimization. El Karoui, Peng, and Quenez (1997) then turned BSDEs into a working tool for mathematical finance: pricing under constraints, recursive utility, $g$-expectations.

The reason BSDEs matter for ML is downstream. The deep BSDE method of Han, Jentzen, and E (2018) parameterizes the $Z$ process with a neural network and solves the BSDE by forward shooting under a terminal loss. It is one of the few methods that handle semilinear parabolic PDEs in $d = 100$ dimensions with a few percent error. Every line of the algorithm is a discretization of the Pardoux–Peng BSDE and inherits its existence guarantees.

Mental Model

A forward SDE specifies an initial condition and runs forward in time. A BSDE specifies a terminal condition $Y_T = \xi$ and asks for a process $Y_t$ that lands on $\xi$ at time $T$. The catch: $Y_t$ must be adapted to the forward Brownian filtration. You cannot just solve $Y_t = \mathbb{E}[\xi + \int_t^T f(s, Y_s, Z_s)\,ds \mid \mathcal{F}_t]$ pointwise, because that requires knowing the future $Y_s$ for $s > t$ to compute the integrand.

The trick is to add a second unknown process $Z_t$ — a "control" or "martingale-representation" coefficient — and solve the system jointly. The pair $(Y, Z)$ is what makes backward solvability work. The role of $Z$ becomes transparent in the Markovian case: when $Y_t = u(t, X_t)$ for some function $u$ and forward state $X_t$, the martingale representation theorem forces $Z_t = \sigma^\top(X_t)\,\nabla u(t, X_t)$. So $Z$ is the diffusion-weighted gradient of the value function along the path. This is exactly what the deep BSDE method parameterizes.

Formal Statement

Definition

Backward Stochastic Differential Equation

Fix a horizon $T > 0$, a $d$-dimensional Brownian motion $B_t$ on a filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}, \mathbb{P})$, and a terminal random variable $\xi \in L^2(\mathcal{F}_T)$. A driver is a measurable function $f: [0, T] \times \mathbb{R} \times \mathbb{R}^d \to \mathbb{R}$. A solution to the BSDE with terminal value $\xi$ and driver $f$ is a pair of adapted processes $(Y_t, Z_t) \in \mathbb{R} \times \mathbb{R}^d$ satisfying

$$Y_t = \xi + \int_t^T f(s, Y_s, Z_s)\,ds - \int_t^T Z_s^\top\,dB_s, \qquad t \in [0, T],$$

or equivalently in differential form $-dY_t = f(t, Y_t, Z_t)\,dt - Z_t^\top\,dB_t$, with terminal condition $Y_T = \xi$. The solution lives in $S^2 \times H^2$: $Y \in S^2$ means $\mathbb{E}[\sup_{t \le T} |Y_t|^2] < \infty$, and $Z \in H^2$ means $\mathbb{E}[\int_0^T |Z_t|^2\,dt] < \infty$.

The vector-valued generalization replaces $Y_t \in \mathbb{R}$ by $Y_t \in \mathbb{R}^k$ and $Z_t$ by a $k \times d$ matrix; the same theory carries through with obvious notational changes.

The minus sign on the stochastic integral is the convention that makes $\int_t^T Z_s^\top\,dB_s$ a forward Itô integral; the integration variable $s$ runs forward in time even though the equation is "solved backward" from the terminal condition. This is not pathwise time reversal; the filtration is still the forward Brownian filtration. The "backward" in BSDE refers strictly to the direction in which the boundary condition is imposed.

Pardoux–Peng Existence and Uniqueness

Theorem

Pardoux–Peng Existence and Uniqueness Theorem

Statement

Assume the driver $f$ is uniformly Lipschitz in $(y, z)$ and satisfies $\mathbb{E}[\int_0^T |f(t, 0, 0)|^2\,dt] < \infty$. Then the BSDE $Y_t = \xi + \int_t^T f(s, Y_s, Z_s)\,ds - \int_t^T Z_s^\top\,dB_s$ has a unique solution $(Y, Z) \in S^2 \times H^2$. Moreover, the solution depends continuously on the data $(\xi, f)$ in the natural $L^2$-norms, and a comparison principle holds for scalar $Y$: if $\xi_1 \le \xi_2$ almost surely and $f_1 \le f_2$ pointwise, then $Y^{(1)}_t \le Y^{(2)}_t$ almost surely for every $t$.

Intuition

The driver $f$ is Lipschitz, which makes the operator that maps a candidate $(Y, Z)$ to the next iterate (defined via conditional expectation against the terminal value) a contraction in a suitable weighted norm. Banach fixed-point gives existence and uniqueness. The role of $Z$ is forced by the martingale representation theorem: any square-integrable martingale on the Brownian filtration is a stochastic integral against $B$, and $Z$ is the integrand that makes $Y$ an Itô process.

Proof Sketch

Define the map $\Phi: H^2 \times H^2 \to H^2 \times H^2$ as follows. Given a candidate $(y, z)$, set $M_t = \mathbb{E}[\xi + \int_0^T f(s, y_s, z_s)\,ds \mid \mathcal{F}_t]$. By the martingale representation theorem there is a unique $Z' \in H^2$ with $M_t = M_0 + \int_0^t Z_s'^\top\,dB_s$. Define $Y'_t = M_t - \int_0^t f(s, y_s, z_s)\,ds$, equivalently $Y'_t = \xi + \int_t^T f(s, y_s, z_s)\,ds - \int_t^T Z_s'^\top\,dB_s$. Then $\Phi(y, z) = (Y', Z')$. Equip $H^2 \times H^2$ with the weighted norm $\|(y, z)\|_\beta^2 = \mathbb{E}[\int_0^T e^{\beta t}(|y_t|^2 + |z_t|^2)\,dt]$. Itô's formula applied to $e^{\beta t}|Y'^{(1)}_t - Y'^{(2)}_t|^2$, together with the Lipschitz hypothesis on $f$, gives a contraction estimate $\|\Phi(y_1, z_1) - \Phi(y_2, z_2)\|_\beta \le \rho(\beta)\,\|(y_1, z_1) - (y_2, z_2)\|_\beta$ with $\rho(\beta) < 1$ for $\beta$ large enough. Banach fixed-point closes the proof.
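A toy numerical check of the Picard scheme, in the simplest degenerate case: a deterministic driver $f(s, y, z) = a y + b$ with constant terminal value, so $Z \equiv 0$, the conditional expectations are trivial, and each sweep of $\Phi$ is just a backward integral. The iterates converge geometrically to the closed form $Y_t = e^{a(T-t)}\xi + \tfrac{b}{a}(e^{a(T-t)} - 1)$. This is a sketch under those simplifying assumptions, not the full stochastic iteration.

```python
import numpy as np

def picard_bsde(a=0.5, b=1.0, xi=2.0, T=1.0, N=1000, iters=12):
    """Picard iteration for the deterministic BSDE
    Y_t = xi + \\int_t^T (a*Y_s + b) ds   (Z = 0, no noise).
    Each sweep integrates the previous iterate backward on a time grid
    and records the sup-norm error against the closed-form solution."""
    t = np.linspace(0.0, T, N + 1)
    dt = T / N
    y = np.full(N + 1, xi)                         # iterate 0: constant guess
    exact = np.exp(a * (T - t)) * xi + (b / a) * (np.exp(a * (T - t)) - 1.0)
    errs = []
    for _ in range(iters):
        integrand = a * y + b
        # cumulative trapezoid: tail[i] = \int_0^{t_i} integrand ds
        tail = np.concatenate(([0.0],
                np.cumsum((integrand[:-1] + integrand[1:]) * dt / 2)))
        y = xi + (tail[-1] - tail)                 # \int_t^T = \int_0^T - \int_0^t
        errs.append(np.max(np.abs(y - exact)))
    return errs

errs = picard_bsde()
print(errs[0], errs[-1])  # sup-norm error shrinks geometrically across sweeps
```

The contraction factor here is roughly $aT$; with $aT = 0.5$ a dozen sweeps already push the iterate down to the grid's discretization floor.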

Why It Matters

This is the foundational well-posedness result. Without it, BSDEs would be formal manipulations with no guarantee that the equations they appear in have meaning. Three downstream consequences. First, the nonlinear Feynman–Kac representation (next theorem) inherits existence-uniqueness from this result; without it, the connection to semilinear PDEs would be one-sided. Second, the comparison principle is the BSDE analog of the maximum principle for parabolic PDEs and is the foundation of $g$-expectations and BSDE-based risk measures. Third, the contraction estimate is what licenses Picard iteration as a numerical scheme, and in the deep BSDE method it is what ties the forward-shooting loss to the unique true solution once the gradient network is expressive enough. In one line: there exists a unique pair $(Y, Z) \in S^2 \times H^2$ solving the BSDE.

Failure Mode

The Lipschitz hypothesis on $f$ is essential for the contraction step. Drivers with quadratic growth in $z$ (e.g., $f(t, y, z) = \tfrac{1}{2}|z|^2 + g(t, y)$, arising in exponential utility maximization) fall outside Pardoux–Peng. Existence in that regime requires a different proof technique due to Kobylanski (2000), based on an a priori sup-norm bound for $Y$ and an exponential transformation. Uniqueness is harder still and was settled only later (Briand and Hu 2008, Delbaen et al. 2011). The clean BSDE theory is the Lipschitz case; quadratic BSDEs are a separate and substantially more involved chapter.

Nonlinear Feynman–Kac

The reason BSDEs matter for PDE theory is the Markovian case, in which the terminal value and driver are functions of a forward SDE. Pardoux and Peng (1992) proved that the BSDE solution is then exactly the solution of a semilinear parabolic PDE.

Theorem

Nonlinear Feynman–Kac (Pardoux–Peng 1992)

Statement

Let $X = X^{t,x}$ solve the forward SDE $dX_s = b(s, X_s)\,ds + \sigma(s, X_s)\,dB_s$ on $[t, T]$ with $X_t = x$, and let $(Y^{t,x}, Z^{t,x})$ solve the associated BSDE on $[t, T]$ with terminal value $g(X_T)$ and driver $h(s, X_s, Y_s, Z_s)$. Define $u(t, x) = Y^{t,x}_t$. Then $u$ is the unique viscosity solution of the semilinear parabolic PDE

$$\partial_t u + \mathcal{L} u + h(t, x, u, \sigma^\top \nabla u) = 0, \qquad u(T, x) = g(x),$$

where $\mathcal{L} u = b \cdot \nabla u + \tfrac{1}{2}\operatorname{Tr}(\sigma \sigma^\top \nabla^2 u)$ is the generator of $X$. When $u \in C^{1,2}$, the BSDE solution along the path admits the Markovian representation $Y^{t,x}_s = u(s, X_s)$ and $Z^{t,x}_s = \sigma^\top(s, X_s)\,\nabla u(s, X_s)$ for $s \in [t, T]$.

Intuition

Apply Itô's formula to $u(s, X_s)$. The drift bracket $\partial_s u + \mathcal{L} u$ is forced by the PDE to equal $-h(s, X_s, u, \sigma^\top \nabla u)$, and the diffusion bracket $\sigma^\top \nabla u$ is the integrand against $dB_s$. Comparing with the BSDE $dY_s = -h(s, X_s, Y_s, Z_s)\,ds + Z_s^\top\,dB_s$ identifies $Y_s = u(s, X_s)$ and $Z_s = \sigma^\top(s, X_s)\,\nabla u(s, X_s)$. The terminal condition $u(T, X_T) = g(X_T) = \xi$ matches automatically.
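Written out, the Itô computation looks like this (assuming $u \in C^{1,2}$ and $X$ driven by drift $b$ and diffusion $\sigma$):

```latex
% Itô's formula along X_s for u \in C^{1,2}:
du(s, X_s) = \bigl(\partial_s u + \mathcal{L} u\bigr)(s, X_s)\,ds
           + \nabla u(s, X_s)^\top \sigma(s, X_s)\,dB_s

% Substitute the PDE  \partial_s u + \mathcal{L} u = -h(s, x, u, \sigma^\top \nabla u):
du(s, X_s) = -h\bigl(s, X_s, u(s, X_s), \sigma^\top \nabla u(s, X_s)\bigr)\,ds
           + \bigl(\sigma^\top(s, X_s)\,\nabla u(s, X_s)\bigr)^\top dB_s

% Matching term by term with  dY_s = -h(s, X_s, Y_s, Z_s)\,ds + Z_s^\top dB_s:
Y_s = u(s, X_s), \qquad Z_s = \sigma^\top(s, X_s)\,\nabla u(s, X_s).
```

Both the drift and diffusion brackets must match separately, which is what pins down $Y$ and $Z$ simultaneously.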

Proof Sketch

The forward direction (PDE solution gives BSDE solution) is the Itô calculation just sketched. For the reverse direction (BSDE solution gives PDE viscosity solution), one shows that $u(t, x) = Y^{t,x}_t$ inherits continuity in $(t, x)$ from the BSDE's continuous dependence on initial data, then verifies the viscosity sub- and super-solution inequalities via a Markov-property argument and the comparison principle for BSDEs. Pardoux and Peng (1992) handle the smooth case; Pardoux (1999) and Barles, Buckdahn, and Pardoux (1997) extend to viscosity solutions and to PDEs with reflection or jumps.

Why It Matters

This is the nonlinear Feynman–Kac formula. It generalizes the linear formula $u(t, x) = \mathbb{E}[g(X_T) \mid X_t = x]$ to PDEs whose lower-order term depends on $u$ and $\nabla u$ themselves. The expectation representation breaks (you cannot integrate against an unknown $u$), and the BSDE replaces it with an implicit fixed-point representation. When $h$ is convex in $z$ (the typical situation in stochastic control), the PDE above is the Hamilton–Jacobi–Bellman equation of a control problem and the BSDE is its dual. This is the mathematical content of the duality between forward stochastic control and backward representation that Bismut (1973) first identified. In one line: the BSDE solution is $Y_t = u(t, X_t)$ and $Z_t = \sigma^\top(t, X_t)\,\nabla u(t, X_t)$, where $u$ solves the semilinear parabolic PDE $\partial_t u + \mathcal{L} u + h(t, x, u, \sigma^\top \nabla u) = 0$ with terminal condition $u(T, x) = g(x)$.

Failure Mode

Fully nonlinear PDEs (those involving $\nabla^2 u$ inside the nonlinearity, like Monge–Ampère or the second-order HJB with controlled diffusion) are not covered. The natural representation there is the second-order BSDE of Cheridito, Soner, Touzi, and Victoir (2007) and Soner, Touzi, and Zhang (2012), which adds a third process $\Gamma$ representing the Hessian. The clean Pardoux–Peng theory is restricted to semilinear PDEs where the second-order term is fixed by the forward diffusion.

The Y/Z Decomposition

In the Markovian case, the BSDE solution decomposes cleanly: $Y_t = u(t, X_t)$ is the value function evaluated along the path, and $Z_t = \sigma^\top(t, X_t)\,\nabla u(t, X_t)$ is the diffusion-weighted gradient of the value function. The two processes carry complementary information: $Y$ tracks the level of $u$ along the trajectory, and $Z$ tracks the slope.

This decomposition is what the deep BSDE method exploits. The algorithm parameterizes $Y_0$ as a single trainable scalar (the unknown PDE solution at the initial point) and $Z_{t_k}$ as a neural network $\phi_{\theta_k}: \mathbb{R}^d \to \mathbb{R}^d$ at each time step $t_k$. The forward Euler update $Y_{t_{k+1}} = Y_{t_k} - f(t_k, Y_{t_k}, Z_{t_k})\,\Delta t + Z_{t_k}^\top \Delta B_k$ propagates the candidate trajectory, and a terminal $L^2$ loss $\mathbb{E}[|Y_{t_N} - g(X_{t_N})|^2]$ drives optimization. The $d$-dependence enters polynomially through the network input dimension, never through a spatial grid.
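The rollout itself can be sketched without any training, by plugging in an exact $Y_0$ and $Z$ where the trained network would sit. The test case below is an assumption chosen for checkability: the $d$-dimensional heat equation $\partial_t u + \tfrac{1}{2}\Delta u = 0$ with $g(x) = |x|^2$, $\sigma = I$, and driver $f = 0$, where $u(t, x) = |x|^2 + d(T - t)$ and hence $Z_t = \nabla u = 2 X_t$ exactly. With the exact processes, the terminal loss collapses to pure Euler discretization noise.

```python
import numpy as np

def forward_euler_bsde(rng, d=10, T=1.0, N=100, M=4096):
    """Deep-BSDE forward Euler rollout with the exact Y_0 and Z substituted
    for the trained network, for the heat equation u_t + 0.5*Lap(u) = 0 with
    g(x) = |x|^2. Exact solution: u(t, x) = |x|^2 + d*(T - t), grad u = 2x."""
    dt = T / N
    x = np.zeros((M, d))                  # M forward paths, all started at X_0 = 0
    y = np.full(M, d * T)                 # exact Y_0 = u(0, 0) = d*T
    for _ in range(N):
        dB = rng.normal(0.0, np.sqrt(dt), size=(M, d))
        z = 2.0 * x                       # exact Z_{t_k} = grad u(t_k, X_{t_k})
        y = y + np.sum(z * dB, axis=1)    # Euler update; driver f = 0 here
        x = x + dB                        # forward SDE: dX = dB (sigma = I, b = 0)
    g = np.sum(x**2, axis=1)              # terminal condition g(X_T)
    return np.mean((y - g)**2)            # empirical terminal L2 loss

rng = np.random.default_rng(0)
loss = forward_euler_bsde(rng)
print(loss)  # small relative to the solution scale u(0,0)^2 = (d*T)^2
```

What survives in the loss is exactly the per-step Euler error $|\Delta B_k|^2 - d\,\Delta t$, a mean-zero quantity of variance $O(\Delta t)$ in total; in the real algorithm this same loss is what gradient descent drives down over $Y_0$ and the network weights.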

Worked Example: Linear BSDE Recovers Discounted Feynman–Kac

Take a driver linear in $y$ and independent of $z$: $f(s, y, z) = -c(s) y - h(s)$ for deterministic functions $c, h$. The BSDE is

$$-dY_s = \bigl(-c(s) Y_s - h(s)\bigr)\,ds - Z_s^\top\,dB_s, \qquad Y_T = \xi.$$

This is a stochastic linear ODE in $Y$ with random terminal condition. Apply the integrating factor $e^{\int_t^s c(r)\,dr}$ and rearrange to get $Y_t = \mathbb{E}\bigl[\Lambda_{t,T}\,\xi + \int_t^T \Lambda_{t,s}\, h(s)\,ds \bigm| \mathcal{F}_t\bigr]$, where $\Lambda_{t,s} = \exp(-\int_t^s c(r)\,dr)$ is a discount factor. In Markovian form with $\xi = g(X_T)$ and all coefficients depending on $X$, this is exactly the discounted Feynman–Kac formula of the Feynman–Kac topic page: $Y_t = u(t, X_t)$ where $u$ solves $\partial_t u + \mathcal{L} u - c\,u + h = 0$ with $u(T, x) = g(x)$. The $Z$ process is recovered as $Z_t = \sigma^\top(t, X_t)\,\nabla u(t, X_t)$, the integrand in the martingale representation of $Y_t$ against $B_t$.

The lesson: the linear BSDE is exactly the discounted Feynman–Kac formula in disguise. The BSDE machinery is non-trivial only when $f$ depends on $y$ or $z$ in a genuinely nonlinear way.
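A quick Monte Carlo sanity check of the discounted representation, under illustrative assumptions: constant $c$ and $h$, one dimension, and terminal value $\xi = B_T^2$ (so $\mathbb{E}[\xi] = T$ and the closed form is elementary).

```python
import numpy as np

# Linear BSDE with driver f(s, y, z) = -c*y - h (c, h constants) and
# terminal value xi = B_T^2. The integrating-factor formula gives
#   Y_0 = E[ e^{-cT} * xi ] + h * (1 - e^{-cT}) / c,
# and E[B_T^2] = T makes the right-hand side explicit.
c, h, T = 0.3, 1.0, 1.0
rng = np.random.default_rng(42)
B_T = rng.normal(0.0, np.sqrt(T), size=1_000_000)   # Brownian endpoints

mc = np.mean(np.exp(-c * T) * B_T**2) + h * (1.0 - np.exp(-c * T)) / c
closed = np.exp(-c * T) * T + h * (1.0 - np.exp(-c * T)) / c
print(mc, closed)   # the two agree to Monte Carlo accuracy
```

Nothing here needed the $Z$ process explicitly, which is the point: for a linear driver the BSDE is an expectation in disguise, and $Z$ is only the martingale-representation integrand recovered after the fact.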

Common Confusions

Watch Out

The 'backward' in BSDE refers to the terminal condition, not pathwise time reversal

A BSDE is not a forward SDE run with the time direction flipped. The filtration is still the forward Brownian filtration $\mathcal{F}_t = \sigma(B_s : s \le t)$, and the stochastic integral $\int_t^T Z_s^\top\,dB_s$ is a forward Itô integral. What is "backward" is the location of the boundary condition: instead of an initial condition $Y_0 = y_0$, the equation imposes a terminal condition $Y_T = \xi$. This is closer in spirit to a parabolic PDE solved backward from a Cauchy datum at time $T$ than to a time-reversed SDE in the Anderson sense. The two notions are unrelated.

Watch Out

The Z process is not a free parameter; it is forced by martingale representation

The unknowns of a BSDE are both $Y$ and $Z$, but they are not independent. Once $Y$ is required to be adapted and to satisfy the integral equation with terminal value $\xi$, the martingale representation theorem forces $Z$ to be the integrand of the martingale $M_t = \mathbb{E}[\xi + \int_0^T f\,ds \mid \mathcal{F}_t]$ against $B$. The pair $(Y, Z)$ is jointly determined; you do not get to choose $Z$ separately. This is also why the BSDE is well-posed: the extra unknown $Z$ is exactly compensated by the extra structural constraint that $Y$ be adapted.

Watch Out

Quadratic-growth drivers are genuinely outside Pardoux–Peng

Pardoux–Peng requires $f$ to be uniformly Lipschitz in $(y, z)$. Drivers with quadratic growth in $z$, common in entropic risk measures and exponential utility, violate the Lipschitz hypothesis and require the separate theory of Kobylanski (2000). The trick there is an exponential transformation $\tilde{Y}_t = \exp(\eta Y_t)$ that linearizes the quadratic term, plus an a priori sup-norm bound on $Y$ from the boundedness of $\xi$. Existence holds; uniqueness is much harder and depends on additional structural assumptions on $f$. Treating quadratic BSDEs as a "slight extension" of Lipschitz BSDEs underestimates the difficulty.

Exercises

Exercise (Core)

Problem

Solve the linear BSDE $-dY_t = (a Y_t + b)\,dt - Z_t\,dB_t$ on $[0, T]$ with terminal condition $Y_T = \xi$, where $a, b \in \mathbb{R}$ are constants and $\xi \in L^2(\mathcal{F}_T)$. Give explicit formulas for $Y_t$ and $Z_t$.

Exercise (Advanced)

Problem

Prove the contraction step in Pardoux–Peng. Let $\Phi: H^2 \times H^2 \to H^2 \times H^2$ be the map defined in the proof sketch above. Equip the codomain with the $\beta$-weighted norm $\|(Y, Z)\|_\beta^2 = \mathbb{E}[\int_0^T e^{\beta t}(|Y_t|^2 + |Z_t|^2)\,dt]$. Show that for $\beta$ large enough (depending on the Lipschitz constant $K$ of $f$), $\Phi$ is a strict contraction in this norm.


Next Topics

  • Deep BSDE Method: the Han–Jentzen–E neural-network solver that parameterizes $Z_{t_k}$ with a network at each time step and minimizes a terminal-condition $L^2$ loss.
  • Hamilton–Jacobi–Bellman Equation: the PDE that arises when the BSDE driver $f$ is a control Hamiltonian; the canonical setting where BSDE duality replaces dynamic programming.
  • Feynman–Kac Formula: the linear case that BSDEs generalize; the BSDE collapses to the discounted Feynman–Kac expectation when the driver is linear.
  • Stochastic Differential Equations: the forward equation whose path the BSDE is solved along in the Markovian case.
  • Itô's Lemma: the chain rule that produces the Markovian BSDE from a $C^{1,2}$ solution of the associated semilinear PDE.

Last reviewed: April 18, 2026
