Mathematical Infrastructure
Backward Stochastic Differential Equations
The Pardoux–Peng framework: an SDE with a terminal condition and an adapted solution pair (Y_t, Z_t). Linear BSDEs reduce to Feynman–Kac; nonlinear BSDEs are dual to Hamilton–Jacobi–Bellman PDEs and are the mathematical object that the deep BSDE method approximates.
Why This Matters
A linear parabolic PDE has a Feynman–Kac representation as an expectation over SDE trajectories. A nonlinear parabolic PDE — one whose lower-order term depends on the solution and its gradient themselves — does not. The clean expectation breaks because the integrand inside the expectation depends on the unknown solution, and you cannot Monte Carlo what you do not yet know. The fix is the backward stochastic differential equation of Pardoux and Peng (1990): an SDE that you solve backward from a terminal condition, with an extra adapted process that enforces measurability.
BSDEs are the natural mathematical object on the bridge between Feynman–Kac (linear case) and Hamilton–Jacobi–Bellman (fully nonlinear case). The driver f in a BSDE plays the role of the nonlinear lower-order term in a semilinear parabolic PDE; the pair (Y_t, Z_t) encodes both the solution and its diffusion-weighted gradient along an SDE path. When f is linear in (y, z), the BSDE collapses to classical Feynman–Kac. When f is convex in z, the BSDE represents a stochastic-control value function and recovers the HJB equation.
The historical path is older than 1990. Bismut (1973) wrote down a linear BSDE as the adjoint equation in stochastic control. Pardoux and Peng's contribution was the existence-uniqueness theorem in the nonlinear Lipschitz setting, which made BSDEs an autonomous object rather than a side-equation in optimization. El Karoui, Peng, and Quenez (1997) then turned BSDEs into a working tool for mathematical finance: pricing under constraints, recursive utility, g-expectations.
The reason BSDEs matter for ML is downstream. The deep BSDE method of Han, Jentzen, and E (2018) parameterizes the Z process with a neural network and solves the BSDE by forward shooting under a terminal-condition loss. It is one of the few methods that handles semilinear parabolic PDEs in hundreds of dimensions with a few percent error. Every line of the algorithm is a discretization of the Pardoux–Peng BSDE and inherits its existence guarantees.
Mental Model
A forward SDE specifies an initial condition and runs forward in time. A BSDE specifies a terminal condition and asks for a process Y_t that lands on ξ at time T. The catch: Y_t must be adapted to the forward Brownian filtration. You cannot just solve backward pathwise, because computing the integrand at time t would require knowing the future.
The trick is to add a second unknown process Z_t — a "control" or "martingale-representation" coefficient — and solve the system jointly. The pair (Y_t, Z_t) is what makes backward solvability work. The role of Z becomes transparent in the Markovian case: when Y_t = u(t, X_t) for some function u and forward state X_t, the martingale representation theorem forces Z_t = σ(t, X_t)ᵀ ∇ₓu(t, X_t). So Z is the diffusion-weighted gradient of the value function along the path. This is exactly what the deep BSDE method parameterizes.
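The forced relationship between Z and the gradient can be checked numerically in the simplest Markovian example. Below is a minimal sketch (an assumed toy setup, not from the original text): take X = W and ξ = W_T², so that u(t, x) = x² + (T − t) and Z_t = ∂ₓu(t, W_t) = 2W_t; the martingale representation W_T² = T + ∫_0^T 2W_s dW_s then holds path by path up to O(dt) discretization error.

```python
import numpy as np

# Numerical check of the martingale-representation identity in the
# heat-equation toy example: xi = W_T^2, u(t, x) = x^2 + (T - t),
# so Z_t = 2 W_t and W_T^2 = T + int_0^T 2 W_s dW_s.
rng = np.random.default_rng(0)
T, n_steps, n_paths = 1.0, 200, 5000
dt = T / n_steps

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W_left = np.cumsum(dW, axis=1) - dW          # W_{t_k}: left endpoints (Ito)
lhs = dW.sum(axis=1) ** 2                    # W_T^2, path by path
rhs = T + (2.0 * W_left * dW).sum(axis=1)    # T + int 2 W dW (Euler sum)

gap = np.mean((lhs - rhs) ** 2)              # O(dt): shrinks as the grid refines
```

The residual lhs − rhs is exactly the Itô-correction noise Σ(ΔW_k² − dt), whose mean square is 2T·dt; refining the grid drives it to zero, which is the discrete shadow of the representation theorem.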
Formal Statement
Backward Stochastic Differential Equation
Fix a horizon T > 0, a d-dimensional Brownian motion W on a filtered probability space (Ω, 𝓕, (𝓕_t)_{0≤t≤T}, P), and a terminal random variable ξ ∈ L²(𝓕_T). A driver is a measurable function f: Ω × [0, T] × ℝ × ℝ^d → ℝ, uniformly Lipschitz in (y, z). A solution to the BSDE with terminal value ξ and driver f is a pair (Y, Z) of adapted processes satisfying

Y_t = ξ + ∫_t^T f(s, Y_s, Z_s) ds − ∫_t^T Z_s · dW_s,  0 ≤ t ≤ T,
or equivalently in differential form dY_t = −f(t, Y_t, Z_t) dt + Z_t · dW_t, with terminal condition Y_T = ξ. The solution lives in 𝒮² × ℋ²: Y ∈ 𝒮² means E[sup_{t≤T} |Y_t|²] < ∞, and Z ∈ ℋ² means E[∫_0^T |Z_t|² dt] < ∞.
The vector-valued generalization replaces ℝ by ℝ^k for Y and makes Z a k × d matrix; the same theory carries through with obvious notational changes.
The minus sign on the stochastic integral is the convention that makes ∫ Z_s · dW_s a forward Itô integral; the integration variable runs forward in time even though the equation is "solved backward" from the terminal condition. This is not pathwise time reversal; the filtration is still the forward Brownian filtration. The "backward" in BSDE refers strictly to the direction in which the boundary condition is imposed.
Pardoux–Peng Existence and Uniqueness
Pardoux–Peng Existence and Uniqueness Theorem
Statement
Under the assumptions above (f uniformly Lipschitz in (y, z), ξ ∈ L²(𝓕_T)), the BSDE has a unique solution (Y, Z) ∈ 𝒮² × ℋ². Moreover, the solution depends continuously on the data (ξ, f) in the natural L²-norms, and a comparison principle holds: if ξ¹ ≤ ξ² and f¹ ≤ f² pointwise, then Y¹_t ≤ Y²_t almost surely for every t.
Intuition
The driver is Lipschitz, which makes the operator that maps a candidate pair (y, z) to the next iterate (defined via conditional expectation against the terminal value) a contraction in a suitable weighted norm. Banach fixed-point gives existence and uniqueness. The role of Z is forced by the martingale representation theorem: any square-integrable martingale on the Brownian filtration is a stochastic integral against W, and Z is the integrand that makes Y an Itô process.
Proof Sketch
Define the map Φ as follows. Given a candidate pair (y, z), set M_t = E[ξ + ∫_0^T f(s, y_s, z_s) ds | 𝓕_t]. By the martingale representation theorem there is a unique Z ∈ ℋ² with M_t = M_0 + ∫_0^t Z_s · dW_s. Define Y_t = M_t − ∫_0^t f(s, y_s, z_s) ds, equivalently Y_t = E[ξ + ∫_t^T f(s, y_s, z_s) ds | 𝓕_t]. Then Φ(y, z) = (Y, Z). Equip the solution space with the weighted norm ‖(Y, Z)‖²_β = E[∫_0^T e^{βs}(|Y_s|² + |Z_s|²) ds]. Itô's formula applied to e^{βs}|Y_s|², together with the Lipschitz hypothesis on f, gives a contraction estimate ‖Φ(y¹, z¹) − Φ(y², z²)‖_β ≤ c(β) ‖(y¹ − y², z¹ − z²)‖_β with c(β) < 1 for β large enough. Banach fixed-point closes the proof.
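The Picard iteration behind this fixed-point argument can be exercised numerically in the degenerate case of a deterministic terminal value, where Z ≡ 0 and the BSDE reduces to the backward ODE Y_t = ξ + ∫_t^T f(Y_s) ds. A minimal sketch under that assumption (the driver f(y) = y and ξ = 1 are illustrative choices; the exact solution there is Y_t = ξ e^{T−t}):

```python
import numpy as np

def picard_bsde(f, xi, T=1.0, n_steps=200, n_iter=50):
    """Picard iteration for Y_t = xi + int_t^T f(Y_s) ds, deterministic xi.

    In this degenerate case Z == 0 and the fixed-point map Phi reduces to
    (Phi y)(t) = xi + int_t^T f(y(s)) ds, evaluated on a uniform time grid.
    """
    dt = T / n_steps
    y = np.full(n_steps + 1, xi, dtype=float)   # initial guess: constant xi
    for _ in range(n_iter):
        drift = f(y)                            # f evaluated along the iterate
        # tail[i] ~ int_{t_i}^T f(y(s)) ds via a right Riemann sum
        tail = np.concatenate([np.cumsum(drift[::-1][:-1] * dt)[::-1], [0.0]])
        y = xi + tail                           # apply the map Phi
    return y

y = picard_bsde(lambda v: v, xi=1.0)
print(y[0])   # exact solution Y_0 = e ~ 2.718, up to O(dt) quadrature error
```

Because f(y) = y has Lipschitz constant 1, the iterates converge geometrically, exactly as the contraction estimate predicts; drivers with superlinear growth can make this loop diverge, which is the numerical face of the failure mode below.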
Why It Matters
This is the foundational well-posedness result: there exists a unique pair (Y, Z) solving the BSDE. Without it, BSDEs would be formal manipulations with no guarantee that the equations they appear in have meaning. Three downstream consequences. First, the nonlinear Feynman–Kac representation (next theorem) inherits existence-uniqueness from this result; without it, the connection to semilinear PDEs would be one-sided. Second, the comparison principle is the BSDE analog of the maximum principle for parabolic PDEs and is the foundation of g-expectations and BSDE-based risk measures. Third, the contraction estimate is what licenses Picard iteration as a numerical scheme, and in the deep BSDE method it is what guarantees that the forward-shooting loss has a unique global minimum once the gradient network is expressive enough.
Failure Mode
The Lipschitz hypothesis on f is essential for the contraction step. Drivers with quadratic growth in z (e.g., f(t, y, z) = γ|z|²/2, arising in exponential utility maximization) fall outside Pardoux–Peng. Existence in that regime requires a different proof technique due to Kobylanski (2000), based on an a-priori sup-norm bound for Y and an exponential transformation. Uniqueness is harder still and was settled only later (Briand and Hu 2008, Delbaen et al. 2011). The clean BSDE theory is the Lipschitz case; quadratic BSDEs are a separate and substantially more involved chapter.
Nonlinear Feynman–Kac
The reason BSDEs matter for PDE theory is the Markovian case, in which the terminal value and driver are functions of a forward SDE. Pardoux and Peng (1992) proved that the BSDE solution is then exactly the value function of a semilinear parabolic PDE.
Nonlinear Feynman–Kac (Pardoux–Peng 1992)
Statement
Let X^{t,x} solve the forward SDE dX_s = b(s, X_s) ds + σ(s, X_s) dW_s on [t, T] with X_t = x, and let (Y^{t,x}, Z^{t,x}) solve the associated BSDE on [t, T] with terminal value g(X_T^{t,x}) and driver (s, y, z) ↦ f(s, X_s^{t,x}, y, z). Define u(t, x) = Y_t^{t,x}. Then u is the unique viscosity solution of the semilinear parabolic PDE

∂_t u + ℒu + f(t, x, u, σᵀ∇ₓu) = 0,  u(T, x) = g(x),

where ℒ = b · ∇ₓ + ½ tr(σσᵀ ∇ₓ²) is the generator of X. When u is C^{1,2}, the BSDE solution along the path admits the Markovian representation Y_s = u(s, X_s^{t,x}) and Z_s = σ(s, X_s^{t,x})ᵀ ∇ₓu(s, X_s^{t,x}) for s ∈ [t, T].
Intuition
Apply Itô's formula to u(s, X_s). The drift bracket is forced by the PDE to equal −f(s, X_s, u, σᵀ∇ₓu), and the diffusion bracket σᵀ∇ₓu is the integrand against dW. Comparing with the BSDE equation identifies Y_s = u(s, X_s) and Z_s = σ(s, X_s)ᵀ∇ₓu(s, X_s). The terminal condition matches automatically.
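In symbols, the computation runs as follows (assuming u ∈ C^{1,2} and writing ℒ for the generator of X):

```latex
\begin{aligned}
du(s, X_s) &= \big(\partial_t u + \mathcal{L}u\big)(s, X_s)\,ds
            + \big(\sigma^{\top}\nabla_x u\big)(s, X_s)\cdot dW_s \\
           &= -f\big(s, X_s, u(s, X_s), (\sigma^{\top}\nabla_x u)(s, X_s)\big)\,ds
            + \big(\sigma^{\top}\nabla_x u\big)(s, X_s)\cdot dW_s ,
\end{aligned}
```

where the second line substitutes the PDE ∂_t u + ℒu = −f(·, ·, u, σᵀ∇ₓu). Matching terms with the BSDE dynamics dY_s = −f(s, Y_s, Z_s) ds + Z_s · dW_s gives Y_s = u(s, X_s) and Z_s = (σᵀ∇ₓu)(s, X_s).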
Proof Sketch
The forward direction (PDE solution gives BSDE solution) is the Itô calculation just sketched. For the reverse direction (BSDE solution gives PDE viscosity solution), one shows that u(t, x) = Y_t^{t,x} inherits continuity in (t, x) from the BSDE's continuous dependence on initial data, then verifies the viscosity sub- and super-solution inequalities via a Markov-property argument and the comparison principle for BSDEs. Pardoux and Peng (1992) handle the smooth case; Pardoux (1999) and Barles, Buckdahn, and Pardoux (1997) extend to viscosity solutions and to PDEs with reflection or jumps.
Why It Matters
This is the nonlinear Feynman–Kac formula. It generalizes the linear formula to PDEs whose lower-order term depends on u and ∇ₓu themselves. The expectation representation breaks (you cannot integrate against an unknown u), and the BSDE replaces it with an implicit fixed-point representation: Y_t = u(t, X_t) and Z_t = σ(t, X_t)ᵀ∇ₓu(t, X_t), where u solves the semilinear parabolic PDE with terminal condition u(T, ·) = g. When f is convex in z (the typical situation in stochastic control), the PDE above is the Hamilton–Jacobi–Bellman equation of a control problem and the BSDE is its dual. This is the mathematical content of the duality between forward stochastic control and backward representation that Bismut (1973) first identified.
Failure Mode
Fully nonlinear PDEs (those involving the Hessian ∇ₓ²u inside the nonlinearity, like Monge–Ampère or the second-order HJB with controlled diffusion) are not covered. The natural representation there is the second-order BSDE of Cheridito, Soner, Touzi, and Victoir (2007) and Soner, Touzi, and Zhang (2012), which adds a third process representing the Hessian. The clean Pardoux–Peng theory is restricted to semilinear PDEs where the second-order term is fixed by the forward diffusion.
The Y/Z Decomposition
In the Markovian case, the BSDE solution decomposes cleanly: Y_t = u(t, X_t) is the value function evaluated along the path, and Z_t = σ(t, X_t)ᵀ∇ₓu(t, X_t) is the diffusion-weighted gradient of the value function. The two processes carry complementary information: Y tracks the level of u along the trajectory, and Z tracks the slope.
This decomposition is what the deep BSDE method exploits. The algorithm parameterizes Y_0 as a single trainable scalar (the unknown PDE solution at the initial point) and Z_{t_k} as a neural network at each time step t_k. The forward Euler update Y_{t_{k+1}} = Y_{t_k} − f(t_k, X_{t_k}, Y_{t_k}, Z_{t_k}) Δt + Z_{t_k} · ΔW_k propagates the candidate trajectory, and a terminal loss E|Y_T − g(X_T)|² drives optimization. The dimension d enters only through the network input size, never through a spatial grid, so the cost scales polynomially in d.
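A minimal numerical sketch of this loop, under heavy simplifying assumptions (all of them ours, not the original method's): driver f ≡ 0, forward state X = W in one dimension, terminal condition g(x) = x², and a hypothetical per-step linear model Z_k(x) = a_k + b_k x standing in for the neural network, trained by full-batch gradient descent. The corresponding heat equation has exact solution u(t, x) = x² + (T − t), so the trained Y_0 should approach u(0, 0) = T.

```python
import numpy as np

# Toy deep-BSDE-style shooting: f == 0, X = W, g(x) = x^2, so the PDE is
# u_t + u_xx / 2 = 0, u(T, x) = x^2, with exact u(t, x) = x^2 + (T - t).
# Z at step k is a hypothetical linear model a_k + b_k * x (exact Z is 2x).
rng = np.random.default_rng(0)
T, n_steps, n_paths = 1.0, 20, 4096
dt = T / n_steps

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)

y0 = 0.0                      # trainable scalar: candidate for u(0, 0)
a = np.zeros(n_steps)         # trainable Z-model intercepts
b = np.zeros(n_steps)         # trainable Z-model slopes
lr = 0.4

for _ in range(500):
    # forward Euler shooting: Y_{k+1} = Y_k + Z_k dW_k   (f == 0)
    Z = a[None, :] + b[None, :] * W[:, :-1]
    Y_T = y0 + np.sum(Z * dW, axis=1)
    resid = Y_T - W[:, -1] ** 2                  # terminal-loss residual
    # analytic gradients of the mean squared terminal loss
    y0 -= lr * 2.0 * resid.mean()
    a -= lr * 2.0 * (resid[:, None] * dW).mean(axis=0)
    b -= lr * 2.0 * (resid[:, None] * W[:, :-1] * dW).mean(axis=0)

print(y0)   # should approach the exact value u(0, 0) = T = 1.0
```

The real method replaces the linear Z-models with neural networks, the Brownian state with a general forward SDE, and f ≡ 0 with the PDE's nonlinearity, but the loss structure (shoot forward, penalize the terminal mismatch) is exactly this.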
Worked Example: Linear BSDE Recovers Discounted Feynman–Kac
Take a driver linear in y and independent of z: f(t, y, z) = α(t) y + β(t) for deterministic functions α, β. The BSDE is

Y_t = ξ + ∫_t^T (α(s) Y_s + β(s)) ds − ∫_t^T Z_s · dW_s.
This is a stochastic linear ODE in Y with random terminal condition. Apply the integrating factor e^{∫_0^t α(s) ds} and rearrange to get Y_t = E[Γ_{t,T} ξ + ∫_t^T Γ_{t,s} β(s) ds | 𝓕_t], where Γ_{t,s} = e^{∫_t^s α(r) dr} is a discount factor. In Markovian form with ξ = g(X_T) and all coefficients depending on (t, X_t), this is exactly the discounted Feynman–Kac formula of the Feynman–Kac topic page: u(t, x) = E[Γ_{t,T} g(X_T) + ∫_t^T Γ_{t,s} β(s, X_s) ds | X_t = x], where u solves ∂_t u + ℒu + αu + β = 0 with u(T, ·) = g. The process Z is recovered as the integrand in the martingale representation of Γ_{0,T} ξ + ∫_0^T Γ_{0,s} β(s) ds against W.
The lesson: the linear BSDE is exactly the discounted Feynman–Kac formula in disguise. The BSDE machinery is non-trivial only when f depends on y or z in a genuinely nonlinear way.
Common Confusions
The 'backward' in BSDE refers to the terminal condition, not pathwise time reversal
A BSDE is not a forward SDE run with the time direction flipped. The filtration is still the forward Brownian filtration (𝓕_t), and the stochastic integral is a forward Itô integral. What is "backward" is the location of the boundary condition: instead of an initial condition Y_0 = y_0, the equation imposes a terminal condition Y_T = ξ. This is closer in spirit to a parabolic PDE solved backward from a Cauchy datum at time T than to a time-reversed SDE in the Anderson sense. The two notions are unrelated.
The Z process is not a free parameter; it is forced by martingale representation
The unknowns of a BSDE are both Y and Z, but they are not independent. Once Y is required to be adapted and to satisfy the integral equation with terminal value ξ, the martingale representation theorem forces Z to be the integrand of the martingale M_t = E[ξ + ∫_0^T f(s, Y_s, Z_s) ds | 𝓕_t] against W. The pair (Y, Z) is jointly determined; you do not get to choose Z separately. This is also why the BSDE is well-posed: the extra unknown Z is exactly compensated by the extra structural constraint that Y be adapted.
Quadratic-growth drivers are genuinely outside Pardoux–Peng
Pardoux–Peng requires f to be uniformly Lipschitz in (y, z). Drivers with quadratic growth in z, common in entropic risk measures and exponential utility, violate the Lipschitz hypothesis and require the separate theory of Kobylanski (2000). The trick there is an exponential transformation that linearizes the quadratic term, plus an a-priori sup-norm bound on Y from the boundedness of ξ. Existence holds; uniqueness is much harder and depends on additional structural assumptions on f. Treating quadratic BSDEs as a "slight extension" of Lipschitz BSDEs underestimates the difficulty.
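The exponential transformation is transparent in the purely quadratic case f(t, y, z) = γ|z|²/2 with bounded ξ. Setting P_t = e^{γY_t} and applying Itô's formula:

```latex
\begin{aligned}
dP_t &= \gamma P_t \, dY_t + \tfrac{1}{2}\gamma^2 P_t \, d\langle Y \rangle_t \\
     &= \gamma P_t \Big( -\tfrac{\gamma}{2}\lvert Z_t\rvert^2 \, dt + Z_t \cdot dW_t \Big)
        + \tfrac{1}{2}\gamma^2 P_t \lvert Z_t\rvert^2 \, dt
      = \gamma P_t \, Z_t \cdot dW_t ,
\end{aligned}
```

so P is a (local) martingale and Y_t = γ⁻¹ log E[e^{γξ} | 𝓕_t], the entropic risk measure. The quadratic term in the driver is exactly cancelled by the Itô correction; this linearization is the starting point of Kobylanski's theory.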
Exercises
Problem
Solve the linear BSDE dY_t = −(aY_t + b) dt + Z_t · dW_t on [0, T] with terminal condition Y_T = ξ, where a, b are constants and ξ ∈ L²(𝓕_T). Give explicit formulas for Y_t and Z_t in terms of E[ξ | 𝓕_t] and its martingale-representation integrand.
Problem
Prove the contraction step in Pardoux–Peng. Let Φ be the map defined in the proof sketch above. Equip the codomain with the β-weighted norm ‖(Y, Z)‖²_β = E[∫_0^T e^{βs}(|Y_s|² + |Z_s|²) ds]. Show that for β large enough (depending on the Lipschitz constant of f), Φ is a strict contraction in this norm.
Next Topics
- Deep BSDE Method: the Han–Jentzen–E neural-network solver that parameterizes Z with a network at each time step and minimizes a terminal-condition loss.
- Hamilton–Jacobi–Bellman Equation: the PDE that arises when the BSDE driver is a control Hamiltonian; the canonical setting where BSDE duality replaces dynamic programming.
- Feynman–Kac Formula: the linear case that BSDEs generalize; the BSDE collapses to the discounted Feynman–Kac expectation when the driver is linear.
- Stochastic Differential Equations: the forward equation whose path the BSDE is solved along in the Markovian case.
- Itô's Lemma: the chain rule that produces the Markovian BSDE from a solution of the associated semilinear PDE.
Last reviewed: April 18, 2026
Prerequisites
Foundations this topic depends on.
- Stochastic Differential Equations (Layer 3)
- Brownian Motion (Layer 2)
- Measure-Theoretic Probability (Layer 0B)
- Martingale Theory (Layer 0B)
- Itô's Lemma (Layer 3)
- Stochastic Calculus for ML (Layer 3)
- Feynman–Kac Formula (Layer 3)