
Mathematical Infrastructure

Hamilton–Jacobi–Bellman Equation

The PDE characterizing the value function of a continuous-time stochastic optimal control problem. The continuous-time analog of the discrete Bellman equation, the fully nonlinear PDE that nonlinear Feynman–Kac inverts via BSDEs, and the equation Deep BSDE solves numerically in high dimensions.

Advanced · Tier 2 · Stable · ~50 min

Why This Matters

The Hamilton–Jacobi–Bellman equation is the continuous-time Bellman equation: the PDE that the value function of a stochastic control problem must satisfy. The discrete Bellman recursion $V_t(x) = \min_a \{c(x, a) + \mathbb{E}[V_{t+1}(X')]\}$ becomes, after a Taylor expansion of the expectation in the time step, a nonlinear second-order PDE in $V$. Every result that holds for the discrete dynamic-programming equation — optimal substructure, the policy read off from the Bellman operator, the contraction-mapping convergence of value iteration — has a continuous-time analog phrased in HJB language.
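The discrete recursion is easy to sketch in code. Below is a minimal finite-horizon value iteration on a toy deterministic chain (so the expectation drops out); the states, actions, and costs are illustrative, not from the text:

```python
import numpy as np

n_states, horizon = 5, 10
actions = (-1, 0, 1)                       # move left / stay / move right (clipped)

def cost(x, a):
    # running cost c(x, a): distance from target state 2 plus control effort
    return (x - 2) ** 2 + 0.1 * a ** 2

V = np.array([(x - 2) ** 2 for x in range(n_states)], dtype=float)  # terminal cost
policy = np.zeros((horizon, n_states), dtype=int)

for t in reversed(range(horizon)):         # backward in time, as in the recursion
    V_next = V.copy()
    for x in range(n_states):
        q = [cost(x, a) + V_next[min(max(x + a, 0), n_states - 1)] for a in actions]
        policy[t, x] = int(np.argmin(q))
        V[x] = min(q)

print(V)          # V_0: cost-to-go from each state
print(policy[0])  # index into `actions` of the optimal first move per state
```

The backward loop is the discrete analog of solving HJB from the terminal condition back to time zero.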

HJB is also the canonical fully nonlinear parabolic PDE that arises from probability. The nonlinearity sits inside an infimum (or supremum) over the control variable, and that infimum is exactly the Hamiltonian of the problem. The Feynman–Kac formula handles the linear case (no control, drift fixed); the BSDE machinery of Pardoux and Peng (1992) extends to semilinear PDEs; HJB sits at the top of this hierarchy as the fully nonlinear PDE that BSDEs with a control-dependent driver solve in the most general setting.

In modern ML, HJB is the equation that continuous-time reinforcement learning, optimal stopping, optimal execution in finance, and robotic control problems all reduce to. The grid-based curse of dimensionality makes classical HJB solvers useless above $d \approx 6$, which is the entire reason the Deep BSDE method and DGM / PINN-style PDE solvers exist: to approximate $V$ in regimes where finite differences cannot.

A useful slogan: Fokker–Planck moves densities forward, Feynman–Kac moves linear value functions backward, HJB moves optimized value functions backward. The three together cover the standard PDE-SDE dictionary for stochastic control.

Mental Model

The principle of optimality says: an optimal trajectory from $(t, x)$ is also optimal on every sub-interval $[s, T]$ for $s > t$, given the state $X_s$ reached at time $s$. Apply this principle infinitesimally. For a small time step $dt$, the optimal cost from $(t, x)$ equals the running cost incurred over $[t, t + dt]$ plus the value at $(t + dt, X_{t + dt})$, minimized over the control choice on that interval. Taylor-expand both sides in $dt$, take the limit $dt \to 0$, and the result is a PDE: the infimum over controls of (running cost plus generator of $V$) equals $-\partial_t V$. That PDE is HJB.
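Written out, the expansion is (a sketch, assuming $V$ is smooth enough for an Itô–Taylor expansion):

```latex
% Infinitesimal dynamic programming:
V(t, x) = \inf_{u}\Big\{ f(x, u)\,dt
        + \mathbb{E}\big[V(t + dt, X_{t+dt})\big] \Big\} + o(dt),
% Ito-Taylor expansion of the expectation under control u:
\mathbb{E}\big[V(t + dt, X_{t+dt})\big]
  = V(t, x) + \Big(\partial_t V + b(x, u)\cdot\nabla V
  + \tfrac{1}{2}\operatorname{tr}\big(\sigma\sigma^\top(x, u)\,\nabla^2 V\big)\Big)\,dt + o(dt).
% Subtract V(t, x), divide by dt, let dt -> 0:
0 = \partial_t V + \inf_{u \in U}\Big\{ b\cdot\nabla V
  + \tfrac{1}{2}\operatorname{tr}\big(\sigma\sigma^\top \nabla^2 V\big) + f \Big\}.
```

The last line is exactly the HJB equation stated formally below.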

The supremum (or infimum) over controls in HJB is the dynamic-programming analog of the $\max_a$ in the discrete Bellman operator. Reading off the control that achieves the infimum gives the optimal feedback policy $u^*(t, x)$.

Formal Statement

Definition

Hamilton–Jacobi–Bellman Equation

Fix a horizon $T > 0$ and an admissible control set $U \subseteq \mathbb{R}^m$. Consider the controlled SDE $dX_s = b(X_s, u_s)\,ds + \sigma(X_s, u_s)\,dB_s$ on $[t, T]$ with $X_t = x$, where $u: [0, T] \to U$ is a progressively measurable control process. The cost functional is

$$J(t, x; u) = \mathbb{E}\!\left[\int_t^T f(X_s, u_s)\,ds + g(X_T) \,\Big|\, X_t = x\right],$$

with running cost $f$ and terminal cost $g$. The value function is $V(t, x) = \inf_u J(t, x; u)$, where the infimum is over admissible controls.

Under regularity, $V \in C^{1,2}([0, T) \times \mathbb{R}^d) \cap C([0, T] \times \mathbb{R}^d)$ satisfies the HJB equation

$$\partial_t V(t, x) + \inf_{u \in U}\!\left\{ b(x, u)\cdot \nabla V(t, x) + \tfrac{1}{2}\,\operatorname{tr}\!\big(\sigma \sigma^\top(x, u)\, \nabla^2 V(t, x)\big) + f(x, u) \right\} = 0,$$

with terminal condition $V(T, x) = g(x)$. The bracketed expression is the Hamiltonian $H(x, p, M, u)$ evaluated at $p = \nabla V$, $M = \nabla^2 V$. Maximization (rather than minimization) gives the same PDE with $\sup_u$ replacing $\inf_u$, used in reward-maximizing formulations.

The equation is fully nonlinear: the infimum over $u$ couples drift, diffusion, and running cost in a way that is not affine in $\nabla V$ or $\nabla^2 V$. This is what distinguishes HJB from the linear backward Kolmogorov equation that Feynman–Kac inverts.

The Verification Theorem

The HJB equation is a necessary condition on the value function under regularity. The verification theorem is the converse: a smooth solution of HJB whose infimum is attained by a measurable feedback control $u^*(t, x)$ is the value function, and $u^*$ is optimal. This is the workhorse result that turns "find $V$ satisfying a PDE" into "you have just solved the control problem."

Theorem

HJB Verification Theorem

Statement

Let $W \in C^{1,2}([0, T) \times \mathbb{R}^d) \cap C([0, T] \times \mathbb{R}^d)$ with polynomial growth solve the HJB equation with terminal condition $W(T, x) = g(x)$, and suppose the infimum in the Hamiltonian is attained by a measurable feedback $u^*(t, x)$. Then $W(t, x) = V(t, x)$ for all $(t, x) \in [0, T] \times \mathbb{R}^d$, and the feedback control $u^*_s = u^*(s, X^*_s)$, where $X^*$ solves the closed-loop SDE $dX^*_s = b(X^*_s, u^*(s, X^*_s))\,ds + \sigma(X^*_s, u^*(s, X^*_s))\,dB_s$ with $X^*_t = x$, is optimal: $J(t, x; u^*) = V(t, x)$.

Intuition

Apply Itô's formula to $W(s, X_s)$ along an arbitrary admissible control $u$ on $[t, T]$. The drift of $W(s, X_s)$ is $\partial_s W + b \cdot \nabla W + \tfrac{1}{2}\operatorname{tr}(\sigma \sigma^\top \nabla^2 W)$, which the HJB inequality bounds below by $-f(X_s, u_s)$ for every choice of $u$. Integrating gives $W(t, x) \le J(t, x; u)$, so $W \le V$. For the optimal feedback $u^*$, the HJB equation holds with equality and the bound becomes tight, giving $W = V$.

Proof Sketch

For arbitrary admissible $u$, apply Itô to $W(s, X_s)$ on $[t, T]$:

$$W(T, X_T) - W(t, X_t) = \int_t^T \!\big(\partial_s W + b(X_s, u_s) \cdot \nabla W + \tfrac{1}{2}\operatorname{tr}(\sigma \sigma^\top \nabla^2 W)\big)(s, X_s)\,ds + \int_t^T (\nabla W)^\top \sigma\,dB_s.$$

The HJB equation gives $\partial_s W + b \cdot \nabla W + \tfrac{1}{2} \operatorname{tr}(\sigma \sigma^\top \nabla^2 W) + f \ge 0$ pointwise (since the infimum over $u$ is the smallest value), with equality at $u = u^*(s, X_s)$. Take expectations; the stochastic integral is a martingale (polynomial growth plus BDG), so

$$\mathbb{E}[g(X_T)] - W(t, x) \ge -\mathbb{E}\!\int_t^T f(X_s, u_s)\,ds,$$

which rearranges to $W(t, x) \le J(t, x; u)$. Equality holds along $u^*$, so $W = V$ and $u^*$ achieves the infimum.

Why It Matters

This is the bridge between PDE analysis and control. Solve the HJB PDE analytically or numerically; read the optimal feedback $u^*(t, x)$ off the argmin in the Hamiltonian; the resulting closed-loop SDE is guaranteed optimal among all admissible controls. Without verification, the HJB equation would just be a necessary condition and you would still need a separate optimality proof; with verification, the PDE is the optimality certificate: $W = V$ (the value function), and $u^*(t, X_t)$ is an optimal control.

Failure Mode

The smoothness assumption $W \in C^{1,2}$ fails for many problems of practical interest: optimal stopping (where $V$ has a free boundary and $\nabla^2 V$ jumps), singular control, problems with state constraints, and degenerate diffusions where $\sigma \sigma^\top$ is rank-deficient. In all these cases the classical verification theorem does not apply directly, and one needs viscosity solutions or a regularization argument. A second failure mode: the infimum may not be attained inside $U$ (e.g., if $U$ is open or unbounded), in which case the candidate feedback $u^*(t, x)$ is undefined and the closed-loop SDE has no strong solution.

Viscosity Solutions

For most realistic stochastic control problems the value function is not $C^{1,2}$ and the classical verification theorem does not apply. The right notion of "solution" is the viscosity solution of Crandall and Lions (1983), extended to second-order PDEs by Crandall, Ishii, and Lions (1992).

The idea: replace pointwise differentiation of $V$ with a test-function inequality. $V$ is a viscosity sub-solution if, for every smooth test function $\varphi$ such that $V - \varphi$ attains a local maximum at $(t_0, x_0)$, the HJB operator applied to $\varphi$ is non-positive at $(t_0, x_0)$. A super-solution satisfies the dual inequality at local minima. A viscosity solution is both. This sidesteps the need for $V$ to be twice differentiable: the test function carries the derivatives, and the inequality only constrains $V$ at points where smooth functions can "touch" it.

Two facts make this framework load-bearing for HJB. First, under mild assumptions (continuity of $b, \sigma, f, g$, polynomial growth) the value function $V$ is the unique continuous viscosity solution of HJB. Second, viscosity solutions are stable under uniform convergence, so numerical schemes that approximate the operator (monotone finite differences, semi-Lagrangian methods, BSDE schemes) converge to the viscosity solution under Barles–Souganidis-style consistency conditions. Ishii's lemma is the key technical tool for the comparison principle that gives uniqueness.

Connection to Feynman–Kac and BSDEs

Strip the control out of HJB. With $b$ and $\sigma$ fixed and the infimum dropped, the equation becomes the linear backward PDE

$$\partial_t V + b \cdot \nabla V + \tfrac{1}{2}\operatorname{tr}(\sigma \sigma^\top \nabla^2 V) + f(x) = 0, \quad V(T, x) = g(x),$$

which is exactly what the Feynman–Kac formula inverts: $V(t, x) = \mathbb{E}[g(X_T) + \int_t^T f(X_s)\,ds \mid X_t = x]$. So the linear, no-control HJB is Feynman–Kac. The expectation representation is the value function of the trivial control problem with no decisions to make.
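This representation is easy to sanity-check by simulation. A hedged sketch: take the control-free case $dX = \sigma\,dB$ with $f = 0$ and $g(x) = x^2$, where the backward PDE has the closed-form solution $V(t, x) = x^2 + \sigma^2 (T - t)$, and compare a Monte Carlo estimate against it (all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, T, t0, x0 = 0.5, 1.0, 0.0, 1.0
n_paths, n_steps = 200_000, 100
dt = (T - t0) / n_steps

# Euler-Maruyama for dX = sigma dB; exact in distribution here since b = 0
X = np.full(n_paths, x0)
for _ in range(n_steps):
    X = X + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)

v_mc = np.mean(X ** 2)                     # Feynman-Kac: E[g(X_T) | X_{t0} = x0]
v_exact = x0 ** 2 + sigma ** 2 * (T - t0)  # closed-form solution: 1.25
print(v_mc, v_exact)
```

The agreement to Monte Carlo accuracy is the Feynman–Kac identity in action: the PDE solution is a conditional expectation over forward sample paths.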

Restore the control, and the equation becomes nonlinear in $(\nabla V, \nabla^2 V)$ through the infimum. The nonlinear Feynman–Kac formula of Pardoux and Peng (1992) extends the representation: the value function is now $V(t, x) = Y_t$ where $(Y_t, Z_t)$ solve a backward SDE

$$Y_t = g(X_T) + \int_t^T H^*(X_s, Z_s)\,ds - \int_t^T Z_s^\top dB_s,$$

with driver $H^*(x, z) = \inf_u \{f(x, u) + b(x, u) \cdot \sigma^{-\top} z + \dots\}$ (the precise form depends on whether the control enters the diffusion). The BSDE pair $(Y, Z)$ encodes both $V$ and $\sigma^\top \nabla V$ along sample paths of $X$, and this structure is what the Deep BSDE method exploits to solve HJB in $d = 100$: parameterize $Z_t \approx \phi_\theta(X_t)$ with a neural network at each time step.

Worked Example: Linear-Quadratic-Gaussian Control

Take linear dynamics, quadratic cost, additive Gaussian noise:

$$dX_s = (A X_s + B u_s)\,ds + \sigma\,dB_s, \quad J = \mathbb{E}\!\left[\int_t^T (X_s^\top Q X_s + u_s^\top R u_s)\,ds + X_T^\top S X_T\right],$$

with $Q, S$ symmetric positive semidefinite, $R$ symmetric positive definite, and $\sigma$ a constant matrix (control-independent diffusion). Guess that the value function is quadratic in $x$: $V(t, x) = x^\top P(t) x + r(t)$ for some matrix $P(t)$ and scalar $r(t)$ to be determined.

Compute $\nabla V = 2 P(t) x$ and $\nabla^2 V = 2 P(t)$. Substitute into HJB:

$$x^\top \dot P x + \dot r + \inf_{u}\!\big\{2 x^\top P (A x + B u) + \operatorname{tr}(\sigma \sigma^\top P) + x^\top Q x + u^\top R u\big\} = 0.$$

The minimand is an unconstrained quadratic in $u$; setting the gradient $2 B^\top P x + 2 R u$ to zero gives $u^* = -R^{-1} B^\top P x$, a linear feedback of the state. Plug back in:

$$x^\top \dot P x + \dot r + 2 x^\top P A x - x^\top P B R^{-1} B^\top P x + \operatorname{tr}(\sigma \sigma^\top P) + x^\top Q x = 0.$$

Symmetrizing $2 x^\top P A x = x^\top (P A + A^\top P) x$ and matching the $x^\top (\cdot)\, x$ and constant terms separately gives the matrix Riccati ODE

$$\dot P(t) + P A + A^\top P - P B R^{-1} B^\top P + Q = 0, \quad P(T) = S,$$

and $\dot r(t) + \operatorname{tr}(\sigma \sigma^\top P(t)) = 0$, $r(T) = 0$. The Riccati equation is what the HJB PDE collapses to under the LQG ansatz: a finite-dimensional ODE in the matrix $P(t)$, solvable by standard ODE integrators in any dimension where you can store $P$.
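As a sketch of that last claim, the Riccati ODE can be integrated backward from $P(T) = S$ with a fixed-step RK4 scheme. The double-integrator coefficients below are illustrative, not from the text:

```python
import numpy as np

A = np.array([[0.0, 1.0], [0.0, 0.0]])    # double-integrator dynamics
B = np.array([[0.0], [1.0]])
Q = np.eye(2)                              # state cost x^T Q x
R = np.array([[1.0]])                      # control cost u^T R u
S = np.zeros((2, 2))                       # terminal cost X_T^T S X_T
T, n_steps = 10.0, 2000
dt = T / n_steps
R_inv = np.linalg.inv(R)

def rhs(P):
    # In tau = T - t, the ODE  P' + PA + A^T P - P B R^{-1} B^T P + Q = 0
    # becomes dP/dtau = PA + A^T P - P B R^{-1} B^T P + Q, run forward in tau.
    return P @ A + A.T @ P - P @ B @ R_inv @ B.T @ P + Q

P = S.copy()
for _ in range(n_steps):                   # tau: 0 -> T, i.e. t: T -> 0
    k1 = rhs(P)
    k2 = rhs(P + 0.5 * dt * k1)
    k3 = rhs(P + 0.5 * dt * k2)
    k4 = rhs(P + dt * k3)
    P = P + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

K = R_inv @ B.T @ P                        # optimal feedback u*(0, x) = -K x
# For this long horizon, P(0) is close to the steady-state CARE solution
# [[sqrt(3), 1], [1, sqrt(3)]], so K is close to [1, sqrt(3)].
print(P)
print(K)
```

The gain row `K` is the finite-horizon LQR feedback at $t = 0$; for long horizons it settles to the infinite-horizon gain, which is why time-invariant LQR controllers work in practice.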

Two consequences worth flagging. First, the optimal control is linear in the state with gain $K(t) = R^{-1} B^\top P(t)$ — this is the classical LQR result, and it is the reason LQG / iLQR / DDP underlie so much of model-based RL and trajectory optimization. Second, the noise $\sigma$ enters $V$ only through the additive scalar $r(t)$ and not through $P(t)$: the optimal feedback is certainty-equivalent — solve the deterministic problem, ignore the noise, and you get the same controller. Certainty equivalence is special to LQG and breaks immediately when the diffusion depends on the state or control (multiplicative noise) or when costs are not quadratic.

Common Confusions

Watch Out

HJB is for the value function, not the optimal policy directly

The equation solves for $V(t, x)$. The optimal feedback $u^*(t, x)$ is read off as the argmin (or argmax) inside the Hamiltonian: $u^*(t, x) = \operatorname{argmin}_u \{b(x, u) \cdot \nabla V + \tfrac{1}{2}\operatorname{tr}(\sigma \sigma^\top \nabla^2 V) + f(x, u)\}$. You cannot solve for $u^*$ without first having $V$ (or a parametric guess for $V$, as in the LQG example), which is why "policy iteration in continuous time" alternates between solving a linear PDE for $V$ given $u$ and updating $u$ from the argmin. The HJB equation itself is the fixed point of this alternation.

Watch Out

HJB runs backward in time; Fokker–Planck runs forward

HJB has a terminal condition $V(T, x) = g(x)$ and is solved backward from $t = T$ to $t = 0$. Its dual, Fokker–Planck, has an initial condition $p(0, x) = p_0(x)$ and is solved forward from $t = 0$ to $t = T$. They use the generator $\mathcal{L}$ and its adjoint $\mathcal{L}^*$ respectively. Confusing the time direction is a common implementation bug: the Euler-step update for HJB has the opposite sign on the time derivative compared to forward parabolic solvers.
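A minimal illustration of that sign convention, for the control-free linear backward PDE $\partial_t V + \tfrac{1}{2}\sigma^2 V_{xx} = 0$ with $g(x) = x^2$, where the exact solution $x^2 + \sigma^2(T - t)$ is available to check against (all grid parameters illustrative):

```python
import numpy as np

sigma, T = 0.5, 1.0
x = np.linspace(-3.0, 3.0, 121)            # spatial grid, dx = 0.05
dx = x[1] - x[0]
n_steps = 200
dt = T / n_steps                           # explicit stability: sigma^2/2 * dt/dx^2 <= 1/2

V = x ** 2                                 # terminal condition V(T, x) = g(x)
t = T
for _ in range(n_steps):
    Vxx = np.zeros_like(V)
    Vxx[1:-1] = (V[2:] - 2.0 * V[1:-1] + V[:-2]) / dx ** 2
    # Backward-in-time step: V(t - dt) = V(t) + dt * (sigma^2/2) V_xx.
    # Note the PLUS sign -- the opposite update of a forward parabolic solver.
    V = V + dt * 0.5 * sigma ** 2 * Vxx
    t -= dt
    # Dirichlet boundaries pinned to the known exact solution for this check
    V[0] = x[0] ** 2 + sigma ** 2 * (T - t)
    V[-1] = x[-1] ** 2 + sigma ** 2 * (T - t)

i0 = int(np.argmin(np.abs(x - 1.0)))       # grid point at x = 1
print(V[i0])                               # exact solution there: 1 + 0.25 = 1.25
```

Flipping the plus to a minus here is exactly the bug described above: it turns the scheme into an (unstable) forward solve of a backward equation.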

Watch Out

Classical HJB grid solvers blow up exponentially in dimension

A finite-difference grid for $V(t, x)$ with $n$ points per axis costs $n^d$ memory; for $d = 100$ this is hopeless. This is the entire motivation for Deep BSDE, DGM, PINNs in control, and policy-gradient methods in continuous-time RL: they sidestep the grid by sampling $X$ trajectories (Monte Carlo, polynomial in $d$) and parameterizing $V$ or $\nabla V$ with neural networks. The trade-off is approximation error in $V$ versus exponential blow-up in storage; for $d \gtrsim 6$ the trade-off favors approximation every time.
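The arithmetic behind this claim, with illustrative sizes:

```python
# Back-of-envelope storage for a full value-function grid (float64 values)
n = 100                       # grid points per axis
for d in (2, 6, 100):
    grid_bytes = 8 * n ** d   # 8 bytes per stored value
    print(d, grid_bytes, "bytes")

# A Monte Carlo / sampling alternative stores paths instead: linear in d
paths, steps, d = 10 ** 5, 100, 100
mc_bytes = 8 * paths * steps * d
print("MC:", mc_bytes, "bytes")   # 8 GB in d = 100, vs 8 * 100^100 for the grid
```

Already at $d = 6$ the grid needs 8 TB; at $d = 100$ the number of grid values exceeds the number of atoms in the observable universe, while path storage stays in the gigabytes.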

Exercises

ExerciseCore

Problem

Specialize the LQG worked example to the scalar case: $dX_s = (a X_s + b u_s)\,ds + \sigma\,dB_s$ with running cost $q X_s^2 + r u_s^2$ and terminal cost $s X_T^2$, all coefficients positive scalars. Derive the scalar Riccati ODE for $P(t)$ and the optimal feedback $u^*(t, x)$ from HJB by direct substitution.

ExerciseAdvanced

Problem

Show that the HJB equation reduces to the linear backward PDE that Feynman–Kac inverts when there is no control: take $U = \{u_0\}$ a single point, drift $b(x, u_0) = b_0(x)$, diffusion $\sigma(x, u_0) = \sigma_0(x)$, running cost $f(x, u_0) = f_0(x)$, and verify that the Feynman–Kac representation $V(t, x) = \mathbb{E}[g(X_T) + \int_t^T f_0(X_s)\,ds \mid X_t = x]$ recovers exactly the value function defined by the cost integral.

References

No canonical references provided.


Last reviewed: April 18, 2026

