Mathematical Infrastructure
PDE Fundamentals for Machine Learning
The partial differential equations that appear in modern machine learning: heat and Fokker-Planck for diffusion, continuity for flow matching, Hamilton-Jacobi-Bellman for reinforcement learning, Poisson for score matching. Classification, solution concepts, and where ML actually needs PDE theory versus where it just uses the vocabulary.
Why This Matters
What is this?
A live solver for four classical partial differential equations, running entirely in your browser. The left pane is the solution $u(x,t)$. The right pane is its Fourier spectrum $|\hat u(k,t)|$ on a log scale. Both update together as you drag the time slider, switch PDE modes, or paint on the field.
What makes it unusual: instead of stepping time forward in tiny numerical increments (the standard PDE-solver approach), this evaluates the exact closed-form solution at any time via a single Fourier multiplier and one inverse FFT per frame. That is why you can scrub time freely, including backwards (scrubbing past $t = 0$ in heat mode reproduces the ill-posedness that motivates DDPM and the score models that learn $\nabla_x \log p_t(x)$).
Try the guided tour button in the top-right to watch all four modes and the live neural-net fit without touching anything. Or click ▶ and start painting heat onto the field.
- Gaussian blur with radius $\sigma$ is this simulation at $t = \sigma^2 / (2\nu)$. Scale space in computer vision (Witkin 1983) is the heat equation in disguise.
- The DDPM / score-SDE forward process diffuses the data density $p_t(x)$. Every Fourier mode decays as $e^{-\nu |k|^2 t}$; high-frequency detail dies first. Watch the letter A collapse into a blob. This is what Gaussian noise does to real images.
- Scrub time backward. Each mode is now multiplied by $e^{+\nu |k|^2 |t|}$, and high-$|k|$ modes blow up exponentially. This is why naive reverse diffusion is ill-posed and a learned score is required to stay on the data manifold (Anderson 1982; Song 2021).
- A Fourier Neural Operator layer is this solver with the radial multiplier learned rather than closed-form (Li et al. 2021). The attenuation visible in the spectrum pane is the target those layers learn to reproduce.
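The bulleted claims above can be checked in a few lines of numpy. This is a minimal sketch of the demo's mechanism, not its actual source; grid size and diffusivity are illustrative. The entire solver is one Fourier multiplier, and negating $t$ exposes the backward blow-up.

```python
import numpy as np

# Sketch of a spectral heat solver: one closed-form Fourier multiplier per
# frame instead of time stepping. Grid size and nu are illustrative.
n, nu = 128, 1.0
k = np.fft.fftfreq(n, d=1.0 / n)              # integer wavenumbers
kx, ky = np.meshgrid(k, k, indexing="ij")
k2 = kx**2 + ky**2

rng = np.random.default_rng(0)
u0 = rng.standard_normal((n, n))              # rough initial field
u0_hat = np.fft.fft2(u0)

def heat(t):
    """Exact heat-equation solution at any time t (forward or backward)."""
    return np.real(np.fft.ifft2(u0_hat * np.exp(-nu * k2 * t)))

u_fwd = heat(0.01)    # smooth: high-|k| modes have decayed
u_bwd = heat(-0.01)   # ill-posed: high-|k| modes amplified exponentially
print(np.abs(u_fwd).max(), np.abs(u_bwd).max())
```

Because the propagator is exact, "scrubbing" to any $t$ costs one multiplier and one inverse FFT, with no accumulated time-stepping error.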
The field $u(x,t)$ on the periodic square domain, rendered directly from the current solution. Paint heat with click-drag; shift-drag erases. Edges wrap around.
The log-magnitude Fourier spectrum $\log |\hat u(k,t)|$, with DC centered so $k = 0$ sits at the middle. Rings mark contours of constant $|k|$.
- Scrub time past 0 into negative values
- Switch PDE modes while keeping the same initial condition, and slide time
- Paint, then press ▶
A tiny neural network (two hidden layers, sine activations) trains live to match the exact solver's output. Press ▶ train and watch the prediction improve, the error heatmap shrink, and the loss curve decay on a log axis.
The point: exact solvers give machine-precision answers in microseconds; a learned approximator converges slowly to a few-percent RMS and stops there — limited by its ~1,200 parameters, not by compute. This is the honest baseline neural PDE methods fight against on easy problems, and the reason they only win on inverse problems or high-dimensional domains where classical solvers can't go.
This demo fits a supervised target. A true PINN (Raissi 2019) swaps the target for the PDE residual and never sees ground truth; it needs autograd for second derivatives, which is heavier than what runs here. The PINN failure modes appear later on this page.
Spectral 2D solver · 128² grid · radix-2 FFT · closed-form propagator. Space = play/pause · R = reset · 1–4 = mode · C = contours · 🔗 share permalinks.
Robby Sneiderman · @Robby955 · MIT
Most of modern generative ML and scientific ML is quietly about partial differential equations. A diffusion model is a time-discretized simulation of a Fokker-Planck equation run backward. A flow matching model learns the velocity field of a continuity equation. A value function in reinforcement learning satisfies a Hamilton-Jacobi-Bellman equation. A score network learns $\nabla_x \log p(x)$, which is itself a gradient of a Poisson-like object. The PINN literature and the neural operator literature (Fourier Neural Operator, DeepONet) are explicit about their PDE roots; the generative modeling literature often is not.
This page assembles the PDE material that recurs across these systems. It is not a PDE course. It is a reference that names the objects, states what they guarantee and what they do not, and points to the exact place where each PDE shows up inside a working ML model.
Mental Model
A PDE is a local constraint on an unknown function. Given an unknown $u : \Omega \to \mathbb{R}$ on a domain $\Omega \subset \mathbb{R}^d$, a PDE of order $k$ is an equation
$$F\big(x,\, u(x),\, Du(x),\, \dots,\, D^k u(x)\big) = 0,$$
together with boundary and initial conditions that pin down which solution is meant. The content of PDE theory is twofold: which functions satisfy the equation (existence, regularity), and how the solution depends on the data (stability, uniqueness). For machine learning purposes, the second question matters more, because ML systems either learn the solution map directly (neural operators), or learn the velocity/score field whose existence is guaranteed by a classical PDE theorem.
The operational shift in ML is that we rarely write down and solve a PDE. We write down an ML loss whose minimizer, at population level, is exactly the classical solution or its velocity/score representation. The PDE is then a sanity check on what the loss is asking for and on what failures of the model imply about the learned object.
Classification: The Three Archetypes
Linear second-order PDEs on $\mathbb{R}^d$ with constant coefficients take the form
$$\sum_{i,j=1}^{d} a_{ij}\, \partial_i \partial_j u + \sum_{i=1}^{d} b_i\, \partial_i u + c\, u = f.$$
Let $A = (a_{ij})$ be the symbol matrix. Assume $A$ is symmetric (always achievable after symmetrization).
Classification of Second-Order Linear PDEs
Statement
A second-order linear PDE with symbol matrix $A$ is:
- Elliptic if $A$ is definite (all eigenvalues nonzero and of the same sign). Archetype: Laplace equation $\Delta u = 0$.
- Parabolic if $A$ is degenerate with one zero eigenvalue, the remaining eigenvalues of the same sign, and a nonzero first-order term in the missing direction. Archetype: heat equation $u_t = \nu \Delta u$.
- Hyperbolic if $A$ is nondegenerate with eigenvalues of mixed sign. Archetype: wave equation $u_{tt} = c^2 \Delta u$.
Intuition
Elliptic PDEs describe equilibria: no preferred time direction, and the solution at any interior point is an average of its boundary data. Parabolic PDEs describe smoothing and diffusion: information propagates in one direction of time, and discontinuities are smoothed out. Hyperbolic PDEs describe wave-like propagation: information travels along characteristics at finite speed, and singularities persist.
Why It Matters
The archetype dictates what the solution can look like, which numerical methods are stable, and how ML should treat the problem. A PINN that works well for the heat equation can fail dramatically on the wave equation because gradient descent propagates information differently than the PDE does. Flow matching is effectively a first-order hyperbolic problem (a transport equation); diffusion model forward processes are parabolic. Mixing these up in an ML pipeline is a source of silent bugs.
Failure Mode
Most real PDEs in ML are nonlinear (Fokker-Planck with nonlinear drift, Hamilton-Jacobi-Bellman, Burgers') or first-order (continuity, transport), and do not fit cleanly into the linear-second-order classification. The classification is a ladder to climb onto, not a complete taxonomy. Viscosity solutions (Crandall-Lions 1983) were invented precisely because nonlinear first-order PDEs need a weaker notion of solution.
| Archetype | Symbol matrix $A$ | Canonical PDE | Information flow | Typical behavior | ML counterpart |
|---|---|---|---|---|---|
| Elliptic | Definite (all eigenvalues same sign) | $\Delta u = 0$ (Laplace) | Instantaneous everywhere | Smoothing, boundary-averaged | Spectral clustering, Poisson solves, FNO benchmarks |
| Parabolic | One zero eigenvalue, rest same sign | $u_t = \nu \Delta u$ (heat) | Forward in time only | Exponential smoothing, diffusive | Diffusion-model forward process, Gaussian blur |
| Hyperbolic | Mixed-sign eigenvalues | $u_{tt} = c^2 \Delta u$ (wave) | Along characteristics at finite speed | Wave propagation, shock formation | Flow matching (first-order transport), Burgers' benchmarks |
Six PDEs That Matter for Machine Learning
1. Heat equation (parabolic)
The heat equation is $\partial_t u = \nu\, \Delta u$. The closed-form solution is convolution with a Gaussian kernel of variance $2\nu t$:
$$u(\cdot, t) = G_{2\nu t} * u_0, \qquad G_s(x) = (2\pi s)^{-d/2}\, e^{-|x|^2 / (2s)}.$$
Where it appears in ML. The forward process of a continuous-time diffusion model with constant diffusion coefficient is the heat equation acting on the data density (Song et al. 2021, arXiv:2011.13456). The learned score $\nabla_x \log p_t(x)$ is the gradient of the log-density of a heat-equation solution started at the data distribution. Gaussian smoothing in image processing is the heat equation solved exactly to time $t = \sigma^2 / (2\nu)$.
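The variance claim can be verified directly: blurring with a Gaussian of variance $2\nu t$ and applying the exact heat propagator for time $t$ give the same field. A 1-D periodic sketch with illustrative parameters:

```python
import numpy as np

# Sketch: Gaussian blur and the heat equation coincide when the blur
# variance is sigma^2 = 2*nu*t. Periodic 1-D grid; parameters illustrative.
n = 256
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
k = np.fft.fftfreq(n, d=1.0 / n)              # integer wavenumbers
u0 = (np.abs(x - np.pi) < 1.0).astype(float)  # a box "image"

nu, t = 1.0, 0.01
sigma2 = 2 * nu * t                           # blur variance matching time t

# Route 1: exact heat propagator in Fourier space.
u_heat = np.real(np.fft.ifft(np.fft.fft(u0) * np.exp(-nu * k**2 * t)))

# Route 2: circular convolution with a discrete Gaussian kernel.
xc = x.copy(); xc[xc > np.pi] -= 2 * np.pi    # centered coordinates
g = np.exp(-xc**2 / (2 * sigma2)); g /= g.sum()
u_blur = np.real(np.fft.ifft(np.fft.fft(u0) * np.fft.fft(g)))

print(np.max(np.abs(u_heat - u_blur)))        # near machine precision
```

The two routes differ only by the (tiny) aliasing error of sampling the Gaussian kernel on the grid.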
2. Fokker-Planck equation (parabolic, linear in the density)
Fokker-Planck Equation for an SDE
Statement
The density $p_t$ of the solution to $dX_t = b(X_t)\, dt + \sigma(X_t)\, dW_t$ with initial density $p_0$ satisfies the Fokker-Planck (or Kolmogorov forward) equation
$$\partial_t p_t = -\nabla \cdot \big(b\, p_t\big) + \tfrac{1}{2}\, \nabla \cdot \nabla \cdot \big(\sigma \sigma^{\top} p_t\big),$$
with initial condition $p_{t=0} = p_0$. Here the second term is shorthand for $\tfrac{1}{2} \sum_{i,j} \partial_i \partial_j \big[(\sigma \sigma^{\top})_{ij}\, p_t\big]$.
Intuition
Probability mass is conserved. The first term is pure transport of mass along the drift $b$. The second term is pure diffusion at rate $\tfrac{1}{2} \sigma \sigma^{\top}$. Fokker-Planck is a continuity equation for a probability density with an added diffusive flux.
Proof Sketch
Apply Ito's lemma to a test function $\varphi$, take expectation, integrate by parts twice to move derivatives off $\varphi$ and onto the density. The adjoint of the infinitesimal generator is the Fokker-Planck operator. See Pavliotis, Stochastic Processes and Applications (Springer 2014), Theorem 2.4.
Why It Matters
Every score-based diffusion model is implicitly solving Fokker-Planck forward to obtain the noised marginals and its time reverse to generate samples. The forward process is the SDE $dX_t = f(X_t, t)\, dt + g(t)\, dW_t$; the learned network approximates the score $\nabla_x \log p_t(x)$, and plugging this into the reverse-time SDE gives generative sampling.
Failure Mode
Fokker-Planck requires enough regularity of the coefficients $b$ and $\sigma$ for a density to exist. If $\sigma \sigma^{\top}$ is degenerate (a common case in RL, where control affects only some coordinates), the equation is hypoelliptic rather than elliptic in the spatial variables and needs Hörmander's theorem (1967) to guarantee a smooth density.
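The trajectory view and the density view can be checked against each other on the Ornstein-Uhlenbeck process, where the Fokker-Planck solution is a Gaussian with closed-form moments. Parameters here are illustrative:

```python
import numpy as np

# Sketch: simulate dX = -theta*X dt + sigma dW with Euler-Maruyama (SDE
# view) and compare empirical moments against the closed-form Fokker-Planck
# solution: a Gaussian with exponentially relaxing mean and variance.
rng = np.random.default_rng(1)
theta, sigma = 1.0, 0.5
T, dt, n_paths = 2.0, 1e-2, 50_000
steps = int(T / dt)

x = np.full(n_paths, 2.0)                 # deterministic start X_0 = 2
for _ in range(steps):
    x += -theta * x * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)

mean_exact = 2.0 * np.exp(-theta * T)
var_exact = sigma**2 / (2 * theta) * (1 - np.exp(-2 * theta * T))
print(x.mean(), mean_exact)               # empirical vs Fokker-Planck mean
print(x.var(), var_exact)                 # empirical vs Fokker-Planck variance
```

The residual mismatch is the expected combination of Euler-Maruyama bias ($O(\Delta t)$) and Monte Carlo error ($O(1/\sqrt{n})$).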
Anderson Reverse-Time SDE
Statement
The time reversal of $dX_t = f(X_t, t)\, dt + g(t)\, dW_t$ solves
$$d\bar X_t = \big[f(\bar X_t, t) - g(t)^2\, \nabla_x \log p_t(\bar X_t)\big]\, dt + g(t)\, d\bar W_t,$$
where $\bar W_t$ is a Brownian motion in reversed time (Anderson 1982, Stochastic Processes and their Applications, 12(3), pp 313-326).
Intuition
Running an SDE backward in time is not trivial: you need a drift correction $-g(t)^2\, \nabla_x \log p_t$ to cancel the concentration of probability that the forward process produced. The corrected drift pushes mass back toward the data distribution.
Why It Matters
This is the generative sampling equation for score-based diffusion models. The trained network is $s_\theta(x, t) \approx \nabla_x \log p_t(x)$; substituting it into the Anderson SDE gives the reverse process used at inference. If the score is learned accurately, Anderson's theorem guarantees the reverse process produces samples from the data distribution $p_0$ (Song et al. 2021).
Failure Mode
Score estimation is exact only in expectation. Score errors in low-density regions (far from the data manifold) compound over the reverse integration and produce off-manifold samples. This is the fundamental noise-schedule and sampler-design problem in diffusion models.
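Anderson's formula can be exercised end to end on a toy problem where the score is available in closed form. Everything here (Gaussian data, constant $g = 1$, step size) is an illustrative choice:

```python
import numpy as np

# Sketch of the Anderson reverse-time SDE with an exact score. Forward
# process: dX = dW (variance-exploding, g = 1) started from data
# ~ N(3, 0.2^2), so p_t = N(3, 0.04 + t) and the exact score is
# -(x - 3) / (0.04 + t). Integrating the reverse SDE from t = T back to 0
# must reproduce the data distribution.
rng = np.random.default_rng(2)
mu, s2, T, dt, n = 3.0, 0.04, 1.0, 1e-3, 50_000
steps = int(T / dt)

x = mu + np.sqrt(s2 + T) * rng.standard_normal(n)   # start from exact p_T
for i in range(steps):
    t = T - i * dt
    score = -(x - mu) / (s2 + t)                    # exact score of p_t
    # reverse-time Euler-Maruyama step (f = 0, g = 1):
    x += score * dt + np.sqrt(dt) * rng.standard_normal(n)

print(x.mean(), x.var())                            # ~3.0 and ~0.04
```

With the exact score the samples land back on $\mathcal N(3, 0.04)$; replacing `score` with a biased estimate reproduces the off-manifold drift described above.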
3. Continuity equation (first-order hyperbolic)
$$\partial_t p_t + \nabla \cdot (p_t\, v) = 0$$
This is Fokker-Planck without the diffusion term. Given a velocity field $v(x, t)$, probability mass is pushed along the flow of $v$ without any smoothing.
Where it appears in ML. Flow matching (Lipman, Chen, Ben-Hamu, Nickel, Le 2023, arXiv:2210.02747), rectified flow, and continuous normalizing flows all train a network $v_\theta$ to satisfy a continuity equation from the data distribution to a tractable base (typically a standard Gaussian). At inference time, you solve the ODE $\dot x = v_\theta(x, t)$ deterministically, which is the method-of-characteristics solution of the continuity equation. Optimal transport via the Benamou-Brenier formulation (2000, Numerische Mathematik) is a constrained minimization over continuity-equation-compatible velocity fields.
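The claim that a velocity field transports the density can be checked numerically for a Gaussian path. The straight-line interpolation below is an illustrative choice in the flow-matching style, not any paper's exact construction:

```python
import numpy as np

# Sketch: verify the continuity equation d_t p + d_x (p v) = 0 for the
# Gaussian path p_t = N(mu_t, sig_t^2), mu_t = t*m, sig_t = 1 + t*(s1 - 1)
# (straight interpolation from N(0,1) to N(m, s1^2)). The marginal velocity
# for such a path is v(x,t) = mu_dot + (sig_dot/sig) * (x - mu).
m, s1 = 2.0, 1.5

def p(x, t):
    mu, sig = t * m, 1 + t * (s1 - 1)
    return np.exp(-(x - mu)**2 / (2 * sig**2)) / (sig * np.sqrt(2 * np.pi))

def v(x, t):
    mu, sig = t * m, 1 + t * (s1 - 1)
    return m + (s1 - 1) / sig * (x - mu)

x = np.linspace(-6, 10, 2001)
t, dx, dt = 0.5, 16 / 2000, 1e-5
dpdt = (p(x, t + dt) - p(x, t - dt)) / (2 * dt)   # finite-diff time derivative
dflux = np.gradient(p(x, t) * v(x, t), dx)        # finite-diff divergence
residual = np.max(np.abs(dpdt + dflux))
print(residual)                                   # small: the PDE holds
```

The residual is pure finite-difference error; analytically this velocity field satisfies the continuity equation exactly along the Gaussian path.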
4. Hamilton-Jacobi-Bellman equation (fully nonlinear; first-order when the dynamics are deterministic)
For a continuous-time stochastic control problem with dynamics $dX_t = b(X_t, a_t)\, dt + \sigma(X_t)\, dW_t$, running cost $\ell(x, a)$, and terminal cost $g(x)$, the value function $V(x, t)$ satisfies
$$\partial_t V + \min_{a} \Big\{ \ell(x, a) + b(x, a) \cdot \nabla_x V + \tfrac{1}{2}\, \mathrm{tr}\big(\sigma \sigma^{\top} \nabla_x^2 V\big) \Big\} = 0, \qquad V(x, T) = g(x).$$
Where it appears in ML. Every continuous-time reinforcement learning formulation reduces to an HJB equation under a sign flip. Control theory minimizes cost, so HJB has a $\min$; RL maximizes reward, so the RL version has a $\max$ and the value function changes sign accordingly. The discrete-time Bellman equation is the backward Euler discretization of the reward-form HJB with a sample-based replacement of the expectation. Entropy-regularized RL (soft Q-learning, soft actor-critic) replaces $\max_a$ by the soft-max $\tau \log \sum_a \exp(Q(x,a)/\tau)$, giving a soft-HJB equation whose optimal policy has the Gibbs form $\pi(a \mid x) \propto \exp(Q(x,a)/\tau)$.
The HJB equation is fully nonlinear and does not generally admit classical (differentiable) solutions. The right notion is Crandall-Lions viscosity solutions (1983, Transactions of the American Mathematical Society, 277(1), pp 1-42).
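The soft-max replacement can be seen numerically. This sketch uses hypothetical Q-values to show the soft value collapsing onto the hard max, and the Gibbs policy onto the greedy action, as the temperature goes to zero:

```python
import numpy as np

# Sketch: the entropy-regularized (soft) Bellman backup replaces max_a Q
# with tau * log sum_a exp(Q/tau). As tau -> 0 it recovers the hard max and
# the Gibbs policy concentrates on the greedy action. Q-values hypothetical.
Q = np.array([1.0, 2.5, 2.4, -0.5])

def soft_value(Q, tau):
    return tau * np.log(np.sum(np.exp(Q / tau)))

def gibbs_policy(Q, tau):
    w = np.exp((Q - Q.max()) / tau)       # max-shifted for numerical stability
    return w / w.sum()

for tau in [1.0, 0.1, 0.01]:
    print(tau, soft_value(Q, tau), gibbs_policy(Q, tau).round(3))
# soft_value decreases toward max(Q) = 2.5; the policy concentrates on a = 1
```

Note the soft value always upper-bounds the hard max (by the entropy bonus), which is exactly the regularization term the soft-HJB equation carries.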
5. Poisson and Laplace equations (elliptic)
The Laplace equation is $\Delta u = 0$; the Poisson equation is $-\Delta u = f$. The fundamental solution of $-\Delta \Phi = \delta_0$ on $\mathbb{R}^d$ is $\Phi(x) = \frac{1}{(d-2)\,\omega_{d-1}} |x|^{2-d}$ for $d \ge 3$ (with $\omega_{d-1}$ the surface area of the unit sphere) and $\Phi(x) = -\frac{1}{2\pi} \log |x|$ for $d = 2$. Any sufficiently decaying solution of Poisson can be written as the convolution $u = \Phi * f$; the negative sign in $d = 2$ and the positive sign in $d \ge 3$ come from the sign of the Laplacian acting on these radially symmetric profiles and are not a typo.
Where it appears in ML. The discrete Laplace operator is the core object in spectral clustering, graph neural networks, and manifold learning: eigenvectors of the graph Laplacian approximate eigenfunctions of the Laplace-Beltrami operator on the underlying manifold (Belkin and Niyogi 2003, Neural Computation 15(6), pp 1373-1396). Poisson equations also arise directly in physics-based ML applications: solving for potential fields is the canonical linear PDE benchmark for FNOs and classical PINN tasks, because the exact solution operator is a translation-invariant convolution, which diagonal-in-Fourier methods match exactly.
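A small sketch of the Laplacian correspondence: the ring graph's Laplacian has eigenvalues $2 - 2\cos(2\pi j/n)$, whose low-mode expansion $\theta^2$ is the continuum dispersion of $-d^2/dx^2$ on a circle (grid and spacing are illustrative):

```python
import numpy as np

# Sketch: the combinatorial graph Laplacian L = D - A of a ring of n nodes
# approximates the 1-D Laplace-Beltrami operator. Its spectrum has the
# closed form 2 - 2*cos(2*pi*j/n), ~ (2*pi*j/n)^2 for low modes.
n = 200
A = np.zeros((n, n))
idx = np.arange(n)
A[idx, (idx + 1) % n] = 1                  # ring adjacency: right neighbor
A[idx, (idx - 1) % n] = 1                  # ring adjacency: left neighbor
L = np.diag(A.sum(axis=1)) - A             # combinatorial Laplacian D - A

evals = np.sort(np.linalg.eigvalsh(L))
exact = np.sort(2 - 2 * np.cos(2 * np.pi * np.arange(n) / n))
print(np.max(np.abs(evals - exact)))       # matches the closed form
```

The eigenvectors are discrete Fourier modes, which is why spectral clustering on near-manifold graphs recovers smooth "harmonics" of the underlying geometry.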
6. Burgers' equation (nonlinear; parabolic for $\nu > 0$, hyperbolic in the inviscid limit $\nu = 0$)
Burgers' equation is
$$u_t + u\, u_x = \nu\, u_{xx}.$$
At $\nu = 0$ this is inviscid Burgers, which develops shocks (discontinuities) in finite time even for smooth initial data. For $\nu > 0$ the equation has a Cole-Hopf transformation to the heat equation.
Where it appears in ML. Burgers' is the standard benchmark problem for PINNs and neural operators because it exhibits shocks: a regime where neural PDE solvers either handle or dramatically fail to handle a nonlinear, singular feature. If a method cannot solve Burgers' at small viscosity $\nu$, it will not solve Navier-Stokes, which has the same nonlinear advection plus a coupling and a pressure constraint.
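The Cole-Hopf remark can be made concrete: a sketch that solves viscous Burgers exactly by solving the heat equation for the transformed variable. Initial condition $u_0 = \sin x$ and $\nu = 0.05$ are illustrative choices:

```python
import numpy as np

# Sketch: viscous Burgers u_t + u u_x = nu u_xx via Cole-Hopf,
# u = -2 nu phi_x / phi, where phi solves the heat equation. On the periodic
# domain with u0 = sin(x), the transformed IC is phi0 = exp(cos(x)/(2 nu)).
n, nu, t = 256, 0.05, 0.5
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
k = np.fft.fftfreq(n, d=1.0 / n)

phi0 = np.exp(np.cos(x) / (2 * nu))
phi_hat = np.fft.fft(phi0) * np.exp(-nu * k**2 * t)   # exact heat propagator
phi = np.real(np.fft.ifft(phi_hat))
phi_x = np.real(np.fft.ifft(1j * k * phi_hat))        # spectral derivative
u = -2 * nu * phi_x / phi

print(u.mean(), np.abs(u).max())   # momentum stays ~0, amplitude decays below 1
```

All the nonlinearity of Burgers' is absorbed into one exponential change of variables; as $\nu \to 0$ the transformation degenerates numerically ($\phi_0$ spans an exponentially large dynamic range), which mirrors the shock formation of the inviscid limit.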
Notions of Solution
Classical solutions are differentiable enough to plug into the PDE pointwise. Real applied PDEs usually do not have classical solutions, and one of the weaker notions below is what the theory actually guarantees.
Weak Solution
A weak solution of $Lu = f$ on $\Omega$ is a function $u$ in a Sobolev space for which
$$\int_\Omega u\, L^{*}\varphi\; dx = \int_\Omega f\, \varphi\; dx$$
for every test function $\varphi$ compactly supported in $\Omega$, where $L^{*}$ is the formal adjoint of $L$. Derivatives are moved onto the smooth test function via integration by parts; $u$ is allowed to be nondifferentiable in the strong sense.
Viscosity Solution
A viscosity solution of a fully nonlinear first-order or second-order PDE is defined by Crandall-Lions 1983 test-function conditions that generalize the maximum principle. Intuitively, $u$ is a viscosity solution if for every smooth $\varphi$ touching $u$ from above (resp. below) at a point $x_0$, the PDE holds at $x_0$ with derivatives of $\varphi$ in place of derivatives of $u$, with the appropriate inequality. Viscosity solutions exist and are unique under mild assumptions for HJB and many nonlinear first-order PDEs where classical and weak theories both fail.
Distributional Solution
A distributional solution is a weak solution where the test space is Schwartz functions (or compactly supported smooth functions) and $u$ is interpreted as a distribution. This is the setting for PDEs with singular source terms or point masses, including Green's functions themselves.
| Solution notion | Regularity of $u$ | Test function space | Typical use case | Key reference |
|---|---|---|---|---|
| Classical | $C^k$ for a $k$-th order PDE | None required | Well-posed linear PDEs on smooth domains | Evans ch 2-4 |
| Weak | Sobolev, usually $H^1$ | $C_c^\infty(\Omega)$ or $H_0^1(\Omega)$ | Linear PDEs with rough data or singular geometry | Evans ch 5-6, Brezis ch 8-9 |
| Viscosity | Continuous (bounded) | Smooth test functions touching $u$ | Fully nonlinear first- and second-order PDEs, HJB | Crandall-Lions 1983 |
| Distributional | Distribution (possibly not a function) | Schwartz space / $C_c^\infty$ | PDEs with singular sources, Green's functions, fundamental solutions | Hörmander, The Analysis of Linear Partial Differential Operators |
Where PDEs Embed in ML Systems
| ML system | PDE | Learned object | What the PDE guarantees |
|---|---|---|---|
| Score-based diffusion | Fokker-Planck forward + Anderson reverse | Score $\nabla_x \log p_t(x)$ | Exact samples from data distribution if score is exact |
| Flow matching | Continuity equation | Velocity field | Deterministic coupling between base and data |
| Normalizing flows | Continuity equation | Invertible transformation | Exact log-likelihood via change of variables |
| Continuous-time RL | Hamilton-Jacobi-Bellman | Value function $V$ | Optimal policy from greedy choice with respect to $V$ |
| Score matching | Poisson-like (implicit) | Score $\nabla_x \log p(x)$ | Log-density up to an additive constant |
| PINNs | User-specified PDE | Solution | Physics-consistent interpolation if loss is zero |
| Neural operators (FNO, DeepONet) | Parametric family of PDEs | Solution map | Fast evaluation of the solution operator |
What Neural PDE Solvers Actually Buy You
Classical solvers (finite differences, finite elements, spectral) are the correct tool for well-posed PDEs in low dimension with simple geometry. They have rigorous error bounds, provable stability, and decades of engineering. Neural solvers are worth using when at least one of the following holds:
- High dimension. Classical methods scale as $N^d$ grid points for $N$ points per dimension. For $d \gtrsim 10$ (common in stochastic control, Boltzmann equations, quantum chemistry) this is infeasible. Neural parameterizations can break the curse of dimensionality when the target has enough structure (Han, Jentzen, E 2018 on deep BSDE methods for HJB, PNAS 115(34)).
- Parametric families. An FNO or DeepONet trained across a family of PDE coefficients can amortize the solution cost: inference is one forward pass instead of a full solve. Classical solvers have no analog; each new coefficient requires a new solve.
- Inverse problems. Backing out unknown coefficients from observations of the solution $u$ is a natural fit for autodiff. PINNs and neural operators can jointly fit data and residual (Raissi, Perdikaris, Karniadakis 2019, Journal of Computational Physics 378, pp 686-707).
- Implicit access. If the PDE is only given through a simulator (Navier-Stokes with a specific turbulence model, molecular dynamics), classical PDE theory does not apply directly. ML can learn the coarse-grained map.
What neural solvers do not currently deliver: convergence guarantees at classical-solver rates, reliable performance on shock-dominated or highly multiscale problems, or competitive accuracy on standard 2D or 3D benchmarks where finite elements have been tuned for thirty years. The honest reading of the 2020-2025 literature is that neural PDE solvers extend reach into regimes classical methods cannot handle, rather than replacing classical solvers in their home regime.
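One concrete instance of the high-dimension point: the probabilistic (Feynman-Kac) representation $u(x,t) = \mathbb{E}\big[u_0(x + \sqrt{2\nu t}\, Z)\big]$ turns a heat solve in $d = 100$ into a Monte Carlo average with cost linear in $d$. This sketch uses an illustrative test function with a closed-form answer; it is not the deep BSDE method itself:

```python
import numpy as np

# Sketch: Feynman-Kac Monte Carlo for the heat equation in d = 100 -- no
# N^d grid, cost linear in d. The test function u0(x) = cos(k . x) has the
# closed-form answer cos(k . x) * exp(-nu * t * |k|^2), so the estimate can
# be checked exactly. Wavevector and query point are illustrative.
rng = np.random.default_rng(3)
d, nu, t, n_samples = 100, 1.0, 0.1, 50_000

k = np.zeros(d); k[0], k[1] = 1.0, 1.0     # wavevector with |k|^2 = 2
x = 0.3 * np.ones(d)                       # query point

Z = rng.standard_normal((n_samples, d))
samples = np.cos((x + np.sqrt(2 * nu * t) * Z) @ k)
estimate = samples.mean()
exact = np.cos(x @ k) * np.exp(-nu * t * np.sum(k**2))
print(estimate, exact)                     # agree to Monte Carlo accuracy
```

The error is the usual $O(1/\sqrt{n})$ Monte Carlo rate, independent of $d$; a grid-based solver at even 10 points per axis would need $10^{100}$ cells.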
Worked Example: The DDPM Forward Process Is the Heat Equation on the Data Density
A common point of confusion is how a stochastic process on samples relates to a deterministic PDE. The standard DDPM forward process adds Gaussian noise to each sample independently:
$$x_t = \sqrt{1 - \beta_t}\, x_{t-1} + \sqrt{\beta_t}\, \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, I).$$
At the level of an individual trajectory, this is pure noise injection and has no PDE associated with it. But the density $p_t$ of $X_t$ does satisfy a PDE. In the continuous-time limit with variance schedule $\beta(t)$, the density obeys the Fokker-Planck equation for the forward SDE $dX_t = -\tfrac{1}{2}\beta(t) X_t\, dt + \sqrt{\beta(t)}\, dW_t$:
$$\partial_t p_t = \tfrac{1}{2}\beta(t)\, \nabla \cdot (x\, p_t) + \tfrac{1}{2}\beta(t)\, \Delta p_t.$$
In the variance-exploding limit (no drift, pure noise injection, $dX_t = g(t)\, dW_t$), the first term drops and we are left with
$$\partial_t p_t = \tfrac{1}{2} g(t)^2\, \Delta p_t,$$
which is exactly the heat equation with time-varying diffusivity. Take the spatial Fourier transform: every mode decays independently as
$$\hat p_t(k) = \hat p_0(k)\, \exp\!\Big(-\tfrac{1}{2} |k|^2 \int_0^t g(s)^2\, ds\Big).$$
This is the equation visible in the interactive explorer above (set $g$ constant to recover the standard heat propagator). The high-frequency content of the data density is attenuated exponentially in $|k|^2$. Running the process backward in time requires inverting this multiplier, which amplifies high-$|k|$ noise by the same exponential factor. No finite amount of data lets you recover that information without a prior; the learned score is exactly the object that pins the trajectory to the data manifold during reverse time (Anderson 1982; Song et al. 2021, ICLR).
The takeaway: DDPM training is not "noise prediction for its own sake." It is learning the Green's function of a parabolic PDE that you could, in principle, write in closed form on a trivial domain, but never on the manifold of natural images. The spectral structure visible in the explorer's Fourier pane is literally what every diffusion model internalizes during training.
Common Confusions
The Fokker-Planck equation is not the SDE
Fokker-Planck is a deterministic PDE for the density $p_t(x)$. The SDE is a stochastic equation for the trajectory $X_t$. Both encode the same Markov process, but they are distinct mathematical objects. Score-based diffusion trains on trajectories (SDE view) and generates by integrating a reverse SDE, but the theoretical analysis is almost always stated in the density (Fokker-Planck) view.
A PINN is not a PDE solver in the classical sense
A PINN minimizes a composite loss that includes a PDE residual, evaluated at sampled collocation points. Minimizing the loss to zero on a finite point set does not imply the PDE holds pointwise, and no classical PINN formulation has error bounds that scale like finite element or spectral methods. Treat a PINN as a regularized regression with a physics-inspired penalty, not as a convergent numerical scheme.
Flow matching is deterministic; diffusion is stochastic
Flow matching trains a velocity field for a continuity equation. At inference you solve an ODE with no noise. Diffusion trains a score for a Fokker-Planck reverse SDE. At inference you integrate an SDE with injected noise. The two produce samples from the same distribution (when trained well) but have different variance properties and sampler trade-offs. The continuity equation and the Fokker-Planck equation are related: Fokker-Planck with zero diffusion is the continuity equation.
Viscosity solutions are not solutions that got smoothed
The name is historical: Crandall and Lions introduced the definition via a vanishing-viscosity argument (a term $\varepsilon\, \Delta u$ added, then $\varepsilon \to 0$). The resulting notion of solution, however, is purely algebraic and does not require any actual diffusion. Viscosity solutions are the correct notion of solution for fully nonlinear first-order and second-order PDEs, including HJB. Nothing in the definition involves smoothing.
Neural operators do not learn PDEs; they learn solution maps
A Fourier Neural Operator (Li et al. 2021, arXiv:2010.08895) learns a mapping from PDE coefficients to PDE solutions, trained on a dataset of (coefficient, solution) pairs obtained by running a classical solver. It does not learn what a PDE is. Without the classical solver, there is no training data. The "neural" part accelerates repeated evaluation of an already-understood solution map; it does not replace the PDE model.
Summary
- PDEs are local constraints on functions. The three archetypes (elliptic, parabolic, hyperbolic) dictate what solutions look like and which numerical and ML methods are stable.
- Six PDEs recur in ML: heat, Fokker-Planck, continuity, Hamilton-Jacobi-Bellman, Poisson, Burgers'. Diffusion models, flow matching, RL, and score matching are each specific ML incarnations of one of these.
- Classical solutions are rarely available for real problems. Weak, viscosity, and distributional solutions are the right formal objects.
- Neural solvers extend PDE reach into high dimension, parametric families, inverse problems, and simulator-only settings, and do not compete with classical solvers in their home regime as of 2025.
- The correct mental model: an ML system that learns a score, a velocity, or a value is learning a specific field whose mathematical existence and meaning are given by a classical PDE theorem.
Exercises
Problem
Starting from the Ito SDE $dX_t = -\nabla U(X_t)\, dt + \sqrt{2\beta^{-1}}\, dW_t$ (overdamped Langevin dynamics with potential $U$ and inverse temperature $\beta$), write out the Fokker-Planck equation for the density and identify the stationary distribution.
Problem
The heat equation on $\mathbb{R}^d$ with initial data $u_0$ has solution $u(\cdot, t) = G_{2t} * u_0$, where $G_{2t}$ is the Gaussian kernel of variance $2t$ (taking $\nu = 1$). Explain in what sense Gaussian smoothing of an image is a heat-equation simulation, and what "time" corresponds to in standard image-processing notation.
Problem
Starting from the forward SDE $dX_t = -\tfrac{1}{2}\beta(t) X_t\, dt + \sqrt{\beta(t)}\, dW_t$ (the variance-preserving diffusion used in DDPM), derive the Anderson reverse-time SDE. Identify the drift correction that makes the reverse SDE generate samples from the initial density.
Problem
Consider training a Fourier Neural Operator on a parametric family of Poisson equations $-\Delta u = f$ on the torus with periodic boundary conditions, where $f$ is drawn from a Gaussian random field prior. Explain why the FNO's translation-invariant kernel parameterization is a particularly good fit for this problem class, and identify two concrete settings where that fit breaks down.
References
Canonical PDE texts:
- Evans, Partial Differential Equations (2nd ed., AMS 2010). Chapter 2 for the three archetypes; Chapter 5 for Sobolev spaces and weak solutions; Chapter 10 for Hamilton-Jacobi and viscosity solutions.
- Brezis, Functional Analysis, Sobolev Spaces and Partial Differential Equations (Springer 2011). Chapters 8-9 for Sobolev theory; Chapter 10 for evolution equations.
- Strauss, Partial Differential Equations: An Introduction (2nd ed., Wiley 2007). Chapters 1-5 for classification and the three archetypes at an introductory level.
Stochastic and Kolmogorov-forward theory:
- Pavliotis, Stochastic Processes and Applications: Diffusion Processes, the Fokker-Planck and Langevin Equations (Springer 2014). Chapter 2 for the Fokker-Planck derivation; Chapter 4 for stationary distributions; Chapter 6 for overdamped Langevin.
- Øksendal, Stochastic Differential Equations (6th ed., Springer 2003). Chapters 7-8 for the Kolmogorov forward and backward equations.
- Anderson, "Reverse-time diffusion equation models" (Stochastic Processes and their Applications, 12(3), pp 313-326, 1982). Primary source for the reverse-time SDE formula.
- Hörmander, "Hypoelliptic second order differential equations" (Acta Mathematica, 119, pp 147-171, 1967). For degenerate diffusion operators.
Viscosity solutions and HJB:
- Crandall, Lions, "Viscosity solutions of Hamilton-Jacobi equations" (Transactions of the American Mathematical Society, 277(1), pp 1-42, 1983). The defining paper.
- Fleming, Soner, Controlled Markov Processes and Viscosity Solutions (2nd ed., Springer 2006). Chapters 2-3 for HJB theory with applications to control.
Optimal transport and continuity equation:
- Villani, Topics in Optimal Transportation (AMS 2003). Chapters 1-2 for Monge-Kantorovich.
- Villani, Optimal Transport: Old and New (Springer 2008). Chapter 23 for Wasserstein geometry and gradient flows.
- Benamou, Brenier, "A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem" (Numerische Mathematik, 84(3), pp 375-393, 2000). Dynamical formulation of optimal transport as a constrained minimization over continuity-equation-compatible velocity fields.
Machine learning meets PDEs (current):
- Raissi, Perdikaris, Karniadakis, "Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations" (Journal of Computational Physics, 378, pp 686-707, 2019). The reference PINN paper.
- Li, Kovachki, Azizzadenesheli, Liu, Bhattacharya, Stuart, Anandkumar, "Fourier Neural Operator for Parametric Partial Differential Equations" (ICLR 2021, arXiv:2010.08895).
- Kovachki, Li, Liu, Azizzadenesheli, Bhattacharya, Stuart, Anandkumar, "Neural Operator: Learning Maps Between Function Spaces" (JMLR 24, 2023, arXiv:2108.08481). Unifying framework for FNO, DeepONet, and related architectures.
- Lu, Jin, Karniadakis, "DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators" (Nature Machine Intelligence 3, pp 218-229, 2021, arXiv:1910.03193).
- Song, Sohl-Dickstein, Kingma, Kumar, Ermon, Poole, "Score-Based Generative Modeling through Stochastic Differential Equations" (ICLR 2021, arXiv:2011.13456). Unifies score matching and Anderson reverse SDE as the generative framework.
- Lipman, Chen, Ben-Hamu, Nickel, Le, "Flow Matching for Generative Modeling" (ICLR 2023, arXiv:2210.02747). Training objective for continuity-equation velocity fields.
- Han, Jentzen, E, "Solving high-dimensional partial differential equations using deep learning" (PNAS, 115(34), pp 8505-8510, 2018). Deep BSDE method for HJB in high dimension.
- Karniadakis, Kevrekidis, Lu, Perdikaris, Wang, Yang, "Physics-informed machine learning" (Nature Reviews Physics, 3, pp 422-440, 2021). Overview of the PINN and neural operator landscape.
Next Topics
- Physics-informed neural networks: the direct application of this material to solving PDEs with neural loss functions.
- Diffusion models: score-based generative modeling, where Fokker-Planck and the Anderson reverse SDE are the operational equations.
- Flow matching: continuity-equation-based generative modeling with deterministic inference.
- Neural ODEs: continuous-depth networks, adjacent to the neural operator and PDE literature.
Last reviewed: April 18, 2026
Prerequisites
Foundations this topic depends on.
- Differentiation in Rⁿ (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Measure-Theoretic Probability (Layer 0B)
- Functional Analysis Core (Layer 0B)
- Metric Spaces, Convergence, and Completeness (Layer 0A)
- Inner Product Spaces and Orthogonality (Layer 0A)
- Vectors, Matrices, and Linear Maps (Layer 0A)