
Scientific ML

Physics-Informed Neural Networks

Embedding PDE constraints directly into the neural network loss function via automatic differentiation. When physics-informed learning works, when it fails, and what alternatives exist.

Advanced · Tier 2 · ~55 min


Why This Matters

Most of science is governed by partial differential equations (PDEs). Classical numerical solvers (finite elements, finite differences, spectral methods) work well but struggle with high-dimensional problems, complex geometries, and inverse problems. PINNs propose a different approach: use a neural network as a function approximator and enforce the PDE through the loss function.

The idea is seductive. The reality is more nuanced. PINNs work well in certain regimes and fail badly in others. Understanding where the boundary lies is critical for anyone applying ML to scientific problems.

Mental Model

A standard neural network learns from data alone. A PINN adds a second source of supervision: the governing equations. You do not need as much data because the physics constrains the space of acceptable solutions.

Think of it as regularization by physical law. Instead of an L2 penalty that pushes weights toward zero, you have a PDE residual penalty that pushes the solution toward physical consistency.

Core Definitions

Definition

Physics-Informed Neural Network (PINN)

A neural network $u_\theta(x, t)$ trained to approximate the solution of a PDE by minimizing a composite loss that includes both data fidelity and PDE residual terms. The PDE residual is computed via automatic differentiation of the network output with respect to its inputs.

Definition

PDE Residual

For a PDE of the form $\mathcal{N}[u] = 0$, where $\mathcal{N}$ is a differential operator, the PDE residual at a collocation point $x$ is:

$$\mathcal{R}[u_\theta](x) = \mathcal{N}[u_\theta](x)$$

This is computed by differentiating $u_\theta$ with respect to its inputs using automatic differentiation. A perfect solution has zero residual everywhere.

The PINN Loss Function

The central construct of PINNs is the composite loss:

Proposition

PINN Loss Decomposition

Statement

The PINN loss is:

$$\mathcal{L}(\theta) = \lambda_{\text{data}} \mathcal{L}_{\text{data}} + \lambda_{\text{pde}} \mathcal{L}_{\text{pde}} + \lambda_{\text{bc}} \mathcal{L}_{\text{bc}} + \lambda_{\text{ic}} \mathcal{L}_{\text{ic}}$$

where:

  • $\mathcal{L}_{\text{data}} = \frac{1}{N_d} \sum_{i=1}^{N_d} |u_\theta(x_i) - u_i^{\text{obs}}|^2$ (data fidelity)
  • $\mathcal{L}_{\text{pde}} = \frac{1}{N_r} \sum_{j=1}^{N_r} |\mathcal{N}[u_\theta](x_j)|^2$ (PDE residual at collocation points)
  • $\mathcal{L}_{\text{bc}}$ and $\mathcal{L}_{\text{ic}}$ enforce boundary and initial conditions

The weights $\lambda$ balance the different loss terms.

Intuition

The network is pulled in multiple directions: fit the observed data, satisfy the PDE everywhere in the domain, and respect boundary/initial conditions. The physics term acts as an infinite-dimensional regularizer, constraining the solution to the manifold of physically plausible functions even where no data exists.

Proof Sketch

There is no convergence "proof" in the classical sense for general PINNs. Theoretical results (Shin, Darbon, Karniadakis 2020) show that as the number of collocation points and network capacity grow, minimizers of the PINN loss converge to the PDE solution under regularity assumptions. The rate of convergence is generally worse than classical solvers for smooth problems.

Why It Matters

This decomposition is the entire PINN methodology. The key engineering decisions are: (1) the architecture of $u_\theta$, (2) the placement of collocation points, (3) the relative weights $\lambda$, and (4) the optimizer and training schedule. Getting these wrong leads to solutions that satisfy neither the data nor the physics.

Failure Mode

The multi-objective nature of the loss creates optimization difficulties. The PDE residual and data terms can have vastly different scales and gradients, leading to one dominating the other. Adaptive weighting schemes (e.g., learning rate annealing, NTK-based weighting) partially address this but do not fully solve it.
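Putting the decomposition together, here is a minimal PyTorch sketch of the composite loss for the 1D heat equation $\partial_t u = \alpha \partial_{xx} u$ on the unit square. All names, layer sizes, point counts, and weights below are illustrative choices, not values prescribed by the text:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
alpha = 0.5  # known diffusion coefficient (illustrative)
net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(),
                    nn.Linear(32, 32), nn.Tanh(), nn.Linear(32, 1))

def u_theta(x, t):
    # Network takes (x, t) pairs and returns the approximate solution.
    return net(torch.stack([x, t], dim=-1)).squeeze(-1)

def pde_residual(x, t):
    # Differentiate the network output w.r.t. its INPUTS via autograd.
    x = x.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    u = u_theta(x, t)
    ones = torch.ones_like(u)
    u_t = torch.autograd.grad(u, t, ones, create_graph=True)[0]
    u_x = torch.autograd.grad(u, x, ones, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x, torch.ones_like(u_x), create_graph=True)[0]
    return u_t - alpha * u_xx

# Four point sets, one per loss term.
x_r, t_r = torch.rand(256), torch.rand(256)   # interior collocation points
t_b = torch.rand(64)                          # boundary points (x = 0 and x = 1)
x_i = torch.rand(64)                          # initial-condition points (t = 0)
x_d, t_d = torch.rand(32), torch.rand(32)     # "observation" locations
u_obs = torch.exp(-alpha * torch.pi**2 * t_d) * torch.sin(torch.pi * x_d)

lam = dict(data=1.0, pde=1.0, bc=10.0, ic=10.0)  # weights are a tuning choice
loss = (lam["data"] * (u_theta(x_d, t_d) - u_obs).pow(2).mean()
        + lam["pde"] * pde_residual(x_r, t_r).pow(2).mean()
        + lam["bc"] * (u_theta(torch.zeros_like(t_b), t_b).pow(2).mean()
                       + u_theta(torch.ones_like(t_b), t_b).pow(2).mean())
        + lam["ic"] * (u_theta(x_i, torch.zeros_like(x_i))
                       - torch.sin(torch.pi * x_i)).pow(2).mean())
loss.backward()  # gradients w.r.t. the network parameters theta
```

In practice this loss would be minimized over many optimizer steps; the sketch shows only how the four terms and their weights combine into one scalar objective.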

How PINNs Use Automatic Differentiation

The key enabling technology is automatic differentiation (autodiff). To compute $\mathcal{N}[u_\theta]$, you differentiate the neural network output with respect to its inputs (not its parameters). Modern frameworks (PyTorch, JAX) compute these derivatives exactly and efficiently.

For example, if the PDE is the heat equation $\partial_t u = \alpha \nabla^2 u$, the residual at a point $(x, t)$ is:

$$\mathcal{R} = \frac{\partial u_\theta}{\partial t} - \alpha \frac{\partial^2 u_\theta}{\partial x^2}$$

Both derivatives are computed by autodiff through the network graph.
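As a concrete sketch, the residual computation in PyTorch looks as follows. Here the known analytic solution $u(x,t) = e^{-\alpha \pi^2 t}\sin(\pi x)$ stands in for a network, so the residual should be numerically zero; function and variable names are my own:

```python
import torch

# Analytic solution of the 1D heat equation u_t = alpha * u_xx:
# u(x, t) = exp(-alpha * pi^2 * t) * sin(pi * x). Its residual is ~0.
alpha = 0.5

def u(x, t):
    return torch.exp(-alpha * torch.pi**2 * t) * torch.sin(torch.pi * x)

x = torch.rand(64, requires_grad=True)
t = torch.rand(64, requires_grad=True)
out = u(x, t)
ones = torch.ones_like(out)

# First derivatives w.r.t. the inputs (not the parameters).
u_t = torch.autograd.grad(out, t, ones, create_graph=True)[0]
u_x = torch.autograd.grad(out, x, ones, create_graph=True)[0]
# Second derivative: differentiate u_x again w.r.t. x.
u_xx = torch.autograd.grad(u_x, x, torch.ones_like(u_x))[0]

residual = u_t - alpha * u_xx
print(residual.abs().max().item())  # ~0 up to float32 roundoff
```

Replacing the analytic `u` with a network output gives exactly the residual term that enters the PINN loss; `create_graph=True` is what allows the second derivative (and gradients through the residual) to be taken.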

When PINNs Work

Example

Smooth Solutions to Known PDEs

PINNs perform well on problems with smooth solutions where the governing PDE is known exactly. Classic demonstrations include the Burgers equation (before shock formation), the Schrödinger equation, and steady-state heat conduction. In these settings, the physics loss provides strong regularization, and the network can represent the solution accurately with moderate capacity.

Example

Inverse Problems

PINNs are particularly attractive for inverse problems: given sparse noisy observations, infer unknown PDE parameters. For example, estimating the diffusion coefficient from temperature measurements. The physics constraint makes the inverse problem well-posed even with limited data.
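A toy sketch of this idea: estimate the diffusion coefficient $\alpha$ in $\partial_t u = \alpha \partial_{xx} u$ from autodiff derivatives. For brevity, the exact solution stands in for a trained network, and $\alpha$ is recovered by least squares rather than by the joint gradient-based training a real PINN would use; all names and values are illustrative:

```python
import torch

# Inverse-problem toy: infer alpha in u_t = alpha * u_xx from "observations".
# The exact solution (alpha_true = 0.3) stands in for a trained network;
# a real PINN would make alpha a trainable parameter optimized jointly
# with the network weights.
alpha_true = 0.3

def u(x, t):
    return torch.exp(-alpha_true * torch.pi**2 * t) * torch.sin(torch.pi * x)

x = torch.rand(128, requires_grad=True)
t = torch.rand(128, requires_grad=True)
out = u(x, t)
ones = torch.ones_like(out)
u_t = torch.autograd.grad(out, t, ones, create_graph=True)[0]
u_x = torch.autograd.grad(out, x, ones, create_graph=True)[0]
u_xx = torch.autograd.grad(u_x, x, torch.ones_like(u_x))[0]

# Least-squares estimate of alpha from the autodiff derivatives:
alpha_hat = (u_t * u_xx).sum() / (u_xx * u_xx).sum()
print(alpha_hat.item())  # close to 0.3
```

The physics constraint is what makes this well-posed: the unknown parameter is pinned down by requiring the observed field to satisfy the PDE.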

When PINNs Fail

Watch Out

PINNs struggle with discontinuities and sharp gradients

Shock waves, contact discontinuities, and thin boundary layers cause PINNs to fail or converge extremely slowly. Neural networks are smooth function approximators (especially with tanh or sigmoid activation functions), so representing discontinuities requires enormous capacity. The PDE residual near a shock is large and noisy, destabilizing training. Classical shock-capturing schemes (WENO, Godunov) handle these cases far better.

Watch Out

PINNs can be slower than classical solvers for forward problems

For a well-posed forward PDE with known coefficients, a finite element solver with adaptive meshing is typically faster and more accurate than a PINN. PINNs pay the cost of neural network training (thousands of gradient descent steps) for a single PDE instance, whereas classical solvers amortize their cost better. PINNs become competitive when you need to solve many related PDEs (amortized inference) or when classical meshing is impractical.

Watch Out

Stiff systems cause training failure

PDEs with multiple widely separated time scales (stiff systems) create loss landscapes with pathological curvature. The fast dynamics produce large PDE residuals early in training, while the slow dynamics require long training to resolve. Multiscale architectures and time-stepping strategies help but do not eliminate the problem.

Extensions

Biologically-Informed Neural Networks (BINNs)

BINNs apply the PINN framework to biological systems where the governing equations are partially known. Instead of enforcing a fully specified PDE, BINNs learn unknown terms in the equations from data while enforcing known structural constraints (conservation laws, positivity, symmetries).

Neural Operators: A Different Paradigm

Instead of learning a single PDE solution, neural operators learn the solution operator: a mapping from PDE parameters, initial conditions, or forcing terms to solutions.

Fourier Neural Operator (FNO): learns in the frequency domain, applying learned filters to the Fourier coefficients of the input function. Resolution invariant and fast at inference (one forward pass per new PDE instance).

DeepONet: uses a branch-trunk architecture. The branch network encodes the input function, the trunk network encodes the evaluation point, and their dot product gives the solution value. Theoretically grounded in the universal approximation theorem for operators (Chen & Chen, 1995).

Neural operators amortize the cost of training over many PDE instances. Once trained, solving a new PDE instance requires only a forward pass, not retraining. This makes them far more practical than PINNs for applications requiring repeated solves (design optimization, uncertainty quantification).
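The branch-trunk construction described above can be sketched in a few lines. Layer sizes and names here are illustrative, not taken from any reference implementation:

```python
import torch
import torch.nn as nn

# DeepONet sketch: branch encodes the input function sampled at m sensor
# points; trunk encodes the evaluation coordinate; their dot product over
# a latent dimension p gives the predicted solution value G(f)(y).
torch.manual_seed(0)
m, p = 50, 20  # sensor points, latent dimension (illustrative)

branch = nn.Sequential(nn.Linear(m, 64), nn.Tanh(), nn.Linear(64, p))
trunk = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, p))

f_samples = torch.randn(8, m)  # batch of 8 input functions at m sensors
y = torch.rand(8, 1)           # one evaluation point per function

# G(f)(y) ~ sum_k branch_k(f) * trunk_k(y)
out = (branch(f_samples) * trunk(y)).sum(dim=-1)
print(out.shape)  # one scalar prediction per (function, point) pair
```

Once such a model is trained on many (input function, solution) pairs, a new PDE instance costs only this forward pass, which is the amortization argument made above.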

Summary

  • PINN loss = data fidelity + PDE residual + boundary/initial conditions
  • Autodiff computes exact spatial/temporal derivatives of the network
  • PINNs work best for smooth solutions, inverse problems, and data-sparse regimes
  • PINNs fail on discontinuities, stiff systems, and problems where classical solvers already excel
  • Loss balancing between physics and data terms is a critical engineering challenge
  • Neural operators (FNO, DeepONet) learn solution operators and amortize cost across PDE instances

Exercises

ExerciseCore

Problem

Write the PINN loss for the 1D heat equation $\partial_t u = \alpha \partial_{xx} u$ on $x \in [0, 1]$, $t \in [0, T]$, with initial condition $u(x, 0) = f(x)$ and boundary conditions $u(0, t) = u(1, t) = 0$. Identify all collocation point sets needed.

ExerciseAdvanced

Problem

Why do PINNs struggle with the Burgers equation $\partial_t u + u \partial_x u = \nu \partial_{xx} u$ at small viscosity $\nu$? What happens to the PDE residual near a shock?

References

Foundational:

  • Raissi, Perdikaris, Karniadakis, "Physics-informed neural networks," Journal of Computational Physics (2019)

Theory:

  • Shin, Darbon, Karniadakis, "On the Convergence of Physics Informed Neural Networks for Linear Second-Order Elliptic and Parabolic Type PDEs," Communications in Computational Physics (2020)

Neural Operators:

  • Li et al., "Fourier Neural Operator for parametric PDEs," ICLR (2021)
  • Lu, Jin, Pang, Zhang, Karniadakis, "DeepONet," Nature Machine Intelligence (2021)

Survey:

  • Karniadakis et al., "Physics-informed machine learning," Nature Reviews Physics (2021)

Next Topics

Explore neural operators and scientific ML applications as this rapidly evolving field develops.

Last reviewed: April 2026
