ML Methods
DeepONet
DeepONet (Lu, Karniadakis et al., 2021) approximates nonlinear operators between function spaces by splitting a network into a branch (encoding the input function at fixed sensors) and a trunk (encoding query coordinates), then taking an inner product. The architecture is the practical realization of Chen and Chen's 1995 universal approximation theorem for operators.
Why This Matters
DeepONet (Lu, Jin, Pang, Zhang, and Karniadakis, Nature Machine Intelligence 2021) is the operator-learning architecture grounded in a theorem that predates it by 26 years: Chen and Chen's 1995 universal approximation theorem for nonlinear operators. The theorem says a single-hidden-layer neural network can approximate any continuous operator between function spaces, provided the input function is sampled at finitely many fixed sensor locations. DeepONet is the modern, deep-learning realization of that construction.
The architecture splits computation in two: a branch network encodes the discretized input function $u$ at sensors $x_1, \dots, x_m$ into a coefficient vector, and a trunk network encodes a query coordinate $y$ into a basis vector. Their inner product produces $G_\theta(u)(y)$, the value of the target operator's output function at $y$. This branch-trunk split is what gives DeepONet its theoretical interpretability: the trunk learns a basis for the output function space, the branch learns coefficients in that basis.
In the data-driven PDE landscape, DeepONet competes directly with the Fourier Neural Operator and adjacent variants (graph neural operators, MIONet, PI-DeepONet). FNO often wins on regular-grid benchmarks where its FFT-per-layer cost is decisive; DeepONet wins on geometry-flexible problems and where the branch input is naturally low-dimensional, such as parameterized PDE coefficients or boundary-condition functions sampled sparsely. The Karniadakis group's DeepXDE library codified the architecture and spawned the PI-DeepONet, MIONet, and DeepM&Mnet variants now standard in scientific ML.
The reason to care: DeepONet is the cleanest example of an operator-learning architecture whose approximation guarantees are stated and proved as theorems about operators, not as heuristics about networks. Reading it teaches you what "learning an operator" actually means as a function-space problem.
Mental Model
The branch network produces a vector of coefficients $b(u) = (b_1(u), \dots, b_p(u))$. The trunk network produces a vector of basis evaluations $t(y) = (t_1(y), \dots, t_p(y))$. Their inner product $\sum_{k=1}^{p} b_k(u)\, t_k(y)$ approximates $G(u)(y)$. Read the trunk as a learned basis over the output function's domain; read the branch as a learned encoder that maps the input function to expansion coefficients in that basis.
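The branch-trunk pairing is small enough to sketch end to end. The following minimal NumPy forward pass uses untrained random weights; `mlp`, `deeponet_forward`, and the layer sizes are illustrative choices, not from any library:

```python
# Minimal sketch of the DeepONet forward pass: branch(u at sensors) · trunk(y) + bias.
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, weights):
    """Tiny tanh MLP: x has shape (..., d_in); weights is a list of (W, b) pairs."""
    h = x
    for W, b in weights[:-1]:
        h = np.tanh(h @ W + b)
    W, b = weights[-1]
    return h @ W + b                        # linear last layer

def init_mlp(d_in, d_hidden, d_out, depth=2):
    dims = [d_in] + [d_hidden] * depth + [d_out]
    return [(rng.normal(0, 0.5, (a, b)), np.zeros(b)) for a, b in zip(dims, dims[1:])]

m, p = 100, 40                              # sensors, basis size
branch = init_mlp(m, 64, p)                 # encodes (u(x_1), ..., u(x_m)) -> coefficients in R^p
trunk = init_mlp(1, 64, p)                  # encodes query y -> basis values in R^p
b0 = 0.0                                    # learned scalar bias (here just initialized)

def deeponet_forward(u_sensors, y):
    coeffs = mlp(u_sensors, branch)         # b(u), shape (p,)
    basis = mlp(np.atleast_1d(y), trunk)    # t(y), shape (p,)
    return float(coeffs @ basis + b0)       # inner product + bias

xs = np.linspace(0, 1, m)
u = np.sin(2 * np.pi * xs)                  # an input function sampled at the sensors
out = deeponet_forward(u, 0.3)              # scalar approximation of G(u)(0.3)
```

Training would fit the branch and trunk weights jointly against (input function, query, target value) triples; the forward structure above is all the architecture adds.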
The basis is learned, not fixed. POD truncates onto the top eigenvectors of an empirical covariance; Fourier methods project onto $\{e^{ikx}\}$; spectral element methods project onto Legendre polynomials. DeepONet jointly learns the basis (via the trunk) and the projection map (via the branch) end-to-end, optimizing both for the operator class observed in training data.
Formal Statement
DeepONet (Branch-Trunk Operator Network)
Let $V$ be a compact set of input functions on a compact domain $K_1 \subset \mathbb{R}^d$, and let $G : V \to C(K_2)$ be the target operator producing functions on a compact domain $K_2 \subset \mathbb{R}^{d'}$. Fix $m$ sensor locations $x_1, \dots, x_m \in K_1$. A DeepONet is a parametric operator $G_\theta$ defined by

$$G_\theta(u)(y) \;=\; \sum_{k=1}^{p} b_k\big(u(x_1), \dots, u(x_m)\big)\, t_k(y) \;+\; b_0,$$

where:
- the branch network $b : \mathbb{R}^m \to \mathbb{R}^p$ is a neural network (typically an MLP or CNN) acting on the sensor-value vector $\big(u(x_1), \dots, u(x_m)\big)$
- the trunk network $t : \mathbb{R}^{d'} \to \mathbb{R}^p$ is a neural network acting on the query coordinate $y \in K_2$
- $b_0 \in \mathbb{R}$ is a learned scalar bias
- $p$ is the basis size (the number of branch outputs equals the number of trunk outputs)
The output $G_\theta(u)(y)$ is a scalar. Vector-valued operators are handled by replicating the architecture across output channels.
The bias matters: without it, $G_\theta(u)(y)$ vanishes whenever the branch output $b(u)$ is zero, ruling out affine reconstructions like $G(u)(y) = c + \sum_k b_k(u)\, t_k(y)$ where $c$ is a nonzero constant target offset. Lu et al. (2021) report consistent improvement when $b_0$ is included.
Chen–Chen Universal Approximation
Chen–Chen 1995 Universal Approximation for Nonlinear Operators
Statement
Let $V$ be a compact subset of $C(K_1)$ for compact $K_1 \subset \mathbb{R}^d$, let $K_2 \subset \mathbb{R}^{d'}$ be compact, and let $G : V \to C(K_2)$ be a continuous operator. For any $\varepsilon > 0$, there exist a positive integer $m$, sensor locations $x_1, \dots, x_m \in K_1$, positive integers $n, p$, real coefficients $c_i^k, \xi_{ij}^k, \theta_i^k, \zeta_k$, and weights $w_k \in \mathbb{R}^{d'}$ such that

$$\left| \, G(u)(y) \;-\; \sum_{k=1}^{p} \sum_{i=1}^{n} c_i^k\, \sigma\!\left( \sum_{j=1}^{m} \xi_{ij}^k\, u(x_j) + \theta_i^k \right) \sigma\big( w_k \cdot y + \zeta_k \big) \right| < \varepsilon$$

uniformly in $u \in V$ and $y \in K_2$, where $\sigma$ is any non-polynomial Tauber–Wiener activation.
Intuition
Three ingredients combine. First, $V$ is compact, so finite sensor sampling captures functions to arbitrary accuracy (a Stone-Weierstrass-style density argument). Second, classical universal approximation lets a neural network approximate the now-finite-dimensional map from sensor values to expansion coefficients. Third, a separate neural network approximates the target basis functions evaluated at query points. The bilinear pairing reassembles them.
Proof Sketch
Step 1 (sensor discretization). Since $V$ is compact in $C(K_1)$, the evaluation map $u \mapsto \big(u(x_1), \dots, u(x_m)\big)$ is uniformly continuous on $V$ for sufficiently dense sensors, so $u$ is determined up to $\varepsilon$ by its sensor values.
Step 2 (output approximation). The target output function $G(u)$ lies in a compact set (continuous image of a compact set). Apply classical universal approximation in the $y$-variable: there exist ridge functions $\sigma(w_k \cdot y + \zeta_k)$ and coefficients depending on $u$ such that the linear combination approximates $G(u)(y)$ uniformly.
Step 3 (branch approximation). The coefficients in step 2 are continuous functionals of $u$. By step 1, they are continuous functions of the sensor-value vector $\big(u(x_1), \dots, u(x_m)\big)$. Apply classical universal approximation a second time: a single-hidden-layer network in the sensor variables approximates each coefficient.
Step 4 (combine). The bilinear pairing of the branch (sensor values to coefficients) and trunk (query to basis) recovers $G(u)(y)$ within $\varepsilon$ uniformly.
Why It Matters
This theorem is the theoretical backbone of every branch-trunk operator network. It says the architecture class is dense in the space of continuous operators between function spaces, given enough sensors and basis size. Without this result, DeepONet would be a heuristic; with it, the architecture is the natural instantiation of a 1995 approximation theorem.
Failure Mode
The theorem is non-quantitative: it does not say how $m$, $p$, or network width scale with target accuracy $\varepsilon$, nor with the smoothness of $G$ or the regularity of the input space $V$. Practical bounds came later (Lanthaler-Mishra-Karniadakis 2022; see next section). The continuity assumption on $G$ is also material: discontinuous operators (e.g., shock-forming hyperbolic PDE solution maps at the shock) fall outside the theorem's scope.
Quantitative Error Bounds
The Chen–Chen theorem guarantees existence; it does not give rates. Lanthaler, Mishra, and Karniadakis (Transactions of Mathematics and Its Applications 6, 2022) provide the first explicit bounds. For a Lipschitz operator between Sobolev spaces, the DeepONet approximation error decomposes into three additive pieces:

$$\mathcal{E} \;\le\; \mathcal{E}_{\mathrm{enc}} \;+\; \mathcal{E}_{\mathrm{rec}} \;+\; \mathcal{E}_{\mathrm{app}},$$

where:
- $\mathcal{E}_{\mathrm{enc}}$ is the encoding error from sensor discretization: it depends on the modulus of continuity of $G$ on the input function space and decays as $m \to \infty$ at a rate set by the input-space smoothness.
- $\mathcal{E}_{\mathrm{rec}}$ is the reconstruction error from the finite-rank trunk basis: this is the analog of singular-value truncation error and decays as $p \to \infty$ at a rate set by the singular-value decay of the operator.
- $\mathcal{E}_{\mathrm{app}}$ is the branch approximation error: how well the branch network approximates the (now finite-dimensional) coefficient map. This decays at standard neural-network approximation rates.
The decomposition is informative because each term has a separate cure. Encoding error: add sensors. Reconstruction error: increase basis size $p$. Approximation error: widen or deepen the branch network. The bounds are loose in absolute terms but identify the bottleneck for any given operator.
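For a linear operator like the antiderivative, the reconstruction term can be made concrete: by the optimality of singular-value truncation, no $p$-term basis can beat the $(p+1)$-th singular value of the (discretized) operator in operator norm. A short NumPy sketch, assuming a left-Riemann discretization of $\int_0^y u(t)\,dt$ on a uniform grid:

```python
# Reconstruction-error floor for a linear operator: the (p+1)-th singular
# value of the discretized operator is the best achievable rank-p error.
import numpy as np

n = 200
h = 1.0 / n
# Discretized antiderivative (left Riemann sum): (A u)[i] ≈ ∫_0^{y_i} u(t) dt
A = np.tril(np.ones((n, n)), k=-1) * h

# Singular values, sorted descending; s[p] is the best rank-p error in
# operator norm, i.e. the irreducible reconstruction error for any p-term
# trunk basis on this operator
s = np.linalg.svd(A, compute_uv=False)
rank_p_error = {p: s[p] for p in (1, 5, 20)}   # shrinks as p grows
```

The decay of `s` is exactly the "singular-value decay of the operator" that sets the reconstruction rate; for this smooth kernel it falls off quickly, which is why small basis sizes suffice in the antiderivative benchmark.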
DeepONet vs FNO Trade-offs
The two architectures were designed against the same class of problems and trade differently across three axes.
Computational cost per forward pass. The Fourier Neural Operator costs $O(N \log N)$ per layer for an $N$-point grid via FFT-based global convolution. DeepONet costs $O(p)$ per query point, but full-field evaluation on an $N$-point grid scales as $O(Np)$. For dense full-field outputs at large $N$, FNO is cheaper. For sparse query patterns (e.g., evaluating at scattered sensors, on irregular meshes, or at a few quantities of interest), DeepONet's per-point cost is decisive.
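A back-of-envelope operation count makes the crossover concrete. Constants and the branch/lifting passes are ignored, and the cost models (`fno_layer_cost`, `deeponet_field_cost`) are illustrative assumptions, not measurements:

```python
# Toy operation counts for one FNO layer vs. DeepONet field evaluation.
import math

def fno_layer_cost(N):
    # One FNO layer is FFT-dominated: ~N log2 N operations on an N-point grid
    return N * math.log2(N)

def deeponet_field_cost(N, p):
    # DeepONet output: one p-dimensional inner product per query point,
    # with the single branch pass amortized over all N queries
    return N * p

# Dense full-field output at large N: the log factor beats a fixed basis size
dense = (fno_layer_cost(2**20), deeponet_field_cost(2**20, p=100))
# A handful of query points: DeepONet's per-point cost wins outright
sparse = (fno_layer_cost(2**20), deeponet_field_cost(8, p=100))
```

With $N = 2^{20}$ and $p = 100$, the FFT's $\log_2 N = 20$ factor undercuts the basis size, while at 8 query points DeepONet does orders of magnitude less work; this is the trade-off the paragraph above describes.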
Geometric flexibility. The trunk network accepts arbitrary query coordinates $y$, so DeepONet handles unstructured meshes, irregular domains, and pointwise queries without modification. FNO requires a regular grid because the FFT does. Workarounds (geo-FNO, factorized FNO) exist but lose some of the architectural simplicity. For complex geometries, parameterized domains, or multi-physics coupling at irregular interfaces, DeepONet has the advantage.
Empirical performance on standard benchmarks. Lu et al., A comprehensive and fair comparison of two neural operators (CMAME 2022), runs both architectures across regular-grid PDE benchmarks. FNO wins on translation-invariant problems with periodic-like structure (Burgers, Navier-Stokes, Darcy). DeepONet wins when the branch input is naturally low-dimensional (parametric PDEs with a few coefficients) or when the geometry is irregular. Neither dominates universally.
Resolution invariance. FNO is genuinely resolution-invariant in input and output: train at one grid resolution, evaluate at another. DeepONet is resolution-flexible only in the output (the trunk handles arbitrary $y$). The branch input is locked to the training sensor configuration; you cannot evaluate on test inputs sampled at different sensor locations.
Worked Example: Antiderivative Operator
A canonical sanity check from Lu et al. (2021) §3.1: learn the antiderivative operator

$$G(u)(y) = \int_0^y u(t)\, dt, \qquad y \in [0, 1],$$

so $\frac{d}{dy} G(u)(y) = u(y)$ and $G(u)(0) = 0$. Train inputs are drawn from a Gaussian random field with RBF covariance kernel of length scale 0.2, evaluated at $m = 100$ uniformly spaced sensors on $[0, 1]$. Query points are sampled uniformly on $[0, 1]$.
Architecture: branch is a 3-layer MLP with ReLU activations and 40 hidden units per layer; trunk is a 3-layer MLP of the same width. Basis size $p = 40$. Train with Adam at a fixed small learning rate. Lu et al. report low relative test error on held-out input functions, with the dominant residual concentrated near $y = 0$ (where the boundary condition $G(u)(0) = 0$ creates an integrable cusp).
Two diagnostics worth running on this example. First, plot the learned trunk basis $t_1(y), \dots, t_p(y)$ — for the antiderivative operator, the top trunk modes should resemble polynomial or sigmoidal ramps reflecting the integration kernel. Second, sweep the basis size $p$ from 5 to 80 and watch the error curve: the elbow gives the effective rank of the operator, which for the smooth antiderivative is small.
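A data-generation sketch for this benchmark, assuming the setup above (RBF length scale 0.2, $m = 100$ uniform sensors, left-Riemann antiderivative targets); the jitter constant and function names are illustrative choices:

```python
# Generate (branch input, target) pairs for the antiderivative benchmark:
# sample u from a Gaussian random field, integrate by cumulative sum.
import numpy as np

rng = np.random.default_rng(0)
m = 100                                   # sensors, uniform on [0, 1]
xs = np.linspace(0.0, 1.0, m)
h = xs[1] - xs[0]

# RBF covariance with length scale 0.2; a small jitter keeps Cholesky stable
K = np.exp(-((xs[:, None] - xs[None, :]) ** 2) / (2 * 0.2**2))
L = np.linalg.cholesky(K + 1e-6 * np.eye(m))

def antiderivative(u):
    # Left Riemann sum for G(u)(y_i) = ∫_0^{y_i} u(t) dt on the sensor grid
    return np.concatenate(([0.0], np.cumsum(u[:-1]))) * h

def sample_pair():
    u = L @ rng.standard_normal(m)        # one GRF draw at the sensors
    return u, antiderivative(u)           # branch input, target on same grid

u, Gu = sample_pair()
```

For training, one would pair each `u` with random query points drawn from the target grid; for the basis-size sweep diagnostic, retrain on the same pairs while varying $p$.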
Common Confusions
DeepONet is not a neural operator in the strict Kovachki sense
Kovachki, Li, Liu, Azizzadenesheli, Bhattacharya, Stuart, and Anandkumar (JMLR 2023) define a "neural operator" as a parametric map between function spaces with a kernel-integral structure that is intrinsically resolution-invariant in both input and output. FNO satisfies this definition. DeepONet does not: its branch input is a fixed-length vector tied to the training sensor configuration, so input-side resolution invariance fails. DeepONet is better described as a bilinear operator approximator using a learned finite-rank basis. The distinction matters when reading theoretical papers — error bounds for "neural operators" in the Kovachki sense do not automatically apply to DeepONet, and vice versa.
Sensor count and locations are fixed at training time
The branch network expects an $m$-dimensional input vector $\big(u(x_1), \dots, u(x_m)\big)$. The number $m$ and the locations $x_j$ are baked into the trained weights. You cannot evaluate a trained DeepONet on a test input sampled at 200 locations if it was trained on 100, nor on a test input sampled at locations that differ from the training sensors. Resolution flexibility is output-only, mediated by the trunk. To support multiple sensor configurations you must either retrain or use architectural extensions (e.g., DeepONet variants with adaptive sensor encoders, or set-based input encoders).
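The constraint is mechanical: the first branch weight matrix fixes the input dimension, so a resampled input fails at the first matrix multiply. A minimal NumPy illustration (shapes are hypothetical):

```python
# The fixed-sensor constraint: the first branch layer's weight matrix has
# shape (m, hidden), so inputs sampled at a different sensor count fail.
import numpy as np

W1 = np.zeros((100, 64))           # first branch layer, trained with m = 100
u_trained = np.zeros(100)          # input sampled at the training sensors
u_resampled = np.zeros(200)        # same function resampled at 200 locations

_ = u_trained @ W1                 # fine: (100,) @ (100, 64) -> (64,)
try:
    _ = u_resampled @ W1           # (200,) @ (100, 64): dimension mismatch
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```

Note the same check passes for any input sampled at the training sensors, even if the underlying function differs: the constraint is on the sampling configuration, not the function.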
Stacked vs unstacked DeepONet
Lu et al. (2021) introduced two variants. The stacked version trains $p$ independent branch networks, one per basis coefficient, producing $p$ scalar outputs concatenated into a length-$p$ vector. The unstacked version uses a single shared branch network with $p$ output channels. Unstacked is faster (one forward pass instead of $p$) and is the default in DeepXDE; stacked is occasionally more accurate when basis modes are highly heterogeneous. The original Chen–Chen approximation result is stated for the stacked construction; Lanthaler-Mishra-Karniadakis 2022 covers both. When reading benchmarks, check which variant is being reported.
Exercises
Problem
Consider the antiderivative operator $G(u)(y) = \int_0^y u(t)\, dt$ with $y \in [0, 1]$. Take a single test input $u \equiv 1$ so that $G(u)(y) = y$. Now consider a degenerate DeepONet with basis size $p = 1$, so $G_\theta(u)(y) = b_1(u)\, t_1(y) + b_0$. Sketch the trunk function $t_1$ that minimizes squared error against $G(u)$ on this single test input, and explain why $p = 1$ cannot represent the antiderivative operator across a generic input distribution.
Problem
Show that the bilinear part of DeepONet, $\sum_{k=1}^{p} b_k(u)\, t_k(y)$, is a finite-rank approximation of the operator $G$. When $G$ is a compact linear operator on a Hilbert space, the optimal rank-$p$ approximation in operator norm is given by the top $p$ singular triples (the spectral / SVD theorem). State the optimal trunk and branch in terms of the SVD of $G$, and argue why the trained DeepONet basis need not coincide with the singular vectors.
References
Canonical:
- Chen, T., and Chen, H., "Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems" (IEEE Transactions on Neural Networks 6, 1995), Sections 2-4. The original universal approximation theorem.
- Lu, L., Jin, P., Pang, G., Zhang, Z., and Karniadakis, G. E., "Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators" (Nature Machine Intelligence 3, 2021), arXiv:1910.03193. The DeepONet paper.
Current:
- Lanthaler, S., Mishra, S., and Karniadakis, G. E., "Error estimates for DeepONets: a deep learning framework in infinite dimensions" (Transactions of Mathematics and Its Applications 6, 2022), Sections 3-5. Quantitative approximation bounds.
- Lu, L., Meng, X., Cai, S., Mao, Z., Goswami, S., Zhang, Z., and Karniadakis, G. E., "A comprehensive and fair comparison of two neural operators (with practical extensions) based on FAIR data" (Computer Methods in Applied Mechanics and Engineering 393, 2022). DeepONet vs FNO benchmark.
- Wang, S., Wang, H., and Perdikaris, P., "Learning the solution operator of parametric partial differential equations with physics-informed DeepONets" (Science Advances 7, 2021). PI-DeepONet variant.
- Jin, P., Meng, S., Lu, L., and Karniadakis, G. E., "MIONet: Learning multiple-input operators via tensor product" (SIAM Journal on Scientific Computing 44, 2022). Multi-input extension.
- Kovachki, N., Lanthaler, S., and Mishra, S., "On universal approximation and error bounds for Fourier neural operators" (Journal of Machine Learning Research 22, 2021). FNO comparison reference.
- Karniadakis, G. E., Kevrekidis, I. G., Lu, L., Perdikaris, P., Wang, S., and Yang, L., "Physics-informed machine learning" (Nature Reviews Physics 3, 2021), Sections 4-5. Survey context placing DeepONet in the broader scientific-ML landscape.
Summary
- DeepONet realizes Chen and Chen's 1995 universal approximation theorem for nonlinear operators via a branch network (sensor encoder) and a trunk network (query basis) joined by an inner product.
- The architecture is a bilinear, finite-rank operator approximator with a learned basis; Lanthaler-Mishra-Karniadakis 2022 decomposes its error into encoding, reconstruction, and approximation pieces.
- Against FNO: DeepONet is mesh-flexible and cheap per query but locked to its training sensor configuration; FNO is grid-bound but resolution-invariant and faster on dense full-field outputs.
Next Topics
- Fourier Neural Operator: the spectral-convolution alternative for resolution-invariant operator learning on regular grids
- Physics-informed neural networks: per-instance PDE solving via residual minimization, the architectural sibling to operator learning
- Navier-Stokes for ML: the canonical PDE benchmark where DeepONet, FNO, and PINNs meet
- Spectral theory of operators: the functional-analytic foundation for finite-rank operator approximation and the Eckart-Young optimality of the SVD
- PDE fundamentals for ML: well-posedness, weak solutions, and the function-space setting that operator learning inhabits
Last reviewed: April 18, 2026
Prerequisites
Foundations this topic depends on.
- Spectral Theory of Operators (Layer 0B)
- Eigenvalues and Eigenvectors (Layer 0A)
- Matrix Operations and Properties (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Navier-Stokes for ML (Layer 4)
- PDE Fundamentals for Machine Learning (Layer 1)
- Fast Fourier Transform (Layer 1)
- Exponential Function Properties (Layer 0A)
- Stochastic Differential Equations (Layer 3)
- Brownian Motion (Layer 2)
- Measure-Theoretic Probability (Layer 0B)
- Martingale Theory (Layer 0B)
- Ito's Lemma (Layer 3)
- Stochastic Calculus for ML (Layer 3)
- Functional Analysis Core (Layer 0B)
- Metric Spaces, Convergence, and Completeness (Layer 0A)
- Inner Product Spaces and Orthogonality (Layer 0A)
- Vectors, Matrices, and Linear Maps (Layer 0A)
- Physics-Informed Neural Networks (Layer 4)
- The Jacobian Matrix (Layer 0A)
- Automatic Differentiation (Layer 1)
- Feedforward Networks and Backpropagation (Layer 2)
- Differentiation in Rn (Layer 0A)
- Matrix Calculus (Layer 1)
- The Hessian Matrix (Layer 0A)
- Activation Functions (Layer 1)
- Convex Optimization Basics (Layer 1)
- Gradient Descent Variants (Layer 1)