# Classical ODEs: Existence, Stability, and Numerical Methods
The foundational theorems that justify treating an ODE as having a solution at all: Picard-Lindelof existence and uniqueness, Gronwall's inequality, linear-system theory, Lyapunov stability, and the numerical methods (Euler, Runge-Kutta, adaptive step) used in modern scientific ML.
## Why This Matters
Every modern scientific ML method depends on ODE theory to be well-posed. Neural ODEs replace the layer index of a network with a continuous time variable, but the resulting object is meaningless unless the ODE has a solution and that solution is unique. Physics-informed neural networks approximate solutions to differential equations directly. Diffusion models sample by integrating a probability flow ODE backward in time. In all three cases, the question "does this thing actually exist?" is answered by Picard-Lindelof, and the question "is the integration stable?" is answered by Lyapunov theory and adaptive numerical solvers.
Classical ODE theory predates ML by more than a century. Treat this page as the load-bearing prerequisite for the scientific-ML stack rather than as a topic in its own right.
## Setup and Notation

An ordinary differential equation is a relation between an unknown function and its derivatives. The initial value problem (IVP) on an interval $[t_0, t_0 + T]$ is:

$$\dot{x}(t) = f(t, x(t)), \qquad x(t_0) = x_0,$$

with $x_0 \in \mathbb{R}^d$ and $f: [t_0, t_0 + T] \times \mathbb{R}^d \to \mathbb{R}^d$ a vector field.

A solution is a continuously differentiable function $x: [t_0, t_0 + T] \to \mathbb{R}^d$ that satisfies the equation everywhere on $[t_0, t_0 + T]$. The IVP and the integral equation

$$x(t) = x_0 + \int_{t_0}^{t} f(s, x(s))\, ds$$

are equivalent for continuous $f$, and the integral form is the more useful object for proofs.
## Existence and Uniqueness

### Picard-Lindelof Existence and Uniqueness

**Statement**

Suppose $f$ is continuous on the rectangle $R = \{(t, x) : |t - t_0| \le a,\ \|x - x_0\| \le b\}$ and Lipschitz in $x$ there: $\|f(t, x) - f(t, y)\| \le L \|x - y\|$. Let $M = \sup_R \|f\|$. Then the IVP has a unique solution on $[t_0 - \alpha, t_0 + \alpha]$, where $\alpha = \min(a, b/M)$.
**Intuition**

Lipschitz continuity in $x$ controls how fast trajectories can separate. If two solutions started at the same point could diverge, then locally near that point $f$ would have to behave worse than linearly, contradicting the Lipschitz bound. The condition $\alpha \le b/M$ guarantees the trajectory cannot leave the rectangle before time $\alpha$ elapses.
**Proof Sketch**

Define the Picard iteration operator on the Banach space $C([t_0 - \alpha, t_0 + \alpha], \mathbb{R}^d)$ with the sup norm:

$$(\Phi \phi)(t) = x_0 + \int_{t_0}^{t} f(s, \phi(s))\, ds.$$

Fixed points of $\Phi$ are solutions of the IVP. Verify two facts:

- $\Phi$ maps the closed ball $\{\phi : \|\phi - x_0\|_\infty \le b\}$ into itself, because $\|\Phi\phi - x_0\|_\infty \le M\alpha \le b$.
- $\Phi$ is a contraction on that ball for $\alpha < 1/L$: $\|\Phi\phi - \Phi\psi\|_\infty \le L\alpha\, \|\phi - \psi\|_\infty$.

By Banach's fixed-point theorem, $\Phi$ has a unique fixed point on the smaller interval. Standard continuation extends the solution to $\alpha = \min(a, b/M)$.
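The contraction is easy to watch numerically. The sketch below (a toy choice, not from the text above) runs Picard iteration for $\dot{x} = x$, $x(0) = 1$ on a grid, approximating each integral with the trapezoid rule; the $k$-th iterate reproduces the degree-$k$ Taylor polynomial of the true solution $e^t$:

```python
import math

def picard_step(phi_vals, f, x0, h):
    """One Picard iterate on a uniform grid: x0 + trapezoid integral of f(phi)."""
    fv = [f(v) for v in phi_vals]
    out = [x0]
    for i in range(1, len(fv)):
        out.append(out[-1] + 0.5 * h * (fv[i - 1] + fv[i]))
    return out

h, n = 0.001, 1000                    # grid on [0, 1]
f = lambda x: x                       # dx/dt = x, x(0) = 1; true solution e^t
phi = [1.0] * (n + 1)                 # phi_0(t) == x0, the constant initial guess
for _ in range(12):
    phi = picard_step(phi, f, 1.0, h)

err = abs(phi[-1] - math.e)           # remaining error is quadrature, not iteration
print(err)
```

After a dozen iterations the fixed-point error is far below the trapezoid quadrature error, which is what dominates the final gap to $e$.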
**Why It Matters**

Picard-Lindelof is the gateway theorem. Without it, "the solution to this ODE" is an undefined phrase. Every scientific ML method that integrates a learned vector field implicitly assumes $f_\theta$ is Lipschitz; otherwise the trajectory it computes corresponds to nothing well-defined. Most neural networks with smooth activations (tanh, softplus, GELU) satisfy this; ReLU is Lipschitz but not differentiable at the origin, which is why ReLU-based neural ODEs work in practice but require care.
**Failure Mode**

Lipschitz continuity is sufficient but not necessary. The classical counterexample is $\dot{x} = \sqrt{|x|},\ x(0) = 0$, where $f$ is continuous but only Holder-1/2 continuous in $x$ near the origin. Two distinct solutions exist: $x(t) \equiv 0$ and $x(t) = t^2/4$. Uniqueness fails precisely because the Lipschitz constant blows up at $x = 0$. Peano's theorem still gives existence under continuity alone, but uniqueness requires Lipschitz or similar regularity (Osgood's condition is the sharp generalization).
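A short check confirms that both formulas really do solve the same IVP; a minimal sketch (the evaluation points are arbitrary):

```python
import math

f = lambda x: math.sqrt(abs(x))   # continuous, but not Lipschitz at x = 0

sol_a = lambda t: 0.0             # one solution of x' = sqrt(|x|), x(0) = 0
sol_b = lambda t: t * t / 4.0     # a second, genuinely different solution

# Both satisfy the ODE: compare a centered difference quotient of each
# candidate against f evaluated on it, at a few points t > 0.
eps = 1e-6
for t in (0.5, 1.0, 2.0):
    for sol in (sol_a, sol_b):
        deriv = (sol(t + eps) - sol(t - eps)) / (2 * eps)
        assert abs(deriv - f(sol(t))) < 1e-6

print("two distinct solutions through x(0) = 0")
```

A generic numerical solver started at $x(0) = 0$ silently returns the zero solution, with no warning that uniqueness has failed.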
## Sensitivity to Initial Conditions: Gronwall
The next question is quantitative: if you perturb the initial condition or the vector field, how far apart do the resulting trajectories drift? The answer is exponential in the Lipschitz constant.
### Gronwall's Inequality (Differential Form)

**Statement**

If $u$ is nonnegative and differentiable, and $\dot{u}(t) \le \beta(t)\, u(t)$ for all $t \ge t_0$, then:

$$u(t) \le u(t_0) \exp\!\left( \int_{t_0}^{t} \beta(s)\, ds \right).$$

When $\beta$ is constant, this simplifies to $u(t) \le u(t_0)\, e^{\beta (t - t_0)}$.
**Intuition**

Gronwall converts a differential inequality into a closed-form pointwise bound. Apply it to the norm of the difference of two solutions of $\dot{x} = f(t, x)$ with Lipschitz constant $L$:

$$\|x(t) - y(t)\| \le \|x(t_0) - y(t_0)\|\, e^{L (t - t_0)}.$$

Two trajectories starting close stay close, but only with an exponential-in-time bound. Over a long horizon with large $L$, "close" becomes meaningless quickly.
**Why It Matters**

This bound governs every numerical-stability and adversarial-perturbation argument involving ODE flows. For neural ODEs, $L$ is roughly the operator norm of the Jacobian $\partial f_\theta / \partial x$. If this is large, gradient computations using the adjoint method accumulate error exponentially in the integration time. Lipschitz regularization of $f_\theta$ keeps $L$ small and keeps the system numerically tractable.
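The bound can be sanity-checked numerically. A sketch using a hand-rolled RK4 step and the vector field $f(x) = \sin x$ (an arbitrary choice whose Lipschitz constant is known: $|\cos x| \le 1$, so $L = 1$):

```python
import math

def rk4(f, x, t, h):
    """One classical RK4 step for x' = f(t, x)."""
    k1 = f(t, x)
    k2 = f(t + h / 2, x + h * k1 / 2)
    k3 = f(t + h / 2, x + h * k2 / 2)
    k4 = f(t + h, x + h * k3)
    return x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

f = lambda t, x: math.sin(x)      # |df/dx| = |cos x| <= 1, so L = 1
L, T, h = 1.0, 5.0, 0.01
x, y = 1.0, 1.0 + 1e-6            # two nearby initial conditions
t = 0.0
while t < T - 1e-12:
    x, y = rk4(f, x, t, h), rk4(f, y, t, h)
    t += h

gap = abs(x - y)
bound = 1e-6 * math.exp(L * T)    # Gronwall: amplification at most e^{LT}
print(gap, "<=", bound)
assert gap <= bound
```

For this contracting flow the actual gap is far below the Gronwall bound; the bound is tight only for genuinely expanding dynamics.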
## Linear Systems
The linear case is the only family of ODEs with a complete closed-form theory, and it provides the local picture for nonlinear systems via linearization.
### Linear Time-Invariant ODE

A linear time-invariant (LTI) system is $\dot{x} = A x$ with $A \in \mathbb{R}^{d \times d}$ a constant matrix and $x(0) = x_0$. The unique solution is:

$$x(t) = e^{tA} x_0,$$

where $e^{tA} = \sum_{k=0}^{\infty} \frac{(tA)^k}{k!}$ is the matrix exponential.

The matrix exponential is the central object. Its behavior is determined by the eigenstructure of $A$: if $A = V \Lambda V^{-1}$ with $\Lambda$ diagonal, then $e^{tA} = V e^{t\Lambda} V^{-1}$. Eigenvalues with positive real part produce exponential growth; negative real part produces decay; pure-imaginary eigenvalues produce oscillation. For non-diagonalizable $A$, use the Jordan form, which adds polynomial-in-$t$ factors.

For inhomogeneous systems $\dot{x} = A x + g(t)$, the variation of parameters formula gives:

$$x(t) = e^{tA} x_0 + \int_0^t e^{(t - s)A}\, g(s)\, ds.$$

This is the same template that appears in the Ornstein-Uhlenbeck SDE solution (see stochastic calculus): replace $g(s)\, ds$ with $\sigma\, dW_s$ and you have the SDE analog.
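The closed form can be checked directly with `scipy.linalg.expm`. A sketch using a skew-symmetric generator (a hypothetical example chosen because its exponential has a known closed form, a plane rotation):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-1.0, 0.0]])   # skew-symmetric, eigenvalues +/- i
x0 = np.array([1.0, 0.0])
t = 0.7

# For this A the matrix exponential is a plane rotation by angle t.
assert np.allclose(expm(t * A),
                   [[np.cos(t), np.sin(t)], [-np.sin(t), np.cos(t)]])

# x(t) = e^{tA} x0 solves x' = Ax: check with a centered difference in t.
x = lambda t: expm(t * A) @ x0
eps = 1e-6
dxdt = (x(t + eps) - x(t - eps)) / (2 * eps)
assert np.allclose(dxdt, A @ x(t), atol=1e-6)
print("e^{tA} x0 satisfies the ODE")
```

Pure-imaginary eigenvalues show up exactly as the theory predicts: the flow oscillates (rotates) without growth or decay.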
## Lyapunov Stability
For nonlinear systems where closed-form solutions are unavailable, stability is decided by an energy-like function rather than by the solution itself.
### Stability of an Equilibrium

A point $x^*$ with $f(x^*) = 0$ is an equilibrium. It is:

- **Lyapunov stable** if for every $\epsilon > 0$ there exists $\delta > 0$ such that $\|x(0) - x^*\| < \delta$ implies $\|x(t) - x^*\| < \epsilon$ for all $t \ge 0$.
- **Asymptotically stable** if it is Lyapunov stable and there exists $\delta > 0$ such that $\|x(0) - x^*\| < \delta$ implies $x(t) \to x^*$ as $t \to \infty$.
### Lyapunov's Direct Method

**Statement**

Let $V: U \to \mathbb{R}$ be continuously differentiable on a neighborhood $U$ of $x^*$, with $V(x^*) = 0$ and $V(x) > 0$ for $x \neq x^*$. Let $\dot{V}(x) = \nabla V(x)^\top f(x)$ be the time derivative of $V$ along trajectories.

- If $\dot{V} \le 0$ on $U$, then $x^*$ is Lyapunov stable.
- If $\dot{V} < 0$ on $U \setminus \{x^*\}$, then $x^*$ is asymptotically stable.
**Intuition**

Treat $V$ as a generalized energy. If trajectories cannot increase $V$, they cannot escape the sublevel sets $\{x : V(x) \le c\}$, so they remain near the equilibrium. If $V$ strictly decreases, every trajectory eventually reaches the bottom, which is the equilibrium.
**Proof Sketch**

For Lyapunov stability: given $\epsilon > 0$, choose $c$ small enough that the sublevel set $\{V \le c\}$ near $x^*$ lies inside the ball $\|x - x^*\| < \epsilon$. Pick $\delta$ such that $V(x) < c$ for $\|x - x^*\| < \delta$. Trajectories starting in the $\delta$-ball keep $V < c$ and therefore stay in the $\epsilon$-ball.

For asymptotic stability: $V(x(t))$ is non-increasing and bounded below, so it converges to some $c \ge 0$. If $c > 0$, then $x(t)$ stays in a compact set where $\dot{V} \le -\mu < 0$ uniformly, contradicting convergence of $V(x(t))$. Hence $c = 0$ and $x(t) \to x^*$.
**Why It Matters**

Lyapunov is how stability is proven without solving the ODE. In control theory and reinforcement learning, the same machinery underlies stability of feedback policies. In ML training, treating the loss $\mathcal{L}(\theta)$ as a Lyapunov function and analyzing its decrease under gradient flow $\dot{\theta} = -\nabla \mathcal{L}(\theta)$ is the standard convergence-proof template (see stability and optimization dynamics).
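A minimal sketch of that template, assuming a toy scalar loss (the quartic below is a hypothetical example): gradient descent with a small step is forward Euler on the gradient flow, and since $\frac{d}{dt}\mathcal{L} = -\|\nabla\mathcal{L}\|^2 \le 0$ along the flow, the loss sequence should be monotone:

```python
def loss(th):                       # hypothetical toy nonconvex loss
    return (th ** 2 - 1.0) ** 2

def grad(th):
    return 4.0 * th * (th ** 2 - 1.0)

th, h = 0.3, 1e-3
values = [loss(th)]
for _ in range(5000):
    th -= h * grad(th)              # forward Euler on theta' = -grad loss(theta)
    values.append(loss(th))

# The loss acts as a Lyapunov function: non-increasing along the trajectory.
assert all(b <= a + 1e-12 for a, b in zip(values, values[1:]))
print(th)                           # approaches the nearby minimizer theta = 1
```

Monotonicity holds here because the step size is well below the local smoothness threshold; with a large step, the discretization itself can destroy the Lyapunov property even though the continuous flow has it.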
**Failure Mode**

Constructing $V$ is an art. There is no general algorithm for finding a Lyapunov function for a generic nonlinear system, and failure to find one is uninformative: it might just mean you picked the wrong $V$. For LTI systems $\dot{x} = Ax$, the quadratic candidate $V(x) = x^\top P x$ with $P$ solving the Lyapunov equation $A^\top P + P A = -Q$ (for any $Q \succ 0$) works iff all eigenvalues of $A$ have negative real part.
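For the LTI case the construction is fully algorithmic. A sketch with `scipy.linalg.solve_continuous_lyapunov` (the matrix below is a hypothetical Hurwitz example); note that scipy solves $aX + Xa^H = q$, so passing $A^\top$ and $-Q$ yields $A^\top P + P A = -Q$:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[-1.0, 2.0], [0.0, -3.0]])   # Hurwitz: eigenvalues -1 and -3
Q = np.eye(2)

# Solve A^T P + P A = -Q.
P = solve_continuous_lyapunov(A.T, -Q)

# P is symmetric positive definite exactly when A is Hurwitz.
assert np.allclose(P, P.T)
assert np.all(np.linalg.eigvalsh(P) > 0)
# Residual check on the Lyapunov equation itself:
assert np.allclose(A.T @ P + P @ A, -Q)
print(P)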
## Numerical Methods

Closed-form solutions are unavailable for almost all nonlinear ODEs, so simulation requires discretization. A numerical method computes an approximation $\tilde{x}_n \approx x(t_n)$ on a grid $t_0 < t_1 < \dots < t_N$.
### Euler's Method (1st Order)

The simplest scheme is the forward Euler update:

$$\tilde{x}_{n+1} = \tilde{x}_n + h\, f(t_n, \tilde{x}_n),$$

with step size $h = t_{n+1} - t_n$. This is the first-order Taylor expansion of $x$ around $t_n$. The local truncation error per step is $O(h^2)$ and the global error after integrating to a fixed time is $O(h)$. Forward Euler is what a ResNet update computes; it is also the default when reasoning about gradient flow in ML.
Forward Euler is conditionally stable: for the linear test problem $\dot{x} = \lambda x$ with $\mathrm{Re}(\lambda) < 0$, it requires $|1 + h\lambda| \le 1$ (for real $\lambda$, $h \le 2/|\lambda|$) to avoid blow-up. Backward Euler ($\tilde{x}_{n+1} = \tilde{x}_n + h\, f(t_{n+1}, \tilde{x}_{n+1})$, an implicit scheme) is unconditionally stable for stiff linear systems but requires solving a nonlinear equation each step.
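The stability restriction is easy to see numerically; a sketch on the scalar test problem (the values of $\lambda$, $h$, and $T$ are arbitrary). For linear dynamics the implicit backward Euler step can be solved in closed form:

```python
import math

lam, T = -50.0, 2.0               # stiff-ish linear test problem x' = lam * x

def forward_euler(h):
    x, n = 1.0, int(round(T / h))
    for _ in range(n):
        x = x + h * lam * x       # amplification factor (1 + h * lam) per step
    return x

def backward_euler(h):
    x, n = 1.0, int(round(T / h))
    for _ in range(n):
        x = x / (1 - h * lam)     # implicit step, solvable exactly for linear f
    return x

exact = math.exp(lam * T)         # ~ 4e-44, essentially zero

# h = 0.05 violates h <= 2/|lam| = 0.04: forward Euler blows up.
assert abs(forward_euler(0.05)) > 1e6
# Backward Euler is stable at the same step size.
assert abs(backward_euler(0.05) - exact) < 1e-6
print(forward_euler(0.05), backward_euler(0.05))
```

With $h = 0.05$ the forward amplification factor is $1 + h\lambda = -1.5$, so the iterates oscillate and grow geometrically, while the backward factor $1/(1 - h\lambda) = 1/3.5$ decays as the true solution does.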
### Runge-Kutta (4th Order)

The classical RK4 method uses four function evaluations per step:

$$\begin{aligned}
k_1 &= f(t_n, \tilde{x}_n), \\
k_2 &= f(t_n + h/2, \tilde{x}_n + h k_1/2), \\
k_3 &= f(t_n + h/2, \tilde{x}_n + h k_2/2), \\
k_4 &= f(t_n + h, \tilde{x}_n + h k_3), \\
\tilde{x}_{n+1} &= \tilde{x}_n + \frac{h}{6}(k_1 + 2k_2 + 2k_3 + k_4).
\end{aligned}$$

Local truncation error is $O(h^5)$, global error is $O(h^4)$. Four evaluations per step is the price for the higher order; for smooth right-hand sides, the accuracy per unit of compute beats Euler badly.

### Adaptive Step Size: Dormand-Prince

Fixed step sizes either waste compute on smooth regions or under-resolve sharp regions. **Adaptive solvers** compute two estimates of $\tilde{x}_{n+1}$ at different orders, use the difference as a local-error estimate, and choose the next step size to keep error within tolerance.

The standard implementation in `scipy.integrate.solve_ivp` and `torchdiffeq` is **Dormand-Prince RK45** (Dormand and Prince 1980): a 7-stage scheme that produces both a 5th-order and an embedded 4th-order estimate. Step rejection happens when the difference exceeds tolerance; otherwise the step is accepted and the next $h$ is chosen as $h_{\text{new}} = h \cdot (\mathrm{tol} / \mathrm{err})^{1/5}$ (with safety factors). This is what neural ODE solvers use by default.

For **stiff** problems (eigenvalue ratio $> 10^3$), explicit methods like RK45 force tiny step sizes. Implicit solvers (BDF methods, Rosenbrock methods) are necessary; they pay per-step cost for solving a nonlinear system in exchange for stability.

## Common Confusions

<Confusion title="Lipschitz is not the same as differentiable">
A function can be Lipschitz without being differentiable (ReLU, $|x|$). It can be differentiable without being Lipschitz ($x^2$ on $\mathbb{R}$ is differentiable, but its Lipschitz constant on the whole line is unbounded). Picard-Lindelof requires Lipschitz, not differentiability.
In practice, $C^1$ functions on a compact set are automatically Lipschitz on that set with constant $\sup \|\nabla f\|$, which is why "$f$ is smooth on a bounded region" is the usual sufficient condition cited in ML papers.
</Confusion>

<Confusion title="Existence and uniqueness are local guarantees">
Picard-Lindelof gives a solution on $[t_0, t_0 + \alpha]$ for some possibly small $\alpha$. The solution may **blow up in finite time** even with smooth $f$. Example: $\dot{x} = x^2, x(0) = 1$ has solution $x(t) = 1/(1 - t)$, which diverges at $t = 1$. Global existence requires extra assumptions: linear growth of $f$, dissipativity, or a Lyapunov function bounding solutions away from infinity.
</Confusion>

<Confusion title="Numerical convergence is not the same as physical correctness">
A numerical solver converges to the true ODE solution as $h \to 0$. It does not check whether the ODE itself models reality correctly. For chaotic systems (Lorenz, double pendulum), individual trajectories are exponentially sensitive to perturbations, so any finite-precision solver eventually diverges from the "true" trajectory regardless of order. Statistical properties of the trajectory (invariant measure, Lyapunov exponents) may still be computable, but the trajectory itself becomes meaningless past a horizon set by the largest Lyapunov exponent.
</Confusion>

## Exercises

<Exercise difficulty="core">
<Problem>
Consider the IVP $\dot{x} = -x^3, x(0) = 1$ on $t \in [0, 10]$.

1. Verify that $f(x) = -x^3$ is Lipschitz on the bounded interval $[-2, 2]$, and compute the Lipschitz constant.
2. Write down the Picard iteration starting from $\phi_0(t) \equiv 1$ and compute $\phi_1(t)$ explicitly.
3. Show that $V(x) = x^2/2$ is a Lyapunov function for the equilibrium $x^* = 0$.
</Problem>
<Hint>
For (1): $|f'(x)| = 3x^2$, take the sup on the interval. For (2): $\phi_1(t) = 1 + \int_0^t f(\phi_0(s))\, ds = 1 + \int_0^t (-1)\, ds$. For (3): compute $\dot{V}$ along trajectories using the chain rule.
</Hint>
<Solution>
1.
$f'(x) = -3x^2$. On $[-2, 2]$, $\sup |f'(x)| = 12$, so $L = 12$.
2. $\phi_1(t) = 1 + \int_0^t -1\, ds = 1 - t$. (The next iterate would integrate $-(1-s)^3$, giving a more accurate approximation.)
3. $\dot{V}(x) = V'(x)\, \dot{x} = x \cdot (-x^3) = -x^4$. This is $\leq 0$ on all of $\mathbb{R}$ and strictly negative for $x \neq 0$. Hence $x^* = 0$ is asymptotically stable. (Note that the convergence rate is *not* exponential because $\dot{V} = -x^4 = -(2V)^2$, giving polynomial decay.)
</Solution>
</Exercise>

<Exercise difficulty="advanced">
<Problem>
Consider the linear system $\dot{x} = Ax$ with $A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$.

1. Compute the eigenvalues of $A$ and conclude what kind of stability the origin has.
2. Find a quadratic Lyapunov function $V(x) = x^\top P x$ certifying Lyapunov stability (but not asymptotic stability) of the origin.
3. Apply forward Euler with step size $h$ to this system. Show that $\|\tilde{x}_n\|^2 = (1 + h^2)^n \|\tilde{x}_0\|^2$, so Euler grows the energy and is unstable for any $h > 0$.
</Problem>
<Hint>
(1) Eigenvalues are $\pm i$. (2) Try $V(x) = \frac{1}{2}(x_1^2 + x_2^2)$, the rotational energy. (3) Compute $\|\tilde{x}_{n+1}\|^2$ directly from the Euler update.
</Hint>
<Solution>
1. The characteristic polynomial is $\lambda^2 + 1 = 0$, so $\lambda = \pm i$. Both eigenvalues are purely imaginary, so the origin is **Lyapunov stable but not asymptotically stable** (trajectories are circles around the origin).
2. With $P = I/2$: $\dot{V} = x^\top P A x + x^\top A^\top P x = \frac{1}{2} x^\top (A + A^\top) x = 0$ since $A$ is skew-symmetric. So $V$ is conserved along trajectories, certifying Lyapunov stability.
3. The Euler step is $\tilde{x}_{n+1} = (I + hA) \tilde{x}_n$. Compute $(I + hA)^\top (I + hA) = I + h(A + A^\top) + h^2 A^\top A = I + h^2 I = (1 + h^2) I$ (using $A + A^\top = 0$ and $A^\top A = I$). Hence $\|\tilde{x}_{n+1}\|^2 = (1 + h^2) \|\tilde{x}_n\|^2$, and iterating gives the claim.
The continuous system conserves energy; Euler injects energy at rate $\log(1 + h^2) \approx h^2$ per step. **Symplectic integrators** (leapfrog, Stormer-Verlet) are the fix when energy conservation matters.
</Solution>
</Exercise>

## References

**Foundational:**

- Hirsch, Smale, Devaney, *Differential Equations, Dynamical Systems, and an Introduction to Chaos* (3rd ed., 2013), Chapters 1-7. The standard graduate-level introduction.
- Coddington and Levinson, *Theory of Ordinary Differential Equations* (1955), Chapters 1-2. Classical reference for Picard-Lindelof and continuation theorems.
- Hartman, *Ordinary Differential Equations* (2nd ed., 1982), Chapter II. Definitive treatment of existence and uniqueness, including Osgood's condition.

**Stability:**

- Khalil, *Nonlinear Systems* (3rd ed., 2002), Chapters 4 and 14. The reference for Lyapunov analysis and applications in control.
- Vidyasagar, *Nonlinear Systems Analysis* (2nd ed., 1993), Chapter 5. Complementary treatment with more invariance-principle material.

**Numerical methods:**

- Hairer, Norsett, Wanner, *Solving Ordinary Differential Equations I: Nonstiff Problems* (2nd ed., 1993). Standard reference for Runge-Kutta methods and adaptive step-size control.
- Hairer and Wanner, *Solving Ordinary Differential Equations II: Stiff and Differential-Algebraic Problems* (2nd ed., 1996). Stiff solvers, BDF methods.
- Suli and Mayers, *An Introduction to Numerical Analysis* (2003), Chapter 12. Compact undergraduate-level treatment.
- Dormand and Prince, "A family of embedded Runge-Kutta formulae," *Journal of Computational and Applied Mathematics* 6(1):19-26 (1980). The original RK45 paper.
## Next Topics

- **[Physics-informed neural networks](/topics/physics-informed-neural-networks)**: parameterize the solution $u_\theta$ and enforce the PDE through the loss
- **[Neural ODEs](/topics/neural-odes)**: parameterize the dynamics $f_\theta$ and let an ODE solver compute the trajectory
- **[Stochastic calculus for ML](/topics/stochastic-calculus-for-ml)**: extend ODE theory to processes driven by Brownian motion (the SDE generalization)

Last reviewed: April 17, 2026
## Prerequisites

Foundations this topic depends on.

- Continuity in $\mathbb{R}^n$
- Metric Spaces, Convergence, and Completeness
- The Jacobian Matrix