Scientific ML

Classical ODEs: Existence, Stability, and Numerical Methods

The foundational theorems that justify treating an ODE as having a solution at all: Picard-Lindelof existence and uniqueness, Gronwall's inequality, linear-system theory, Lyapunov stability, and the numerical methods (Euler, Runge-Kutta, adaptive step) used in modern scientific ML.

Core · Tier 1 · Stable · ~40 min

Why This Matters

Every modern scientific ML method depends on ODE theory to be well-posed. Neural ODEs replace the layer index of a network with a continuous time variable, but the resulting object is meaningless unless the ODE has a solution and that solution is unique. Physics-informed neural networks approximate solutions to differential equations directly. Diffusion models sample by integrating a probability flow ODE backward in time. In all three cases, the question "does this thing actually exist?" is answered by Picard-Lindelof, and the question "is the integration stable?" is answered by Lyapunov theory and adaptive numerical solvers.

Classical ODE theory predates ML by more than a century. Treat this page as the load-bearing prerequisite for the scientific-ML stack rather than as a topic in its own right.

Setup and Notation

An ordinary differential equation is a relation between an unknown function and its derivatives. The initial value problem (IVP) on an interval $I = [t_0, t_0 + T]$ is:

$$\frac{dx}{dt} = f(t, x(t)), \quad x(t_0) = x_0$$

with $x: I \to \mathbb{R}^d$ and $f: I \times \mathbb{R}^d \to \mathbb{R}^d$ a vector field.

A solution is a continuously differentiable function $x(t)$ that satisfies the equation everywhere on $I$. The IVP and the integral equation $x(t) = x_0 + \int_{t_0}^{t} f(s, x(s))\, ds$ are equivalent for continuous $f$, and the integral form is the more useful object for proofs.

Existence and Uniqueness

Theorem

Picard-Lindelof Existence and Uniqueness

Statement

Suppose $f$ is continuous on the rectangle $R = [t_0, t_0 + a] \times \overline{B}(x_0, b)$, bounded there by $M$, and Lipschitz in $x$ with constant $L$ (uniformly in $t$). Then the IVP $\dot{x} = f(t, x)$, $x(t_0) = x_0$ has a unique solution $x: [t_0, t_0 + \alpha] \to \mathbb{R}^d$, where $\alpha = \min(a, b/M)$.

Intuition

Lipschitz continuity in $x$ controls how fast trajectories can separate. If two solutions started at the same point could diverge, then locally near that point $f$ would have to behave worse than linearly, contradicting the Lipschitz bound. The condition $\alpha = \min(a, b/M)$ guarantees the trajectory cannot leave the rectangle before time $\alpha$ elapses.

Proof Sketch

Define the Picard iteration on the Banach space $C([t_0, t_0+\alpha], \mathbb{R}^d)$ with the sup norm:

$$T[\phi](t) = x_0 + \int_{t_0}^{t} f(s, \phi(s))\, ds.$$

Fixed points of $T$ are solutions of the IVP. Verify two facts:

  1. $T$ maps the closed ball $\overline{B}(x_0, b)$ into itself, because $\|T[\phi](t) - x_0\| \leq M \alpha \leq b$.
  2. $T$ is a contraction on $[t_0, t_0 + \alpha']$ for $\alpha' < 1/L$: $\|T[\phi] - T[\psi]\|_\infty \leq L \alpha' \|\phi - \psi\|_\infty$.

By Banach's fixed-point theorem, $T$ has a unique fixed point on the smaller interval. Standard continuation extends to $[t_0, t_0 + \alpha]$.
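The contraction can also be watched numerically. The sketch below (assuming NumPy; the test problem $\dot{x} = -x$, $x(0) = 1$ and the trapezoid quadrature are illustrative choices, not part of the theorem) applies $T$ repeatedly on a grid and converges to the exact solution $e^{-t}$:

```python
import numpy as np

def picard_iterate(f, x0, t, n_iters):
    """Apply the Picard map T[phi](t) = x0 + int_{t0}^t f(s, phi(s)) ds
    repeatedly, approximating the integral with the trapezoid rule."""
    phi = np.full_like(t, x0, dtype=float)  # phi_0 = constant function x0
    for _ in range(n_iters):
        integrand = f(t, phi)
        # cumulative trapezoid integral from t[0] to each grid point
        phi = x0 + np.concatenate(([0.0], np.cumsum(
            0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))))
    return phi

t = np.linspace(0.0, 1.0, 1001)
phi = picard_iterate(lambda s, x: -x, x0=1.0, t=t, n_iters=30)
err = np.max(np.abs(phi - np.exp(-t)))  # compare against exact solution e^{-t}
```

Each iteration reproduces one more term of the Taylor series of $e^{-t}$, so after a few dozen iterations the remaining error is dominated by the quadrature, not the fixed-point truncation.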

Why It Matters

Picard-Lindelof is the gateway theorem. Without it, "the solution to this ODE" is an undefined phrase. Every scientific ML method that integrates a learned vector field $f_\theta$ implicitly assumes $f_\theta$ is Lipschitz; otherwise the trajectory it computes corresponds to nothing well-defined. Most neural networks with smooth activations (tanh, softplus, GELU) satisfy this; ReLU is Lipschitz but not differentiable at the origin, which is why ReLU-based neural ODEs work in practice but require care.

Failure Mode

Lipschitz continuity is sufficient but not necessary. The classical counterexample is $\dot{x} = \sqrt{|x|}$, $x(0) = 0$, where $f$ is continuous but only Hölder-1/2 continuous in $x$ near the origin. Two distinct solutions exist: $x \equiv 0$ and $x(t) = t^2/4$. Uniqueness fails precisely because the Lipschitz constant blows up at $x = 0$. Peano's theorem still gives existence under continuity alone, but uniqueness requires Lipschitz or similar regularity (Osgood's condition is the sharp generalization).
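The non-uniqueness is visible numerically. In the sketch below (a forward-Euler experiment; the grid and the $10^{-12}$ perturbation are arbitrary illustrative choices), a solver started exactly on the equilibrium stays on the $x \equiv 0$ branch, while an arbitrarily small positive nudge sends it onto the $x(t) = t^2/4$ branch:

```python
import numpy as np

def euler(f, x0, t):
    """Forward Euler on a given time grid."""
    x = np.empty_like(t)
    x[0] = x0
    for n in range(len(t) - 1):
        x[n + 1] = x[n] + (t[n + 1] - t[n]) * f(x[n])
    return x

f = lambda x: np.sqrt(abs(x))     # continuous, but not Lipschitz at 0
t = np.linspace(0.0, 2.0, 20001)

x_zero = euler(f, 0.0, t)     # started exactly on the equilibrium: stays 0
x_pert = euler(f, 1e-12, t)   # tiny nudge: trajectory approaches t^2/4
```

Well-posed problems do not behave this way: under a Lipschitz condition, Gronwall (next section) forces a $10^{-12}$ perturbation to stay exponentially small, not to produce an $O(1)$ change.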

Sensitivity to Initial Conditions: Gronwall

The next question is quantitative: if you perturb the initial condition or the vector field, how far apart do the resulting trajectories drift? The answer is exponential in the Lipschitz constant.

Lemma

Gronwall's Inequality (Integral Form)

Statement

If $u(t) \leq \alpha(t) + \int_{t_0}^{t} \beta(s) u(s)\, ds$ for all $t \in [t_0, T]$, with $\beta \geq 0$, then:

$$u(t) \leq \alpha(t) + \int_{t_0}^{t} \alpha(s) \beta(s) \exp\!\left(\int_s^t \beta(r)\, dr\right) ds.$$

When $\alpha(t) \equiv \alpha_0$ is constant, this simplifies to $u(t) \leq \alpha_0 \exp\!\left(\int_{t_0}^{t} \beta(s)\, ds\right)$.

Intuition

Gronwall converts an integral inequality into a closed-form pointwise bound. Apply it to the difference of two solutions of $\dot{x} = f(t,x)$ with Lipschitz constant $L$: $\|x_1(t) - x_2(t)\| \leq \|x_1(t_0) - x_2(t_0)\|\, e^{L(t - t_0)}$. Two trajectories starting close stay close, but only with an exponential-in-time bound. Over a long horizon with large $L$, "close" becomes meaningless quickly.
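The bound can be checked directly on discretized trajectories. A minimal sketch (assuming NumPy; the vector field $f(x) = \sin x$, which has $L = 1$, and the $10^{-3}$ perturbation are illustrative choices) compares the actual gap between two forward-Euler trajectories against the Gronwall envelope:

```python
import numpy as np

def euler(f, x0, t):
    """Forward Euler on a given time grid."""
    x = np.empty_like(t)
    x[0] = x0
    for n in range(len(t) - 1):
        x[n + 1] = x[n] + (t[n + 1] - t[n]) * f(t[n], x[n])
    return x

L = 1.0
f = lambda s, x: np.sin(x)       # |f'| <= 1, so Lipschitz with constant L = 1
t = np.linspace(0.0, 5.0, 5001)

x1 = euler(f, 1.0, t)
x2 = euler(f, 1.0 + 1e-3, t)     # perturb the initial condition by 1e-3

gap = np.abs(x1 - x2)
bound = 1e-3 * np.exp(L * t)     # Gronwall: |x1 - x2| <= |delta_0| e^{Lt}
```

The same estimate holds exactly in discrete form: each Euler step multiplies the gap by at most $(1 + hL) \leq e^{hL}$, which is the discrete Gronwall argument.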

Why It Matters

This bound governs every numerical-stability and adversarial-perturbation argument involving ODE flows. For neural ODEs, $L$ is roughly the operator norm of the Jacobian $\partial f_\theta/\partial h$. If this is large, gradient computations using the adjoint method accumulate error exponentially in $L \cdot T$. Lipschitz regularization of $f_\theta$ keeps $L$ small and keeps the system numerically tractable.

Linear Systems

The linear case is the only family of ODEs with a complete closed-form theory, and it provides the local picture for nonlinear systems via linearization.

Definition

Linear Time-Invariant ODE

A linear time-invariant (LTI) system is $\dot{x} = Ax$ with $A \in \mathbb{R}^{d \times d}$ a constant matrix and $x_0 \in \mathbb{R}^d$. The unique solution is:

$$x(t) = e^{At} x_0,$$

where $e^{At} = \sum_{k=0}^{\infty} \frac{(At)^k}{k!}$ is the matrix exponential.

The matrix exponential is the central object. Its behavior is determined by the eigenstructure of $A$: if $A = P \Lambda P^{-1}$ with $\Lambda$ diagonal, then $e^{At} = P\, \mathrm{diag}(e^{\lambda_1 t}, \ldots, e^{\lambda_d t})\, P^{-1}$. Eigenvalues with positive real part produce exponential growth; negative real part produces decay; pure-imaginary eigenvalues produce oscillation. For non-diagonalizable $A$, use the Jordan form, which adds polynomial-in-$t$ factors.
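The eigendecomposition formula can be cross-checked against a library matrix exponential. A small sketch (assuming NumPy and SciPy; the triangular matrix $A$ is an arbitrary diagonalizable example):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-1.0, 2.0],
              [0.0, -3.0]])   # upper triangular: eigenvalues -1 and -3
t = 0.7

# Series-based matrix exponential (scipy uses Pade approximation + scaling)
E = expm(A * t)

# Via the eigendecomposition A = P diag(lambda) P^{-1}
lam, P = np.linalg.eig(A)
E_eig = (P * np.exp(lam * t)) @ np.linalg.inv(P)   # P diag(e^{lambda t}) P^{-1}
```

Both routes give the same matrix; the eigendecomposition route fails for defective $A$, which is why library implementations do not use it.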

For inhomogeneous systems $\dot{x} = Ax + g(t)$, the variation of parameters formula gives:

$$x(t) = e^{A(t - t_0)} x_0 + \int_{t_0}^{t} e^{A(t - s)} g(s)\, ds.$$

This is the same template that appears in the Ornstein-Uhlenbeck SDE solution (see stochastic calculus) — replace $g(s)\, ds$ with $\sigma\, dW_s$ and you have the SDE analog.
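The variation of parameters formula can be verified numerically against an adaptive solve of the same system. A sketch (assuming NumPy and SciPy; the matrix $A$, the forcing $g(s) = (0, \cos s)$, and the trapezoid quadrature are illustrative choices):

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])               # eigenvalues -1 and -2
g = lambda s: np.array([0.0, np.cos(s)])   # forcing term
x0 = np.array([1.0, 0.0])
T = 2.0

# Variation of parameters: x(T) = e^{AT} x0 + int_0^T e^{A(T-s)} g(s) ds,
# with the integral approximated by the trapezoid rule on a fine grid.
s = np.linspace(0.0, T, 2001)
vals = np.stack([expm(A * (T - si)) @ g(si) for si in s])
ds = s[1] - s[0]
integral = ds * (0.5 * vals[0] + vals[1:-1].sum(axis=0) + 0.5 * vals[-1])
x_vp = expm(A * T) @ x0 + integral

# Reference: integrate x' = Ax + g(t) with a high-accuracy adaptive solver
sol = solve_ivp(lambda t, x: A @ x + g(t), (0.0, T), x0,
                rtol=1e-10, atol=1e-12)
x_ref = sol.y[:, -1]
```

The two answers agree to quadrature accuracy, confirming that the closed-form template and the numerical flow describe the same object.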

Lyapunov Stability

For nonlinear systems where closed-form solutions are unavailable, stability is decided by an energy-like function rather than by the solution itself.

Definition

Stability of an Equilibrium

A point $x^* \in \mathbb{R}^d$ with $f(x^*) = 0$ is an equilibrium. It is:

  • Lyapunov stable if for every $\epsilon > 0$ there exists $\delta > 0$ such that $\|x_0 - x^*\| < \delta$ implies $\|x(t) - x^*\| < \epsilon$ for all $t \geq 0$.
  • Asymptotically stable if it is Lyapunov stable and there exists $\delta > 0$ such that $\|x_0 - x^*\| < \delta$ implies $x(t) \to x^*$ as $t \to \infty$.

Theorem

Lyapunov's Direct Method

Statement

Let $x^* = 0$ be an equilibrium of $\dot{x} = f(x)$, let $D \subseteq \mathbb{R}^d$ be a neighborhood of $0$, and let $V: D \to \mathbb{R}$ be continuously differentiable with $V(0) = 0$ and $V(x) > 0$ for $x \in D \setminus \{0\}$. Let $\dot{V}(x) = \nabla V(x)^\top f(x)$ be the time derivative of $V$ along trajectories.

  1. If $\dot{V}(x) \leq 0$ on $D$, then $x^* = 0$ is Lyapunov stable.
  2. If $\dot{V}(x) < 0$ on $D \setminus \{0\}$, then $x^* = 0$ is asymptotically stable.

Intuition

Treat $V$ as a generalized energy. If trajectories cannot increase $V$, they cannot escape the level sets $\{V \leq c\}$, so they remain near the equilibrium. If $V$ strictly decreases, every trajectory eventually reaches the bottom, which is the equilibrium.

Proof Sketch

For Lyapunov stability: given $\epsilon > 0$, choose $c$ small enough that $\{V \leq c\} \cap D \subseteq B(0, \epsilon)$. Pick $\delta$ such that $V(x) < c$ for $\|x\| < \delta$. Trajectories starting in $B(0, \delta)$ keep $V \leq c$ and therefore stay in $B(0, \epsilon)$.

For asymptotic stability: $V(x(t))$ is non-increasing and bounded below, so it converges to some $V_\infty \geq 0$. If $V_\infty > 0$, then $x(t)$ stays in a compact set where $\dot{V} < 0$ uniformly, contradicting convergence of $V$. Hence $V_\infty = 0$ and $x(t) \to 0$.

Why It Matters

Lyapunov is how stability is proven without solving the ODE. In control theory and reinforcement learning, the same machinery underlies stability of feedback policies. In ML training, treating $V(\theta) = \mathcal{L}(\theta) - \mathcal{L}^*$ and analyzing $\dot{V}$ under gradient flow is the standard convergence-proof template (see stability and optimization dynamics).

Failure Mode

Constructing $V$ is an art. There is no algorithm for finding a Lyapunov function for a generic nonlinear system; a negative result from Lyapunov's method is uninformative because it might just mean you picked the wrong $V$. For LTI systems $\dot{x} = Ax$, $V(x) = x^\top P x$ with $P$ solving the Lyapunov equation $A^\top P + PA = -Q$ (for any $Q \succ 0$) works iff all eigenvalues of $A$ have negative real part.
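The LTI case is fully mechanical. A sketch (assuming SciPy; the Hurwitz matrix $A$ is an arbitrary example) solves the Lyapunov equation and checks that the certificate $P$ is symmetric positive definite:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])   # Hurwitz: eigenvalues -1 and -2
Q = np.eye(2)

# Solve A^T P + P A = -Q. scipy solves M X + X M^H = q, so pass M = A^T, q = -Q.
P = solve_continuous_lyapunov(A.T, -Q)

# P symmetric positive definite  =>  V(x) = x^T P x certifies asymptotic
# stability, since V-dot = -x^T Q x < 0 along trajectories by construction.
eigs_P = np.linalg.eigvalsh(P)
residual = A.T @ P + P @ A + Q   # should be (numerically) zero
```

If $A$ had an eigenvalue with nonnegative real part, the computed $P$ would fail the positive-definiteness check, which is exactly the "iff" in the statement above.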

Numerical Methods

Closed-form solutions are unavailable for almost all nonlinear ODEs, so simulation requires discretization. A numerical method computes an approximation $\tilde{x}_n \approx x(t_n)$ on a grid $t_0 < t_1 < \cdots < t_N$.

Euler's Method (1st Order)

The simplest scheme is the forward Euler update:

$$\tilde{x}_{n+1} = \tilde{x}_n + h \, f(t_n, \tilde{x}_n),$$

with step size $h = t_{n+1} - t_n$. This is the first-order Taylor expansion of $x(t_{n+1})$ around $t_n$. The local truncation error per step is $O(h^2)$ and the global error after integrating to a fixed time $T$ is $O(h)$. Forward Euler is what a ResNet update computes; it is also the default when reasoning about gradient flow in ML.
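The global order is easy to measure: halving $h$ should roughly halve the error. A minimal sketch (assuming NumPy; the test problem $\dot{x} = -x$, $x(0) = 1$ with exact solution $e^{-T}$ is an illustrative choice):

```python
import numpy as np

def euler_solve(f, x0, t0, T, h):
    """Forward Euler from t0 to t0 + T with fixed step h; returns x(t0 + T)."""
    n_steps = int(round(T / h))
    t, x = t0, x0
    for _ in range(n_steps):
        x = x + h * f(t, x)
        t += h
    return x

f = lambda t, x: -x
errs = [abs(euler_solve(f, 1.0, 0.0, 1.0, h) - np.exp(-1.0))
        for h in (0.1, 0.05, 0.025)]

# Global order 1: consecutive errors shrink by a factor of about 2
ratios = [errs[i] / errs[i + 1] for i in range(2)]
```

The same experiment run on RK4 would show the error shrinking by a factor of about 16 per halving, the signature of global order 4.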

Forward Euler is conditionally stable: for the linear test problem $\dot{x} = \lambda x$ with $\lambda < 0$, it requires $h < 2/|\lambda|$ to avoid blow-up. Backward Euler ($\tilde{x}_{n+1} = \tilde{x}_n + h f(t_{n+1}, \tilde{x}_{n+1})$, an implicit scheme) is unconditionally stable for stiff linear systems but requires solving a nonlinear equation each step.
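The threshold $h < 2/|\lambda|$ is sharp and easy to exhibit. A sketch (assuming the scalar test problem $\dot{x} = \lambda x$ with $\lambda = -10$; for a linear problem the backward-Euler implicit equation solves in closed form):

```python
lam, T = -10.0, 5.0   # linear test problem x' = lam * x, x(0) = 1

def forward_euler(h):
    n = int(round(T / h))
    x = 1.0
    for _ in range(n):
        x = x + h * lam * x          # x_{n+1} = (1 + h*lam) x_n
    return x

def backward_euler(h):
    n = int(round(T / h))
    x = 1.0
    for _ in range(n):
        x = x / (1.0 - h * lam)      # solves x_{n+1} = x_n + h*lam*x_{n+1}
    return x

# Forward Euler's stability threshold here is h < 2/|lam| = 0.2
x_stable   = forward_euler(0.19)     # |1 + h*lam| = 0.9  -> decays
x_unstable = forward_euler(0.21)     # |1 + h*lam| = 1.1  -> blows up
x_implicit = backward_euler(0.21)    # backward Euler decays for any h > 0
```

The true solution decays monotonically for any $\lambda < 0$; only the discretization blows up, which is the defining symptom of stiffness.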

Runge-Kutta (4th Order)

The classical RK4 method uses four function evaluations per step:

$$\begin{aligned}
k_1 &= f(t_n, \tilde{x}_n), \\
k_2 &= f(t_n + h/2, \tilde{x}_n + h k_1/2), \\
k_3 &= f(t_n + h/2, \tilde{x}_n + h k_2/2), \\
k_4 &= f(t_n + h, \tilde{x}_n + h k_3), \\
\tilde{x}_{n+1} &= \tilde{x}_n + \frac{h}{6}(k_1 + 2k_2 + 2k_3 + k_4).
\end{aligned}$$

Local truncation error is $O(h^5)$, global error is $O(h^4)$. Four evaluations per step is the price for the higher order; for smooth right-hand sides, the per-accuracy compute beats Euler badly.

### Adaptive Step Size: Dormand-Prince

Fixed step sizes either waste compute on smooth regions or under-resolve sharp regions. **Adaptive solvers** compute two estimates of $\tilde{x}_{n+1}$ at different orders, use the difference as a local-error estimate, and choose the next step size to keep error within tolerance.

The standard implementation in `scipy.integrate.solve_ivp` and `torchdiffeq` is **Dormand-Prince RK45** (Dormand and Prince 1980): a 7-stage scheme that produces both a 5th-order and an embedded 4th-order estimate. Step rejection happens when the difference exceeds tolerance; otherwise the step is accepted and the next $h$ is chosen as $h_{\text{new}} = h \cdot (\mathrm{tol} / \mathrm{err})^{1/5}$ (with safety factors). This is what neural ODE solvers use by default.

For **stiff** problems (eigenvalue ratio $> 10^3$), explicit methods like RK45 force tiny step sizes. Implicit solvers (BDF methods, Rosenbrock methods) are necessary; they pay per-step cost for solving a nonlinear system in exchange for stability.

## Common Confusions

<Confusion title="Lipschitz is not the same as differentiable">
A function can be Lipschitz without being differentiable (ReLU, $|x|$). It can be differentiable without being Lipschitz ($x^2$ on $\mathbb{R}$ is differentiable but the Lipschitz constant is unbounded). Picard-Lindelof requires Lipschitz, not differentiability.
In practice, $C^1$ functions on a compact set are automatically Lipschitz on that set with constant $\sup \|\nabla f\|$, which is why "$f$ is smooth on a bounded region" is the usual sufficient condition cited in ML papers.
</Confusion>

<Confusion title="Existence and uniqueness are local guarantees">
Picard-Lindelof gives a solution on $[t_0, t_0 + \alpha]$ for some possibly small $\alpha$. The solution may **blow up in finite time** even with smooth $f$. Example: $\dot{x} = x^2, x(0) = 1$ has solution $x(t) = 1/(1 - t)$, which diverges at $t = 1$. Global existence requires extra assumptions: linear growth of $f$, dissipativity, or a Lyapunov function bounding solutions away from infinity.
</Confusion>

<Confusion title="Numerical convergence is not the same as physical correctness">
A numerical solver converges to the true ODE solution as $h \to 0$. It does not check whether the ODE itself models reality correctly. For chaotic systems (Lorenz, double pendulum), individual trajectories are exponentially sensitive to perturbations, so any finite-precision solver eventually diverges from the "true" trajectory regardless of order. Statistical properties of the trajectory (invariant measure, Lyapunov exponents) may still be computable, but the trajectory itself becomes meaningless past a horizon set by the largest Lyapunov exponent.
</Confusion>

## Exercises

<Exercise difficulty="core">
<Problem>
Consider the IVP $\dot{x} = -x^3, x(0) = 1$ on $t \in [0, 10]$.

1. Verify that $f(x) = -x^3$ is Lipschitz on the bounded interval $[-2, 2]$, and compute the Lipschitz constant.
2. Write down the Picard iteration starting from $\phi_0(t) \equiv 1$ and compute $\phi_1(t)$ explicitly.
3. Show that $V(x) = x^2/2$ is a Lyapunov function for the equilibrium $x^* = 0$.
</Problem>
<Hint>
For (1): $|f'(x)| = 3x^2$, take the sup on the interval. For (2): $\phi_1(t) = 1 + \int_0^t (-1)^3\, ds$. For (3): compute $\dot{V}$ along trajectories using the chain rule.
</Hint>
<Solution>
1.
$f'(x) = -3x^2$. On $[-2, 2]$, $\sup |f'(x)| = 12$, so $L = 12$.
2. $\phi_1(t) = 1 + \int_0^t -1\, ds = 1 - t$. (The next iterate would integrate $-(1-s)^3$, giving a more accurate approximation.)
3. $\dot{V}(x) = V'(x) \dot{x} = x \cdot (-x^3) = -x^4$. This is $\leq 0$ on all of $\mathbb{R}$ and strictly negative for $x \neq 0$. Hence $x^* = 0$ is asymptotically stable. (Note that the convergence rate is *not* exponential because $\dot{V} = -x^4 = -(2V)^2$, giving polynomial decay.)
</Solution>
</Exercise>

<Exercise difficulty="advanced">
<Problem>
Consider the linear system $\dot{x} = Ax$ with $A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$.

1. Compute the eigenvalues of $A$ and conclude what kind of stability the origin has.
2. Find a quadratic Lyapunov function $V(x) = x^\top P x$ certifying Lyapunov stability (but not asymptotic stability) of the origin.
3. Apply forward Euler with step size $h$ to this system. Show that $\|\tilde{x}_n\|^2 = (1 + h^2)^n \|\tilde{x}_0\|^2$, so Euler grows the energy and is unstable for any $h > 0$.
</Problem>
<Hint>
(1) Eigenvalues are $\pm i$. (2) Try $V(x) = \frac{1}{2}(x_1^2 + x_2^2)$, the rotational energy. (3) Compute $\|\tilde{x}_{n+1}\|^2$ directly from the Euler update.
</Hint>
<Solution>
1. The characteristic polynomial is $\lambda^2 + 1 = 0$, so $\lambda = \pm i$. Both eigenvalues are purely imaginary, so the origin is **Lyapunov stable but not asymptotically stable** (trajectories are circles around the origin).
2. With $P = I/2$: $\dot{V} = x^\top P A x + x^\top A^\top P x = \frac{1}{2} x^\top (A + A^\top) x = 0$ since $A$ is skew-symmetric. So $V$ is conserved along trajectories, certifying Lyapunov stability.
3. The Euler step is $\tilde{x}_{n+1} = (I + hA) \tilde{x}_n$. Compute $(I + hA)^\top (I + hA) = I + h(A + A^\top) + h^2 A^\top A = I + h^2 I = (1 + h^2) I$ (using $A + A^\top = 0$ and $A^\top A = I$). Hence $\|\tilde{x}_{n+1}\|^2 = (1 + h^2) \|\tilde{x}_n\|^2$, and iterating gives the claim.
The continuous system conserves energy; Euler injects energy at rate $\log(1 + h^2) \approx h^2$ per step. **Symplectic integrators** (leapfrog, Stormer-Verlet) are the fix when energy conservation matters.
</Solution>
</Exercise>

## References

**Foundational:**

- Hirsch, Smale, Devaney, *Differential Equations, Dynamical Systems, and an Introduction to Chaos* (3rd ed., 2013), Chapters 1-7. The standard graduate-level introduction.
- Coddington and Levinson, *Theory of Ordinary Differential Equations* (1955), Chapters 1-2. Classical reference for Picard-Lindelof and continuation theorems.
- Hartman, *Ordinary Differential Equations* (2nd ed., 1982), Chapter II. Definitive treatment of existence and uniqueness, including Osgood's condition.

**Stability:**

- Khalil, *Nonlinear Systems* (3rd ed., 2002), Chapters 4 and 14. The reference for Lyapunov analysis and applications in control.
- Vidyasagar, *Nonlinear Systems Analysis* (2nd ed., 1993), Chapter 5. Complementary treatment with more invariance-principle material.

**Numerical methods:**

- Hairer, Norsett, Wanner, *Solving Ordinary Differential Equations I: Nonstiff Problems* (2nd ed., 1993). Standard reference for Runge-Kutta methods and adaptive control.
- Hairer and Wanner, *Solving Ordinary Differential Equations II: Stiff and Differential-Algebraic Problems* (2nd ed., 1996). Stiff solvers, BDF methods.
- Suli and Mayers, *An Introduction to Numerical Analysis* (2003), Chapter 12. Compact undergraduate-level treatment.
- Dormand and Prince, "A family of embedded Runge-Kutta formulae," *Journal of Computational and Applied Mathematics* 6(1):19-26 (1980). The original RK45 paper.
## Next Topics

- **[Physics-informed neural networks](/topics/physics-informed-neural-networks)**: parameterize the solution $u_\theta$ and enforce the PDE through the loss
- **[Neural ODEs](/topics/neural-odes)**: parameterize the dynamics $f_\theta$ and let an ODE solver compute the trajectory
- **[Stochastic calculus for ML](/topics/stochastic-calculus-for-ml)**: extend ODE theory to processes driven by Brownian motion (the SDE generalization)

Last reviewed: April 17, 2026
