Scientific ML

Classical ODEs: Existence, Stability, and Numerical Methods

The foundational theorems that justify treating an ODE as having a solution at all: Picard-Lindelof existence and uniqueness, Gronwall's inequality, linear-system theory, Lyapunov stability, and the numerical methods (Euler, Runge-Kutta, adaptive step) used in modern scientific ML.

Core · Tier 1 · Stable · ~40 min

Why This Matters

Every modern scientific ML method depends on ODE theory to be well-posed. Neural ODEs replace the layer index of a network with a continuous time variable, but the resulting object is meaningless unless the ODE has a solution and that solution is unique. Physics-informed neural networks approximate solutions to differential equations directly. Diffusion models sample by integrating a probability flow ODE backward in time. In all three cases, the question "does this thing actually exist?" is answered by Picard-Lindelof, and the question "is the integration stable?" is answered by Lyapunov theory and adaptive numerical solvers.

Classical ODE theory predates ML by more than a century. Treat this page as the load-bearing prerequisite for the scientific-ML stack rather than as a topic in its own right.

Setup and Notation

An ordinary differential equation is a relation between an unknown function and its derivatives. The initial value problem (IVP) on an interval $I = [t_0, t_0 + T]$ is:

$$\frac{dx}{dt} = f(t, x(t)), \quad x(t_0) = x_0$$

with $x: I \to \mathbb{R}^d$ and $f: I \times \mathbb{R}^d \to \mathbb{R}^d$ a vector field.

A solution is a continuously differentiable function $x(t)$ that satisfies the equation everywhere on $I$. The IVP and the integral equation $x(t) = x_0 + \int_{t_0}^{t} f(s, x(s))\, ds$ are equivalent for continuous $f$, and the integral form is the more useful object for proofs.

Existence and Uniqueness

Theorem

Picard-Lindelof Existence and Uniqueness

Statement

Suppose $f$ is continuous on the rectangle $R = [t_0, t_0 + a] \times \overline{B}(x_0, b)$, bounded there by $M$, and Lipschitz in $x$ with constant $L$ (uniformly in $t$). Then the IVP $\dot{x} = f(t, x)$, $x(t_0) = x_0$ has a unique solution $x: [t_0, t_0 + \alpha] \to \mathbb{R}^d$, where $\alpha = \min(a, b/M)$.

Intuition

Lipschitz continuity in $x$ controls how fast trajectories can separate. If two solutions started at the same point could diverge, then locally near that point $f$ would have to behave worse than linearly, contradicting the Lipschitz bound. The condition $\alpha = \min(a, b/M)$ guarantees the trajectory cannot leave the rectangle before time $\alpha$ elapses.

Proof Sketch

Define the Picard iteration on the Banach space $C([t_0, t_0+\alpha], \mathbb{R}^d)$ with the sup norm:

$$T[\phi](t) = x_0 + \int_{t_0}^{t} f(s, \phi(s))\, ds.$$

Fixed points of $T$ are solutions of the IVP. Verify two facts:

  1. $T$ maps the closed ball $\overline{B}(x_0, b)$ into itself, because $\|T[\phi](t) - x_0\| \leq M \alpha \leq b$.
  2. $T$ is a contraction on $[t_0, t_0 + \alpha']$ for $\alpha' < 1/L$: $\|T[\phi] - T[\psi]\|_\infty \leq L \alpha' \|\phi - \psi\|_\infty$.

By Banach's fixed-point theorem, $T$ has a unique fixed point on the smaller interval. Standard continuation extends to $[t_0, t_0 + \alpha]$.
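The contraction can also be watched numerically. The sketch below (assuming NumPy; the test problem $\dot{x} = -x$, $x(0) = 1$ and the trapezoid quadrature are illustrative choices, not part of the theorem) applies $T$ repeatedly on a grid and converges to the exact solution $e^{-t}$:

```python
import numpy as np

def picard_iterate(f, x0, t, n_iters):
    """Apply the Picard map T[phi](t) = x0 + int_{t0}^t f(s, phi(s)) ds
    repeatedly, approximating the integral with the trapezoid rule."""
    phi = np.full_like(t, x0, dtype=float)  # phi_0 = constant function x0
    for _ in range(n_iters):
        integrand = f(t, phi)
        # cumulative trapezoid integral from t[0] to each grid point
        phi = x0 + np.concatenate(([0.0], np.cumsum(
            0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))))
    return phi

t = np.linspace(0.0, 1.0, 1001)
phi = picard_iterate(lambda s, x: -x, x0=1.0, t=t, n_iters=30)
err = np.max(np.abs(phi - np.exp(-t)))  # compare against exact solution e^{-t}
```

Each iteration reproduces one more term of the Taylor series of $e^{-t}$, so after a few dozen iterations the remaining error is dominated by the quadrature, not the fixed-point truncation.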

Why It Matters

Picard-Lindelof is the gateway theorem. Without it, "the solution to this ODE" is an undefined phrase. Every scientific ML method that integrates a learned vector field $f_\theta$ implicitly assumes $f_\theta$ is Lipschitz; otherwise the trajectory it computes corresponds to nothing well-defined. Most neural networks with smooth activations (tanh, softplus, GELU) satisfy this; ReLU is Lipschitz but not differentiable at the origin, which is why ReLU-based neural ODEs work in practice but require care.

Failure Mode

Lipschitz continuity is sufficient but not necessary. The classical counterexample is $\dot{x} = \sqrt{|x|}$, $x(0) = 0$, where $f$ is continuous but only Hölder-1/2 continuous in $x$ near the origin. Two distinct solutions exist: $x \equiv 0$ and $x(t) = t^2/4$. Uniqueness fails precisely because the Lipschitz constant blows up at $x = 0$. Peano's theorem still gives existence under continuity alone, but uniqueness requires Lipschitz or similar regularity (Osgood's condition is the sharp generalization).
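The non-uniqueness is visible numerically. In the sketch below (a forward-Euler experiment; the grid and the $10^{-12}$ perturbation are arbitrary illustrative choices), a solver started exactly on the equilibrium stays on the $x \equiv 0$ branch, while an arbitrarily small positive nudge sends it onto the $x(t) = t^2/4$ branch:

```python
import numpy as np

def euler(f, x0, t):
    """Forward Euler on a given time grid."""
    x = np.empty_like(t)
    x[0] = x0
    for n in range(len(t) - 1):
        x[n + 1] = x[n] + (t[n + 1] - t[n]) * f(x[n])
    return x

f = lambda x: np.sqrt(abs(x))     # continuous, but not Lipschitz at 0
t = np.linspace(0.0, 2.0, 20001)

x_zero = euler(f, 0.0, t)     # started exactly on the equilibrium: stays 0
x_pert = euler(f, 1e-12, t)   # tiny nudge: trajectory approaches t^2/4
```

Well-posed problems do not behave this way: under a Lipschitz condition, Gronwall (next section) forces a $10^{-12}$ perturbation to stay exponentially small, not to produce an $O(1)$ change.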

Sensitivity to Initial Conditions: Gronwall

The next question is quantitative: if you perturb the initial condition or the vector field, how far apart do the resulting trajectories drift? The answer is exponential in the Lipschitz constant.

Lemma

Gronwall's Inequality (Integral Form)

Statement

If $u(t) \leq \alpha(t) + \int_{t_0}^{t} \beta(s) u(s)\, ds$ for all $t \in [t_0, T]$, with $\beta \geq 0$, then:

$$u(t) \leq \alpha(t) + \int_{t_0}^{t} \alpha(s) \beta(s) \exp\!\left(\int_s^t \beta(r)\, dr\right) ds.$$

When $\alpha(t) \equiv \alpha_0$ is constant, this simplifies to $u(t) \leq \alpha_0 \exp\!\left(\int_{t_0}^{t} \beta(s)\, ds\right)$.

Intuition

Gronwall converts an integral inequality into a closed-form pointwise bound. Apply it to the difference of two solutions of $\dot{x} = f(t,x)$ with Lipschitz constant $L$: $\|x_1(t) - x_2(t)\| \leq \|x_1(t_0) - x_2(t_0)\|\, e^{L(t - t_0)}$. Two trajectories starting close stay close, but only with an exponential-in-time bound. Over a long horizon with large $L$, "close" becomes meaningless quickly.
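The bound can be checked directly on discretized trajectories. A minimal sketch (assuming NumPy; the vector field $f(x) = \sin x$, which has $L = 1$, and the $10^{-3}$ perturbation are illustrative choices) compares the actual gap between two forward-Euler trajectories against the Gronwall envelope:

```python
import numpy as np

def euler(f, x0, t):
    """Forward Euler on a given time grid."""
    x = np.empty_like(t)
    x[0] = x0
    for n in range(len(t) - 1):
        x[n + 1] = x[n] + (t[n + 1] - t[n]) * f(t[n], x[n])
    return x

L = 1.0
f = lambda s, x: np.sin(x)       # |f'| <= 1, so Lipschitz with constant L = 1
t = np.linspace(0.0, 5.0, 5001)

x1 = euler(f, 1.0, t)
x2 = euler(f, 1.0 + 1e-3, t)     # perturb the initial condition by 1e-3

gap = np.abs(x1 - x2)
bound = 1e-3 * np.exp(L * t)     # Gronwall: |x1 - x2| <= |delta_0| e^{Lt}
```

The same estimate holds exactly in discrete form: each Euler step multiplies the gap by at most $(1 + hL) \leq e^{hL}$, which is the discrete Gronwall argument.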

Why It Matters

This bound governs every numerical-stability and adversarial-perturbation argument involving ODE flows. For neural ODEs, $L$ is roughly the operator norm of the Jacobian $\partial f_\theta/\partial h$. If this is large, gradient computations using the adjoint method accumulate error exponentially in $L \cdot T$. Lipschitz regularization of $f_\theta$ keeps $L$ small and keeps the system numerically tractable.

Linear Systems

The linear case is the only family of ODEs with a complete closed-form theory, and it provides the local picture for nonlinear systems via linearization.

Definition

Linear Time-Invariant ODE

A linear time-invariant (LTI) system is $\dot{x} = Ax$ with $A \in \mathbb{R}^{d \times d}$ a constant matrix and $x_0 \in \mathbb{R}^d$. The unique solution is:

$$x(t) = e^{At} x_0,$$

where $e^{At} = \sum_{k=0}^{\infty} \frac{(At)^k}{k!}$ is the matrix exponential.

The matrix exponential is the central object. Its behavior is determined by the eigenstructure of $A$: if $A = P \Lambda P^{-1}$ with $\Lambda$ diagonal, then $e^{At} = P\, \mathrm{diag}(e^{\lambda_1 t}, \ldots, e^{\lambda_d t})\, P^{-1}$. Eigenvalues with positive real part produce exponential growth; negative real part produces decay; pure-imaginary eigenvalues produce oscillation. For non-diagonalizable $A$, use the Jordan form, which adds polynomial-in-$t$ factors.
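The eigendecomposition formula can be cross-checked against a library matrix exponential. A small sketch (assuming NumPy and SciPy; the triangular matrix $A$ is an arbitrary diagonalizable example):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-1.0, 2.0],
              [0.0, -3.0]])   # upper triangular: eigenvalues -1 and -3
t = 0.7

# Series-based matrix exponential (scipy uses Pade approximation + scaling)
E = expm(A * t)

# Via the eigendecomposition A = P diag(lambda) P^{-1}
lam, P = np.linalg.eig(A)
E_eig = (P * np.exp(lam * t)) @ np.linalg.inv(P)   # P diag(e^{lambda t}) P^{-1}
```

Both routes give the same matrix; the eigendecomposition route fails for defective $A$, which is why library implementations do not use it.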

For inhomogeneous systems $\dot{x} = Ax + g(t)$, the variation of parameters formula gives:

$$x(t) = e^{A(t - t_0)} x_0 + \int_{t_0}^{t} e^{A(t - s)} g(s)\, ds.$$

This is the same template that appears in the Ornstein-Uhlenbeck SDE solution (see stochastic calculus) — replace $g(s)\, ds$ with $\sigma\, dW_s$ and you have the SDE analog.
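The variation of parameters formula can be verified numerically against an adaptive solve of the same system. A sketch (assuming NumPy and SciPy; the matrix $A$, the forcing $g(s) = (0, \cos s)$, and the trapezoid quadrature are illustrative choices):

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])               # eigenvalues -1 and -2
g = lambda s: np.array([0.0, np.cos(s)])   # forcing term
x0 = np.array([1.0, 0.0])
T = 2.0

# Variation of parameters: x(T) = e^{AT} x0 + int_0^T e^{A(T-s)} g(s) ds,
# with the integral approximated by the trapezoid rule on a fine grid.
s = np.linspace(0.0, T, 2001)
vals = np.stack([expm(A * (T - si)) @ g(si) for si in s])
ds = s[1] - s[0]
integral = ds * (0.5 * vals[0] + vals[1:-1].sum(axis=0) + 0.5 * vals[-1])
x_vp = expm(A * T) @ x0 + integral

# Reference: integrate x' = Ax + g(t) with a high-accuracy adaptive solver
sol = solve_ivp(lambda t, x: A @ x + g(t), (0.0, T), x0,
                rtol=1e-10, atol=1e-12)
x_ref = sol.y[:, -1]
```

The two answers agree to quadrature accuracy, confirming that the closed-form template and the numerical flow describe the same object.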

Lyapunov Stability

For nonlinear systems where closed-form solutions are unavailable, stability is decided by an energy-like function rather than by the solution itself.

Definition

Stability of an Equilibrium

A point $x^* \in \mathbb{R}^d$ with $f(x^*) = 0$ is an equilibrium. It is:

  • Lyapunov stable if for every $\epsilon > 0$ there exists $\delta > 0$ such that $\|x_0 - x^*\| < \delta$ implies $\|x(t) - x^*\| < \epsilon$ for all $t \geq 0$.
  • Asymptotically stable if it is Lyapunov stable and there exists $\delta > 0$ such that $\|x_0 - x^*\| < \delta$ implies $x(t) \to x^*$ as $t \to \infty$.

Theorem

Lyapunov's Direct Method

Statement

Let $x^* = 0$ be an equilibrium of $\dot{x} = f(x)$, let $D \subseteq \mathbb{R}^d$ be a neighborhood of $0$, and let $V: D \to \mathbb{R}$ be continuously differentiable with $V(0) = 0$ and $V(x) > 0$ for $x \in D \setminus \{0\}$. Let $\dot{V}(x) = \nabla V(x)^\top f(x)$ be the time derivative of $V$ along trajectories.

  1. If $\dot{V}(x) \leq 0$ on $D$, then $x^* = 0$ is Lyapunov stable.
  2. If $\dot{V}(x) < 0$ on $D \setminus \{0\}$, then $x^* = 0$ is asymptotically stable.

Intuition

Treat $V$ as a generalized energy. If trajectories cannot increase $V$, they cannot escape the level sets $\{V \leq c\}$, so they remain near the equilibrium. If $V$ strictly decreases, every trajectory eventually reaches the bottom, which is the equilibrium.

Proof Sketch

For Lyapunov stability: given $\epsilon > 0$, choose $c$ small enough that $\{V \leq c\} \cap D \subseteq B(0, \epsilon)$. Pick $\delta$ such that $V(x) < c$ for $\|x\| < \delta$. Trajectories starting in $B(0, \delta)$ keep $V \leq c$ and therefore stay in $B(0, \epsilon)$.

For asymptotic stability: $V(x(t))$ is non-increasing and bounded below, so it converges to some $V_\infty \geq 0$. If $V_\infty > 0$, then $x(t)$ stays in a compact set where $\dot{V} < 0$ uniformly, contradicting convergence of $V$. Hence $V_\infty = 0$ and $x(t) \to 0$.

Why It Matters

Lyapunov is how stability is proven without solving the ODE. In control theory and reinforcement learning, the same machinery underlies stability of feedback policies. In ML training, treating $V(\theta) = \mathcal{L}(\theta) - \mathcal{L}^*$ and analyzing $\dot{V}$ under gradient flow is the standard convergence-proof template (see stability and optimization dynamics).

Failure Mode

Constructing $V$ is an art. There is no algorithm for finding a Lyapunov function for a generic nonlinear system; a negative result from Lyapunov's method is uninformative because it might just mean you picked the wrong $V$. For LTI systems $\dot{x} = Ax$, $V(x) = x^\top P x$ with $P$ solving the Lyapunov equation $A^\top P + PA = -Q$ (for any $Q \succ 0$) works iff all eigenvalues of $A$ have negative real part.
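The LTI case is fully mechanical. A sketch (assuming SciPy; the Hurwitz matrix $A$ is an arbitrary example) solves the Lyapunov equation and checks that the certificate $P$ is symmetric positive definite:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])   # Hurwitz: eigenvalues -1 and -2
Q = np.eye(2)

# Solve A^T P + P A = -Q. scipy solves M X + X M^H = q, so pass M = A^T, q = -Q.
P = solve_continuous_lyapunov(A.T, -Q)

# P symmetric positive definite  =>  V(x) = x^T P x certifies asymptotic
# stability, since V-dot = -x^T Q x < 0 along trajectories by construction.
eigs_P = np.linalg.eigvalsh(P)
residual = A.T @ P + P @ A + Q   # should be (numerically) zero
```

If $A$ had an eigenvalue with nonnegative real part, the computed $P$ would fail the positive-definiteness check, which is exactly the "iff" in the statement above.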

Numerical Methods

Closed-form solutions are unavailable for almost all nonlinear ODEs, so simulation requires discretization. A numerical method computes an approximation $\tilde{x}_n \approx x(t_n)$ on a grid $t_0 < t_1 < \cdots < t_N$.

Euler's Method (1st Order)

The simplest scheme is the forward Euler update:

$$\tilde{x}_{n+1} = \tilde{x}_n + h \, f(t_n, \tilde{x}_n),$$

with step size $h = t_{n+1} - t_n$. This is the first-order Taylor expansion of $x(t_{n+1})$ around $t_n$. The local truncation error per step is $O(h^2)$ and the global error after integrating to a fixed time $T$ is $O(h)$. Forward Euler is what a ResNet update computes; it is also the default when reasoning about gradient flow in ML.
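The global order is easy to measure: halving $h$ should roughly halve the error. A minimal sketch (assuming NumPy; the test problem $\dot{x} = -x$, $x(0) = 1$ with exact solution $e^{-T}$ is an illustrative choice):

```python
import numpy as np

def euler_solve(f, x0, t0, T, h):
    """Forward Euler from t0 to t0 + T with fixed step h; returns x(t0 + T)."""
    n_steps = int(round(T / h))
    t, x = t0, x0
    for _ in range(n_steps):
        x = x + h * f(t, x)
        t += h
    return x

f = lambda t, x: -x
errs = [abs(euler_solve(f, 1.0, 0.0, 1.0, h) - np.exp(-1.0))
        for h in (0.1, 0.05, 0.025)]

# Global order 1: consecutive errors shrink by a factor of about 2
ratios = [errs[i] / errs[i + 1] for i in range(2)]
```

The same experiment run on RK4 would show the error shrinking by a factor of about 16 per halving, the signature of global order 4.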

Forward Euler is conditionally stable: for the linear test problem $\dot{x} = \lambda x$ with $\lambda < 0$, it requires $h < 2/|\lambda|$ to avoid blow-up. Backward Euler ($\tilde{x}_{n+1} = \tilde{x}_n + h f(t_{n+1}, \tilde{x}_{n+1})$, an implicit scheme) is unconditionally stable for stiff linear systems but requires solving a nonlinear equation each step.
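The threshold $h < 2/|\lambda|$ is sharp and easy to exhibit. A sketch (assuming the scalar test problem $\dot{x} = \lambda x$ with $\lambda = -10$; for a linear problem the backward-Euler implicit equation solves in closed form):

```python
lam, T = -10.0, 5.0   # linear test problem x' = lam * x, x(0) = 1

def forward_euler(h):
    n = int(round(T / h))
    x = 1.0
    for _ in range(n):
        x = x + h * lam * x          # x_{n+1} = (1 + h*lam) x_n
    return x

def backward_euler(h):
    n = int(round(T / h))
    x = 1.0
    for _ in range(n):
        x = x / (1.0 - h * lam)      # solves x_{n+1} = x_n + h*lam*x_{n+1}
    return x

# Forward Euler's stability threshold here is h < 2/|lam| = 0.2
x_stable   = forward_euler(0.19)     # |1 + h*lam| = 0.9  -> decays
x_unstable = forward_euler(0.21)     # |1 + h*lam| = 1.1  -> blows up
x_implicit = backward_euler(0.21)    # backward Euler decays for any h > 0
```

The true solution decays monotonically for any $\lambda < 0$; only the discretization blows up, which is the defining symptom of stiffness.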

Runge-Kutta (4th Order)

The classical RK4 method uses four function evaluations per step:

$$\begin{aligned}
k_1 &= f(t_n, \tilde{x}_n), \\
k_2 &= f(t_n + h/2, \tilde{x}_n + h k_1/2), \\
k_3 &= f(t_n + h/2, \tilde{x}_n + h k_2/2), \\
k_4 &= f(t_n + h, \tilde{x}_n + h k_3), \\
\tilde{x}_{n+1} &= \tilde{x}_n + \frac{h}{6}(k_1 + 2k_2 + 2k_3 + k_4).
\end{aligned}$$

Local truncation error is $O(h^5)$, global error is $O(h^4)$. Four evaluations per step is the price for the higher order; for smooth right-hand sides, the per-accuracy compute beats Euler badly.

### Adaptive Step Size: Dormand-Prince

Fixed step sizes either waste compute on smooth regions or under-resolve sharp regions. **Adaptive solvers** compute two estimates of $\tilde{x}_{n+1}$ at different orders, use the difference as a local-error estimate, and choose the next step size to keep error within tolerance.

The standard implementation in `scipy.integrate.solve_ivp` and `torchdiffeq` is **Dormand-Prince RK45** (Dormand and Prince 1980): a 7-stage scheme that produces both a 5th-order and an embedded 4th-order estimate. Step rejection happens when the difference exceeds tolerance; otherwise the step is accepted and the next $h$ is chosen as $h_{\text{new}} = h \cdot (\mathrm{tol} / \mathrm{err})^{1/5}$ (with safety factors). This is what neural ODE solvers use by default.

For **stiff** problems (eigenvalue ratio $> 10^3$), explicit methods like RK45 force tiny step sizes. Implicit solvers (BDF methods, Rosenbrock methods) are necessary; they pay per-step cost for solving a nonlinear system in exchange for stability.

## Common Confusions

<Confusion title="Lipschitz is not the same as differentiable">
A function can be Lipschitz without being differentiable (ReLU, $|x|$). It can be differentiable without being Lipschitz ($x^2$ on $\mathbb{R}$ is differentiable but the Lipschitz constant is unbounded). Picard-Lindelof requires Lipschitz, not differentiability.
In practice, $C^1$ functions on a compact set are automatically Lipschitz on that set with constant $\sup \|\nabla f\|$, which is why "$f$ is smooth on a bounded region" is the usual sufficient condition cited in ML papers.
</Confusion>

<Confusion title="Existence and uniqueness are local guarantees">
Picard-Lindelof gives a solution on $[t_0, t_0 + \alpha]$ for some possibly small $\alpha$. The solution may **blow up in finite time** even with smooth $f$. Example: $\dot{x} = x^2, x(0) = 1$ has solution $x(t) = 1/(1 - t)$, which diverges at $t = 1$. Global existence requires extra assumptions: linear growth of $f$, dissipativity, or a Lyapunov function bounding solutions away from infinity.
</Confusion>

<Confusion title="Numerical convergence is not the same as physical correctness">
A numerical solver converges to the true ODE solution as $h \to 0$. It does not check whether the ODE itself models reality correctly. For chaotic systems (Lorenz, double pendulum), individual trajectories are exponentially sensitive to perturbations, so any finite-precision solver eventually diverges from the "true" trajectory regardless of order. Statistical properties of the trajectory (invariant measure, Lyapunov exponents) may still be computable, but the trajectory itself becomes meaningless past a horizon set by the largest Lyapunov exponent.
</Confusion>

## Exercises

<Exercise difficulty="core">
<Problem>
Consider the IVP $\dot{x} = -x^3, x(0) = 1$ on $t \in [0, 10]$.

1. Verify that $f(x) = -x^3$ is Lipschitz on the bounded interval $[-2, 2]$, and compute the Lipschitz constant.
2. Write down the Picard iteration starting from $\phi_0(t) \equiv 1$ and compute $\phi_1(t)$ explicitly.
3. Show that $V(x) = x^2/2$ is a Lyapunov function for the equilibrium $x^* = 0$.
</Problem>
<Hint>
For (1): $|f'(x)| = 3x^2$, take the sup on the interval. For (2): $\phi_1(t) = 1 + \int_0^t (-1)^3\, ds$. For (3): compute $\dot{V}$ along trajectories using the chain rule.
</Hint>
<Solution>
1.
$f'(x) = -3x^2$. On $[-2, 2]$, $\sup |f'(x)| = 12$, so $L = 12$.
2. $\phi_1(t) = 1 + \int_0^t -1\, ds = 1 - t$. (The next iterate would integrate $-(1-s)^3$, giving a more accurate approximation.)
3. $\dot{V}(x) = V'(x) \dot{x} = x \cdot (-x^3) = -x^4$. This is $\leq 0$ on all of $\mathbb{R}$ and strictly negative for $x \neq 0$. Hence $x^* = 0$ is asymptotically stable. (Note that the convergence rate is *not* exponential because $\dot{V} = -x^4 = -(2V)^2$, giving polynomial decay.)
</Solution>
</Exercise>

<Exercise difficulty="advanced">
<Problem>
Consider the linear system $\dot{x} = Ax$ with $A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$.

1. Compute the eigenvalues of $A$ and conclude what kind of stability the origin has.
2. Find a quadratic Lyapunov function $V(x) = x^\top P x$ certifying Lyapunov stability (but not asymptotic stability) of the origin.
3. Apply forward Euler with step size $h$ to this system. Show that $\|\tilde{x}_n\|^2 = (1 + h^2)^n \|\tilde{x}_0\|^2$, so Euler grows the energy and is unstable for any $h > 0$.
</Problem>
<Hint>
(1) Eigenvalues are $\pm i$. (2) Try $V(x) = \frac{1}{2}(x_1^2 + x_2^2)$, the rotational energy. (3) Compute $\|\tilde{x}_{n+1}\|^2$ directly from the Euler update.
</Hint>
<Solution>
1. The characteristic polynomial is $\lambda^2 + 1 = 0$, so $\lambda = \pm i$. Both eigenvalues are purely imaginary, so the origin is **Lyapunov stable but not asymptotically stable** (trajectories are circles around the origin).
2. With $P = I/2$: $\dot{V} = x^\top P A x + x^\top A^\top P x = \frac{1}{2} x^\top (A + A^\top) x = 0$ since $A$ is skew-symmetric. So $V$ is conserved along trajectories, certifying Lyapunov stability.
3. The Euler step is $\tilde{x}_{n+1} = (I + hA) \tilde{x}_n$. Compute $(I + hA)^\top (I + hA) = I + h(A + A^\top) + h^2 A^\top A = I + h^2 I = (1 + h^2) I$ (using $A + A^\top = 0$ and $A^\top A = I$). Hence $\|\tilde{x}_{n+1}\|^2 = (1 + h^2) \|\tilde{x}_n\|^2$, and iterating gives the claim.
The continuous system conserves energy; Euler injects energy at rate $\log(1 + h^2) \approx h^2$ per step. **Symplectic integrators** (leapfrog, Stormer-Verlet) are the fix when energy conservation matters.
</Solution>
</Exercise>

## References

**Foundational:**

- Hirsch, Smale, Devaney, *Differential Equations, Dynamical Systems, and an Introduction to Chaos* (3rd ed., 2013), Chapters 1-7. The standard graduate-level introduction.
- Coddington and Levinson, *Theory of Ordinary Differential Equations* (1955), Chapters 1-2. Classical reference for Picard-Lindelof and continuation theorems.
- Hartman, *Ordinary Differential Equations* (2nd ed., 1982), Chapter II. Definitive treatment of existence and uniqueness, including Osgood's condition.

**Stability:**

- Khalil, *Nonlinear Systems* (3rd ed., 2002), Chapters 4 and 14. The reference for Lyapunov analysis and applications in control.
- Vidyasagar, *Nonlinear Systems Analysis* (2nd ed., 1993), Chapter 5. Complementary treatment with more invariance-principle material.

**Numerical methods:**

- Hairer, Norsett, Wanner, *Solving Ordinary Differential Equations I: Nonstiff Problems* (2nd ed., 1993). Standard reference for Runge-Kutta methods and adaptive control.
- Hairer and Wanner, *Solving Ordinary Differential Equations II: Stiff and Differential-Algebraic Problems* (2nd ed., 1996). Stiff solvers, BDF methods.
- Suli and Mayers, *An Introduction to Numerical Analysis* (2003), Chapter 12. Compact undergraduate-level treatment.
- Dormand and Prince, "A family of embedded Runge-Kutta formulae," *Journal of Computational and Applied Mathematics* 6(1):19-26 (1980). The original RK45 paper.
## Next Topics

- **[Physics-informed neural networks](/topics/physics-informed-neural-networks)**: parameterize the solution $u_\theta$ and enforce the PDE through the loss
- **[Neural ODEs](/topics/neural-odes)**: parameterize the dynamics $f_\theta$ and let an ODE solver compute the trajectory
- **[Stochastic calculus for ML](/topics/stochastic-calculus-for-ml)**: extend ODE theory to processes driven by Brownian motion (the SDE generalization)

Last reviewed: April 17, 2026
