Foundations
Formal Verification and Proof Assistants
Some behaviors should be proven, not merely hoped for. Model checking, theorem proving, Lean/Coq, verified compilers, and why verification matters for infrastructure, protocols, and safety-critical AI.
Why This Matters
Testing can show the presence of bugs but never their absence. For safety-critical systems (aircraft control, medical devices, cryptographic protocols, financial infrastructure), "it worked on the test cases" is not sufficient. Formal verification provides mathematical proof that a system satisfies its specification for ALL possible inputs, not just the ones you tested.
The connection to ML: as AI systems are deployed in high-stakes domains, the question shifts from "does it work on the benchmark?" to "can we prove it satisfies certain properties?" Verified neural network properties (robustness certificates, output bounds), formal specifications for RL agents, and machine-checked proofs of mathematical theorems are active research areas where formal methods intersect with ML.
Proof assistants (Lean, Coq, Isabelle) also matter for mathematics itself. The Kepler conjecture was proved with computer-assisted verification (Hales, Flyspeck project). Large language models are now being trained to generate formal proofs, with Lean 4 as the primary target language. Understanding what formal verification can and cannot do is necessary context for this line of research.
Prerequisites
This page assumes familiarity with basic logic and proof techniques (propositional logic, predicate logic, induction) and type theory (types as propositions, the Curry-Howard correspondence).
Two Paradigms: Model Checking and Theorem Proving
Model Checking
Model checking is an automated verification technique that exhaustively explores the state space of a finite-state system to check whether a property (expressed in temporal logic) holds. Given a model $M$ (a finite-state transition system) and a specification $\varphi$ (a temporal logic formula), the model checker determines whether $M \models \varphi$ (the model satisfies the specification).
The inputs are:
- A model $M$: a finite-state transition system with states $S$, initial states $S_0 \subseteq S$, transition relation $R \subseteq S \times S$, and labeling function $L : S \to 2^{AP}$ mapping states to sets of atomic propositions
- A specification: a formula in temporal logic (CTL, LTL, CTL*)
The output is either "yes" (with a proof) or "no" (with a counterexample trace).
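For a safety property of the form "AG p" (p holds in every reachable state), this yes/no-with-counterexample behavior can be sketched as a breadth-first reachability search. The toy system, state names, and property below are illustrative, not taken from the text:

```python
# Minimal explicit-state check of a safety property AG p:
# "p holds in every reachable state". Returns a counterexample
# trace when the property fails.
from collections import deque

def check_ag(states, initial, transitions, labels, prop):
    """Return (True, None) if every reachable state satisfies `prop`,
    else (False, trace) with a path from an initial state to a violation."""
    parent = {s: None for s in initial}
    queue = deque(initial)
    while queue:
        s = queue.popleft()
        if prop not in labels.get(s, set()):
            # Reconstruct the counterexample trace back to an initial state.
            trace = []
            while s is not None:
                trace.append(s)
                s = parent[s]
            return False, list(reversed(trace))
        for t in transitions.get(s, []):
            if t not in parent:
                parent[t] = s
                queue.append(t)
    return True, None

# A 3-state toy system: s0 -> s1 -> s2, where s2 violates "safe".
states = {"s0", "s1", "s2"}
transitions = {"s0": ["s1"], "s1": ["s2"], "s2": ["s2"]}
labels = {"s0": {"safe"}, "s1": {"safe"}, "s2": set()}

ok, trace = check_ag(states, {"s0"}, transitions, labels, "safe")
print(ok, trace)  # False ['s0', 's1', 's2']
```

Real model checkers represent state sets symbolically rather than enumerating them, but the input/output contract is the same: a verdict, plus a concrete trace when the answer is "no".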
Theorem Proving (Interactive)
Interactive theorem proving uses a proof assistant (Lean, Coq, Isabelle/HOL) to construct machine-checked proofs of mathematical statements. The human provides the proof strategy; the proof assistant checks every logical step. The result is a formal proof object that can be independently verified by a small trusted kernel.
Unlike model checking, theorem proving handles infinite-state systems and arbitrary mathematical statements. The cost: proofs require significant human effort (or, increasingly, AI assistance).
Temporal Logic (CTL and LTL)
Temporal logic extends propositional logic with operators that express properties over time:
LTL (Linear Temporal Logic): reasons about single execution paths.
- $G\,\varphi$: "always $\varphi$" (safety: something bad never happens)
- $F\,\varphi$: "eventually $\varphi$" (liveness: something good eventually happens)
- $\varphi \mathbin{U} \psi$: "$\varphi$ holds until $\psi$ holds"
CTL (Computation Tree Logic): reasons about branching computation trees.
- $AG\,\varphi$: "for all paths, always $\varphi$"
- $EF\,\varphi$: "there exists a path where eventually $\varphi$"
- $AF\,\varphi$: "for all paths, eventually $\varphi$"
LTL specifications express properties of traces. CTL specifications express properties of the full computation tree.
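The path-based LTL operators can be made concrete by evaluating them over a single finite trace. This is a simplification (true LTL semantics is defined over infinite paths), and the trace and proposition names below are invented for illustration:

```python
# LTL-style operators evaluated over one finite trace, represented as
# a list of sets of atomic propositions. A simplification of real LTL
# (which is defined over infinite paths), useful for intuition.

def always(trace, p):          # G p: p holds at every position
    return all(p in state for state in trace)

def eventually(trace, p):      # F p: p holds at some position
    return any(p in state for state in trace)

def until(trace, p, q):        # p U q: q eventually holds, and p holds until then
    for i, state in enumerate(trace):
        if q in state:
            return all(p in s for s in trace[:i])
    return False

trace = [{"req"}, {"req"}, {"grant"}, set()]
print(always(trace, "req"))          # False (fails at the last position)
print(eventually(trace, "grant"))    # True
print(until(trace, "req", "grant"))  # True
```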
The Curry-Howard Correspondence
The bridge between logic and programming. Under the Curry-Howard correspondence:
| Logic | Programming |
|---|---|
| Propositions | Types |
| Proofs | Programs (terms) |
| Implication | Function type |
| Conjunction | Product type |
| Disjunction | Sum type |
| Universal | Dependent function |
| Existential | Dependent pair |
A proof of $P \to Q$ is a function that takes a proof of $P$ and produces a proof of $Q$. This is not a metaphor: in Lean 4, the statement `theorem foo : P → Q` declares a function from proofs of `P` to proofs of `Q`. Writing the function body IS the proof.
This correspondence is the foundation of all modern proof assistants. It means that type checking (which compilers do automatically) is proof checking. A well-typed program in a dependently typed language is a verified proof.
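Two minimal Lean 4 examples make the correspondence concrete (the theorem names `k_combinator` and `and_swap` are made up for illustration):

```lean
-- Curry–Howard in Lean 4: the proof term IS a program.
-- P → Q → P is proved by a function that takes a proof of P,
-- ignores the proof of Q, and returns the proof of P.
theorem k_combinator (P Q : Prop) : P → Q → P :=
  fun hp _ => hp

-- Conjunction is a product type: a proof of P ∧ Q is a pair,
-- and swapping the conjuncts is swapping the pair's components.
theorem and_swap (P Q : Prop) : P ∧ Q → Q ∧ P :=
  fun ⟨hp, hq⟩ => ⟨hq, hp⟩
```

The kernel accepts these theorems by type-checking the function bodies: no tactic machinery is involved, only the judgment that each term has the declared type.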
Main Theorems
Rice's Theorem (Verification is Generally Undecidable)
Statement
Let $P$ be any nontrivial semantic property of programs (i.e., a property of the function computed by the program, not of the program text). Then it is undecidable whether an arbitrary program satisfies $P$.
Formally: let $\mathcal{C}$ be a nontrivial set of partial recursive functions. Then the set $\{p : \varphi_p \in \mathcal{C}\}$ is not recursive, where $\varphi_p$ is the function computed by program $p$.
Intuition
If you could decide any nontrivial property of programs, you could in particular decide whether a program halts (by checking the property "computes a total function"). Since the halting problem is undecidable, no nontrivial property can be decidable in general.
This does not mean verification is impossible. It means there is no single algorithm that correctly verifies ALL programs. For specific programs with specific structures (finite-state systems, programs with bounded loops, programs that terminate by construction), verification is often feasible.
Proof Sketch
By reduction from the halting problem. Given a Turing machine $M$ and input $w$, construct a program $p_{M,w}$ such that $\varphi_{p_{M,w}} \in \mathcal{C}$ if and only if $M$ halts on $w$. Specifically: assume without loss of generality that the everywhere-undefined function is not in $\mathcal{C}$ (otherwise work with the complement), and fix some $f \in \mathcal{C}$. Define $p_{M,w}$, on input $x$, to first simulate $M$ on $w$; if the simulation halts, $p_{M,w}$ computes $f(x)$; otherwise it diverges. Then deciding membership in $\{p : \varphi_p \in \mathcal{C}\}$ would decide the halting problem. Contradiction.
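The shape of this reduction can be sketched in code. Everything here is illustrative: `decides_property` stands for a hypothetical oracle that, by Rice's theorem, cannot actually exist for any nontrivial semantic property.

```python
# Sketch of the reduction in Rice's theorem. `decides_property` is a
# hypothetical oracle (no such total decider exists); the helper names
# are made up for illustration.

def make_program(M, w, f):
    """Build the program p_{M,w}: on input x, first simulate M on w;
    if (and only if) that simulation halts, compute f(x)."""
    def p(x):
        M(w)           # diverges exactly when M does not halt on w
        return f(x)
    return p

def halts(M, w, decides_property, f_in_property):
    """If `decides_property` decided whether a program's computed
    function lies in a nontrivial class C (with the empty function
    outside C and f_in_property inside C), this would decide the
    halting problem -- which is impossible."""
    return decides_property(make_program(M, w, f_in_property))

# For a machine that does halt, p_{M,w} just computes f:
p = make_program(lambda w: None, 0, lambda x: x + 1)
print(p(5))  # 6
```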
Why It Matters
Rice's theorem sets the fundamental limits of verification. It explains why we need both automated tools (model checking for finite-state systems) and human-guided proof (theorem proving for infinite-state systems). It also explains why verification tools use abstractions, approximations, and restricted program models: they sidestep the undecidability by working with subsets of programs where the property IS decidable.
For ML: verifying that a neural network satisfies a property (e.g., robustness to perturbations) is undecidable in general for ReLU networks with arbitrary architectures. Practical verifiers work on specific architectures with bounded depth and width.
Failure Mode
The theorem applies to Turing-complete languages. For restricted computational models (finite automata, pushdown automata, bounded programs), many properties ARE decidable. Model checking exploits this: by restricting to finite-state systems, exhaustive state exploration becomes possible (though potentially expensive).
CTL Model Checking Complexity
Statement
The CTL model checking problem (given $M$ and $\varphi$, does $M \models \varphi$?) can be solved in time $O(|\varphi| \cdot (|S| + |R|))$. That is, model checking is linear in the size of the specification and linear in the size of the model.
For LTL model checking, the complexity is $O((|S| + |R|) \cdot 2^{O(|\varphi|)})$: linear in the model size but exponential in the specification length.
Intuition
CTL model checking works by labeling states bottom-up through the formula structure. For each subformula, you mark which states satisfy it. The fixpoint computations (for operators like $EU$ and $EG$) require traversing the transition relation, taking $O(|S| + |R|)$ per subformula. The total work is proportional to the number of subformulas times the model size.
LTL model checking is harder because LTL specifications can express properties that require tracking the full execution path, not just the current state. The specification is compiled into a Büchi automaton (potentially exponentially larger), and the model is composed with this automaton.
Proof Sketch
For CTL: by structural induction on $\varphi$. Atomic propositions are checked in $O(|S|)$ via the labeling function. Boolean connectives are checked in $O(|S|)$. The temporal operators $EX$, $EU$, and $EG$ require fixpoint computation (the remaining operators reduce to these). $EX\,\psi$ labels all states with a successor satisfying $\psi$: one pass over $R$, costing $O(|S| + |R|)$. $E[\psi_1 \mathbin{U} \psi_2]$ uses a backward reachability computation from states satisfying $\psi_2$, costing $O(|S| + |R|)$. $EG\,\psi$ uses a greatest fixpoint computation on the subgraph of states satisfying $\psi$, also $O(|S| + |R|)$.
The full proof is in Clarke, Grumberg, and Peled, Model Checking, Chapter 4.
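The labeling computations for the existential operators can be sketched directly on an explicit transition system. The tiny system and satisfaction sets below are invented for illustration:

```python
# Sketch of the CTL labeling algorithm for EX, E[p U q], and EG p on an
# explicit transition system. R is a set of (source, target) edges;
# sat_p / sat_q are the sets of states already labeled with p / q.

def preimage(R, T):
    """States with at least one successor in T."""
    return {s for (s, t) in R if t in T}

def ex(R, sat_p):                 # EX p
    return preimage(R, sat_p)

def eu(R, sat_p, sat_q):          # E[p U q]: backward reachability from q-states through p-states
    result = set(sat_q)
    while True:
        frontier = preimage(R, result) & sat_p
        if frontier <= result:
            return result
        result |= frontier

def eg(R, sat_p):                 # EG p: greatest fixpoint within p-states
    result = set(sat_p)
    while True:
        keep = {s for s in result
                if any(u == s and t in result for (u, t) in R)}
        if keep == result:
            return result
        result = keep

# Tiny system: 0 -> 1 -> 2, with self-loops on 0 and 2.
R = {(0, 1), (1, 2), (2, 2), (0, 0)}
print(sorted(eu(R, sat_p={0, 1}, sat_q={2})))  # [0, 1, 2]
print(sorted(eg(R, sat_p={0, 2})))             # [0, 2]
```

Each fixpoint loop touches every edge a bounded number of times, which is where the $O(|S| + |R|)$ per-subformula cost in the sketch above comes from.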
Why It Matters
The linear complexity (in the model) is what makes model checking practical. The exponential part is in the specification size, which is typically small (tens of operators). The model can have billions of states and still be checkable, especially with symbolic model checking (using BDDs or SAT solvers to represent state sets compactly).
This result underpins the practical success of model checking in hardware verification (Intel uses model checking for chip design), protocol verification (TLS, consensus algorithms), and software verification (SLAM, CEGAR).
Failure Mode
The state explosion problem: the model size can be exponential in the number of system components. A system with $n$ boolean variables has $2^n$ states. Concurrent systems with $k$ processes, each with $m$ states, have $m^k$ states. Symbolic methods (BDDs, SAT-based BMC) and abstraction (CEGAR) mitigate but do not eliminate this problem.
Proof Assistants in Practice
Lean 4
Lean 4 is both a programming language and a proof assistant. Its type system is based on the Calculus of Inductive Constructions (CIC). Key features:
- Dependent types: types can depend on values, enabling precise specifications
- Tactic mode: proofs can be written as sequences of proof tactics (automated proof steps)
- Metaprogramming: tactics are written in Lean itself, making the system extensible
- Mathlib: a large mathematical library (hundreds of thousands of theorems)
Lean is the primary target for AI-assisted theorem proving (DeepMind's AlphaProof, Meta's work on formal mathematics).
Coq
Coq is the oldest widely used proof assistant based on CIC. CompCert (a verified C compiler) and CertiKOS (a verified OS kernel) were developed in Coq. Its tactic language (Ltac) is mature but less ergonomic than Lean 4's.
Isabelle/HOL
Isabelle uses higher-order logic (simpler than CIC) and has strong automation. The seL4 verified microkernel was developed in Isabelle.
Verified Systems
CompCert (Leroy, 2006): a C compiler verified in Coq to preserve the semantics of every compiled program. For any source program $S$ and compiled program $C$: if $S$ has well-defined behavior, then $C$ has the same observable behavior. The proof covers all optimization passes. This means compiler bugs cannot cause verified-source programs to misbehave.
seL4 (Klein et al., 2009): a microkernel verified in Isabelle/HOL. The proof establishes that the C implementation correctly implements the abstract specification, which includes memory isolation, capability-based access control, and interrupt handling. The verification took approximately 20 person-years.
Flyspeck (Hales et al., 2017): a formal proof of the Kepler conjecture (optimal sphere packing in 3D). The original proof by Hales (1998) relied on extensive computation that referees could not fully check. The Flyspeck project formalized the entire proof in HOL Light, producing a machine-verified certificate.
Connections to ML
Verified neural network properties. Given a neural network $f$ and an input region $X$, verify that $f(x) \in Y$ for all $x \in X$. For example: prove that a classifier's output does not change for any perturbation within an $\ell_\infty$ ball of radius $\epsilon$ (adversarial robustness). Tools like alpha-beta-CROWN and MN-BaB use bound propagation and branch-and-bound to verify ReLU networks. These are sound (a verified property truly holds) but incomplete (they may fail to verify a property that holds).
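The simplest form of bound propagation is interval bound propagation (IBP), which pushes a box of inputs through the network layer by layer; tools like CROWN use tighter linear relaxations but build on the same idea. The network weights and the nonnegativity property below are invented for illustration:

```python
# Minimal interval bound propagation (IBP) sketch for a tiny ReLU
# network. Produces sound (but loose) output bounds that hold for
# EVERY input in the box, unlike testing individual points.
import numpy as np

def ibp_layer(W, b, lo, hi):
    """Propagate an input box [lo, hi] through x -> W @ x + b.
    Positive weights pick up the matching bound; negative ones flip it."""
    Wp, Wn = np.maximum(W, 0), np.minimum(W, 0)
    return Wp @ lo + Wn @ hi + b, Wp @ hi + Wn @ lo + b

def ibp_relu(lo, hi):
    """ReLU is monotone, so it maps boxes to boxes elementwise."""
    return np.maximum(lo, 0), np.maximum(hi, 0)

# 2-layer net; verify the output stays nonnegative on an eps-box around x0.
W1, b1 = np.array([[1.0, -1.0], [0.5, 0.5]]), np.array([0.0, 0.0])
W2, b2 = np.array([[1.0, 1.0]]), np.array([0.5])

x0, eps = np.array([1.0, 0.0]), 0.1
lo, hi = x0 - eps, x0 + eps
lo, hi = ibp_relu(*ibp_layer(W1, b1, lo, hi))
lo, hi = ibp_layer(W2, b2, lo, hi)
print(lo, hi)            # sound bounds on the output over the whole box
print(bool(lo[0] >= 0))  # True: property certified for ALL x in the box
```

This is why such verifiers are sound but incomplete: if the computed lower bound were negative, the property might still hold, the box bounds are just too loose to certify it.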
Autoformalization. Use language models to translate informal mathematical proofs into formal proofs in Lean or Coq. This is an active research area: if successful, it would dramatically accelerate formalization of mathematics and enable AI systems to produce verifiable mathematical reasoning.
Agent specifications. For an RL agent deployed in a safety-critical setting, specify properties like "the agent never takes action X in state Y" as temporal logic formulas and verify them against the agent's policy (if the policy is a finite-state controller) or check them empirically with formal guarantees (if the policy is a neural network with verified bounds).
Common Confusions
Formal verification is not testing
Testing checks specific inputs. Verification proves a property for all inputs (or all reachable states). A test suite with 99.9% coverage can miss the one edge case that causes a disaster. Verification guarantees no such edge case exists, relative to the formal specification.
Verification does not mean the system is correct in every sense
Verification proves that the implementation matches the specification. If the specification is wrong, the system is verified but still does the wrong thing. The specification problem (formally capturing what you actually want) is the formal-methods analog of the reward design problem in RL.
Rice's theorem does not make verification useless
Rice's theorem says you cannot verify arbitrary programs for arbitrary properties. It does not say you cannot verify specific programs for specific properties. Model checking works for finite-state systems. Theorem proving works for programs where humans (or AI) can construct proofs. Abstraction and approximation handle many intermediate cases. The practical question is always: can we verify THIS system for THIS property at acceptable cost?
Proof assistants do not generate proofs automatically
Proof assistants check proofs, not find them. The human (or AI) provides the proof structure, and the assistant verifies that every step is logically valid. Some automation exists (tactic solvers, SMT integration), but for nontrivial theorems, significant human guidance is required. The gap between "the theorem is true" and "we have a machine-checked proof" can be years of effort.
Key Takeaways
- Model checking exhaustively explores finite-state systems; CTL checking is linear in both model and specification size, while LTL checking is exponential in the specification size
- Theorem proving constructs machine-checked proofs for arbitrary mathematical statements, including properties of infinite-state systems
- The Curry-Howard correspondence equates proofs with programs, enabling proof assistants based on type theory
- Rice's theorem sets fundamental limits: general verification is undecidable, but specific verification is often feasible
- CompCert (verified compiler) and seL4 (verified microkernel) are landmark verified systems
- Neural network verification, autoformalization, and agent specification are active ML-formal methods intersections
- Verification proves implementation matches specification; it does not prove the specification is correct
Exercises
Problem
A traffic light controller has 3 states (red, yellow, green) for each of two directions (North-South and East-West), giving $3 \times 3 = 9$ combined states. Write a CTL safety property stating that both directions are never green simultaneously. How many states does the model checker need to explore?
Problem
In Lean 4, the statement `theorem add_comm (n m : Nat) : n + m = m + n` declares that natural number addition is commutative. Explain, using the Curry-Howard correspondence, what this declaration asks you to produce.
Problem
A ReLU neural network $f$ defines a piecewise linear function. For a given input $x_0$ and perturbation radius $\epsilon$, the robustness verification problem asks: does $f(x)$ receive the same classification as $f(x_0)$ for all $x$ with $\|x - x_0\|_\infty \le \epsilon$? Explain why this is NP-hard in general and describe the approach used by bound-propagation verifiers.
References
Canonical:
- Clarke, Grumberg, Peled, Model Checking (2nd ed., 2018), Chapters 1-4
- Baier & Katoen, Principles of Model Checking (2008), Chapters 5-6
- Pierce, Software Foundations (online textbook for Coq-based verification)
Verified Systems:
- Leroy, "Formal Verification of a Realistic Compiler" (CompCert, CACM 2009)
- Klein et al., "seL4: Formal Verification of an OS Kernel" (SOSP 2009)
ML Connections:
- Katz et al., "Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks" (CAV 2017)
- First, Rabe, Bansal, "Baldur: Whole-Proof Generation and Repair with Large Language Models" (2023)
- Wu et al., "An Integer Linear Programming Framework for Mining Constraints from Data" (verified ML properties, 2022)
Next Topics
- Computability theory: the theoretical foundations underlying Rice's theorem and the limits of automated reasoning
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Basic Logic and Proof Techniques (Layer 0A)
- Type Theory (Layer 0A)
- Lambda Calculus (Layer 0A)