Why This Matters
SAT (Boolean satisfiability) is the canonical NP-complete problem: given a Boolean formula, is there an assignment of variables that makes it true? Despite worst-case exponential complexity, modern SAT solvers handle formulas with millions of variables in practice. SMT (Satisfiability Modulo Theories) extends SAT with domain-specific reasoning about integers, reals, arrays, and bitvectors.
These solvers matter for AI and ML in two ways. First, they are the backbone of formal verification: checking that a neural network satisfies safety properties, verifying that a program meets its specification, or proving that a hardware design is correct. Second, as AI agents become more capable, SAT/SMT solvers serve as reliable "reasoning backends" that can verify intermediate claims, check consistency of plans, and discharge arithmetic obligations. Solvers and LLMs are complementary: LLMs generate plausible reasoning, solvers verify it.
Prerequisites
This page assumes familiarity with P vs NP and the concept of NP-completeness.
Core Definitions
SAT (Boolean Satisfiability Problem)
Given a Boolean formula over variables (using , , ), the SAT problem asks: does there exist an assignment (with ) such that evaluates to true?
A formula is satisfiable if and only if such an assignment exists, and unsatisfiable otherwise.
Conjunctive Normal Form (CNF)
A formula is in CNF if and only if it is a conjunction of clauses, where each clause is a disjunction of literals (variables or their negations):
Every Boolean formula can be converted to an equisatisfiable CNF formula (with possible introduction of auxiliary variables). SAT solvers typically operate on CNF.
Tseitin Transformation
Naive distribution of over can produce a CNF formula whose size is exponential in the size of the original formula. The Tseitin transformation (Tseitin 1968) avoids this blowup: introduce a fresh auxiliary variable for every subformula , add clauses asserting expressed locally in terms of the children's auxiliary variables, and conjoin for the root. The resulting CNF has size linear in the original formula. The transformation preserves satisfiability, not logical equivalence: the new formula has extra variables and does not describe the same set of models over the original alphabet.
SMT (Satisfiability Modulo Theories)
An SMT problem asks whether a first-order formula is satisfiable with respect to a background theory . Common theories include:
- Linear integer arithmetic (LIA):
- Linear real arithmetic (LRA):
- Arrays: read/write operations with and
- Bitvectors: fixed-width binary arithmetic (32-bit integers, overflow)
- Uninterpreted functions: (congruence; the converse is injectivity and is not assumed)
SMT solvers combine SAT solving with theory-specific decision procedures.
Main Theorems
Cook-Levin Theorem
Statement
SAT is NP-complete. That is:
- SAT is in NP: given a satisfying assignment, it can be verified in polynomial time.
- Every problem in NP can be reduced to SAT in polynomial time.
Intuition
The computation of any polynomial-time verifier can be encoded as a Boolean formula. Each step of the computation, each cell of the tape, each state of the machine becomes a Boolean variable. The formula asserts "the machine accepts," and its satisfiability is equivalent to the original NP problem having a solution.
Proof Sketch
Let with verifier running in time . Encode 's computation as a Boolean formula:
- Create variables for "machine is in state at time ."
- Create variables for "tape cell contains symbol at time ."
- Create variables for "head is at position at time ."
- Add clauses encoding: (a) the initial configuration matches the input, (b) each step follows the transition function, (c) the machine is in an accepting state at the final step.
The formula has variables and clauses (polynomial in ), and it is satisfiable if and only if accepts some certificate. The satisfying assignment encodes the certificate.
Why It Matters
Cook-Levin established the theory of NP-completeness. Once SAT is NP-complete, every new problem can be shown NP-complete by reduction from SAT (or from any other known NP-complete problem). This created a practical methodology: to show a problem is hard, reduce SAT to it.
It also explains why SAT solvers are so broadly useful: any problem in NP can be encoded as SAT. Scheduling, planning, verification, and combinatorial optimization all reduce to SAT.
Failure Mode
NP-completeness is a worst-case notion. SAT is NP-complete in the worst case, but many practical instances have exploitable structure (locality, symmetry, small clause width) that modern solvers exploit. The gap between worst-case theory and practical performance is enormous for SAT.
SAT Solving Algorithms
DPLL (Davis-Putnam-Logemann-Loveland, 1962):
- Unit propagation: if a clause has only one unassigned literal, set it to true.
- Pure literal elimination: if a variable appears only positively (or only negatively), set it accordingly.
- Branching: pick an unassigned variable, try true, then try false. Backtrack on contradiction.
CDCL (Conflict-Driven Clause Learning, 1996): CDCL extends DPLL with two key innovations:
- Conflict analysis: when a contradiction is reached, analyze why it occurred. Identify a set of variable assignments that caused the conflict.
- Learned clauses: add a new clause to the formula that prevents the same conflict from recurring. This prunes the search space.
- Non-chronological backtracking: jump back to the decision level that caused the conflict, not just one level up.
CDCL is the basis of all competitive modern SAT solvers (MiniSat, CaDiCaL, Kissat). On industrial benchmarks, CDCL solvers routinely handle formulas with millions of variables and clauses.
SMT Architecture
Modern SMT solvers (Z3, CVC5, Yices) use the DPLL(T) architecture:
- Abstract the formula: replace theory atoms () with Boolean variables ().
- Use a SAT solver (CDCL) on the abstracted Boolean formula.
- When the SAT solver finds a satisfying assignment, pass the corresponding theory atoms to a theory solver.
- If the theory solver says the atoms are inconsistent (e.g., and simultaneously), generate a conflict clause and feed it back to the SAT solver.
- Repeat until SAT with a theory-consistent assignment, or UNSAT.
This separation of concerns keeps the architecture clean: the SAT solver handles Boolean structure, the theory solvers handle domain reasoning, and they communicate through conflict clauses.
Nelson-Oppen Theory Combination
Real SMT queries mix theories: a verification condition might touch linear arithmetic, arrays, and uninterpreted functions in the same formula. The Nelson-Oppen method (Nelson and Oppen 1979) combines decision procedures for disjoint, stably-infinite theories into a decision procedure for their union. The input is first purified: each atom is rewritten so it uses symbols from a single theory, with shared variables at the interfaces. The individual theory solvers then exchange equalities between shared variables; a combined model exists if and only if each theory agrees on the induced partition of the shared variables. Modern DPLL(T) frameworks (Nieuwenhuis, Oliveras, and Tinelli 2006) integrate Nelson-Oppen with lazy SMT, so theory propagation and conflict analysis span multiple theories inside the CDCL loop.
Phase Transitions in Random -SAT
Random -SAT instances are generated by sampling clauses uniformly at random over variables, each clause a disjunction of literals. Empirically, the probability that a random 3-SAT formula is satisfiable undergoes a sharp transition as the clause-to-variable ratio crosses a critical value near . For almost all instances are satisfiable and easy; for almost all are unsatisfiable and easy to refute. Instances near are the hardest: solver runtime peaks sharply around the threshold. Friedgut (1999) proved that a sharp satisfiability threshold exists for random -SAT with , but the exact location of the 3-SAT threshold remains an open problem; tight upper and lower bounds have been established and the sequence of physics-inspired and rigorous bounds has steadily narrowed the window. The phase-transition picture guides benchmark generation and explains why average-case complexity is not a useful summary of SAT difficulty.
Why Solvers Complement LLMs
LLMs and SAT/SMT solvers have complementary strengths:
| Capability | LLM | SAT/SMT solver |
|---|---|---|
| Flexibility | High: handles natural language, ambiguous specifications | Low: requires formal specification |
| Correctness guarantee | None: can hallucinate | Complete: if it says UNSAT, the formula is unsatisfiable |
| Arithmetic | Unreliable for multi-step computation | Exact |
| Search | Beam search, sampling | Exhaustive with pruning |
For AI agents, the natural architecture is: use the LLM to translate informal requirements into formal constraints, then use a solver to check satisfiability or find solutions. The LLM provides flexibility; the solver provides correctness.
Common Confusions
NP-complete does not mean unsolvable
SAT is NP-complete, but modern solvers routinely solve practical instances with millions of variables. NP-completeness is a worst-case classification. Practical instances often have structure (small treewidth, clustering, phase transitions) that solvers exploit. The average-case and structured-case complexity of SAT is much better than the worst case.
SMT is not just SAT with extra steps
SMT solvers do not simply convert everything to SAT. They use specialized decision procedures for each theory (simplex for linear arithmetic, congruence closure for uninterpreted functions, etc.). The DPLL(T) architecture interleaves SAT solving with theory reasoning, which is more efficient than a pure SAT encoding for most theory-heavy problems.
Solvers are not obsolete because of LLMs
LLMs cannot replace SAT/SMT solvers. LLMs have no mechanism for guaranteed correct reasoning over formal constraints. A solver that returns UNSAT has proved unsatisfiability by exhaustive search. No LLM can provide this guarantee. The two tools serve different roles and are most powerful when combined.
Exercises
Problem
Convert the following Boolean formula into CNF and determine if it is satisfiable: .
Problem
Explain why conflict-driven clause learning (CDCL) is more efficient than naive DPLL on structured problems. What information does a learned clause encode?
Problem
Sketch how you would use an SMT solver to verify that a ReLU neural network with one hidden layer always outputs a non-negative value on inputs in .
Neural Network Verification. State of the Art
Reluplex (Katz, Barrett, Dill, Julian, and Kochenderfer 2017, arXiv:1702.01135) introduced a Simplex-based decision procedure for piecewise-linear ReLU constraints and remains the foundational instance of SMT-style NN verification. It was superseded by Marabou (Katz, Huang, Ibeling, Julian, Lazarus, Lim, Shah, Thakoor, Wu, Zeljić, Dill, Kochenderfer, and Barrett 2019, arXiv:1910.14322), which extends Reluplex with support for arbitrary piecewise-linear activations, richer property specifications, and parallel search. The current state of the art on standard robustness benchmarks is -CROWN (Xu, Zhang, Wang, Wang, Jana, Lin, and Hsieh 2021, arXiv:2103.06624, with ongoing extensions through 2024), which combines branch-and-bound with the CROWN family of linear-relaxation bound-propagation techniques and has won successive iterations of the VNN-COMP competition. The practical takeaway is that exact SMT-style verification scales to small networks and local robustness queries, while branch-and-bound with tight linear relaxations now dominates on realistic vision models.
Further directions
- VSIDS variable-ordering heuristic in CDCL
- Proof production and DRAT unsat certificates
- Parallel SAT solving: portfolio solvers and cube-and-conquer
- QBF (PSPACE-complete) and beyond
- #SAT model counting (harder than decision, ApproxMC)
- Craig interpolation from UNSAT proofs
- Abstract interpretation connection for program verification
- Quiz
References
Canonical:
- Cook, "The Complexity of Theorem-Proving Procedures" (1971)
- Biere, Heule, van Maaren, Walsh, Handbook of Satisfiability (2nd ed., 2021)
- Tseitin, "On the Complexity of Derivation in Propositional Calculus" (1968)
Solvers:
- de Moura & Bjorner, "Z3: An Efficient SMT Solver" (2008)
- Marques-Silva, Lynce, Malik, "Conflict-Driven Clause Learning SAT Solvers" (2009)
Theory combination and DPLL(T):
- Nelson & Oppen, "Simplification by Cooperating Decision Procedures," TOPLAS 1(2):245-257 (1979)
- Nieuwenhuis, Oliveras, Tinelli, "Solving SAT and SAT Modulo Theories," JACM 53(6):937-977 (2006)
Phase transitions:
- Friedgut, "Sharp Thresholds of Graph Properties, and the -SAT Problem," JAMS 12:1017-1054 (1999)
Neural network verification:
- Katz, Barrett, Dill, Julian, Kochenderfer, "Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks," arXiv:1702.01135 (2017)
- Katz, Huang, Ibeling, Julian, Lazarus, Lim, Shah, Thakoor, Wu, Zeljić, Dill, Kochenderfer, Barrett, "The Marabou Framework for Verification and Analysis of Deep Neural Networks," arXiv:1910.14322 (2019)
- Xu, Zhang, Wang, Wang, Jana, Lin, Hsieh, "Fast and Complete: Enabling Complete Neural Network Verification with Rapid and Massively Parallel Incomplete Verifiers," arXiv:2103.06624 (2021)
Next Topics
- Model theory basics: the semantic side of logic (truth in structures vs. provability)
- Proof theory and cut-elimination: the structural theory behind proof search