Foundations
SAT, SMT, and Automated Reasoning
SAT solvers decide Boolean satisfiability (NP-complete). SMT solvers extend SAT with theories like arithmetic and arrays. These tools verify constraints, discharge proof obligations, and complement LLMs in AI agent pipelines.
Prerequisites
Why This Matters
SAT (Boolean satisfiability) is the canonical NP-complete problem: given a Boolean formula, is there an assignment of variables that makes it true? Despite worst-case exponential complexity, modern SAT solvers handle formulas with millions of variables in practice. SMT (Satisfiability Modulo Theories) extends SAT with domain-specific reasoning about integers, reals, arrays, and bitvectors.
These solvers matter for AI and ML in two ways. First, they are the backbone of formal verification: checking that a neural network satisfies safety properties, verifying that a program meets its specification, or proving that a hardware design is correct. Second, as AI agents become more capable, SAT/SMT solvers serve as reliable "reasoning backends" that can verify intermediate claims, check consistency of plans, and discharge arithmetic obligations. Solvers and LLMs are complementary: LLMs generate plausible reasoning, solvers verify it.
Prerequisites
This page assumes familiarity with P vs NP and the concept of NP-completeness.
Core Definitions
SAT (Boolean Satisfiability Problem)
Given a Boolean formula over variables (using , , ), the SAT problem asks: does there exist an assignment (with ) such that evaluates to true?
A formula is satisfiable if such an assignment exists, and unsatisfiable otherwise.
Conjunctive Normal Form (CNF)
A formula is in CNF if it is a conjunction of clauses, where each clause is a disjunction of literals (variables or their negations):
Every Boolean formula can be converted to an equisatisfiable CNF formula (with possible introduction of auxiliary variables). SAT solvers typically operate on CNF.
SMT (Satisfiability Modulo Theories)
An SMT problem asks whether a first-order formula is satisfiable with respect to a background theory . Common theories include:
- Linear integer arithmetic (LIA):
- Linear real arithmetic (LRA):
- Arrays: read/write operations with and
- Bitvectors: fixed-width binary arithmetic (32-bit integers, overflow)
- Uninterpreted functions: (congruence)
SMT solvers combine SAT solving with theory-specific decision procedures.
Main Theorems
Cook-Levin Theorem
Statement
SAT is NP-complete. That is:
- SAT is in NP: given a satisfying assignment, it can be verified in polynomial time.
- Every problem in NP can be reduced to SAT in polynomial time.
Intuition
The computation of any polynomial-time verifier can be encoded as a Boolean formula. Each step of the computation, each cell of the tape, each state of the machine becomes a Boolean variable. The formula asserts "the machine accepts," and its satisfiability is equivalent to the original NP problem having a solution.
Proof Sketch
Let with verifier running in time . Encode 's computation as a Boolean formula:
- Create variables for "machine is in state at time ."
- Create variables for "tape cell contains symbol at time ."
- Create variables for "head is at position at time ."
- Add clauses encoding: (a) the initial configuration matches the input, (b) each step follows the transition function, (c) the machine is in an accepting state at the final step.
The formula has variables and clauses (polynomial in ), and it is satisfiable if and only if accepts some certificate. The satisfying assignment encodes the certificate.
Why It Matters
Cook-Levin established the theory of NP-completeness. Once SAT is NP-complete, every new problem can be shown NP-complete by reduction from SAT (or from any other known NP-complete problem). This created a practical methodology: to show a problem is hard, reduce SAT to it.
It also explains why SAT solvers are so broadly useful: any problem in NP can be encoded as SAT. Scheduling, planning, verification, and combinatorial optimization all reduce to SAT.
Failure Mode
NP-completeness is a worst-case notion. SAT is NP-complete in the worst case, but many practical instances have exploitable structure (locality, symmetry, small clause width) that modern solvers exploit. The gap between worst-case theory and practical performance is enormous for SAT.
SAT Solving Algorithms
DPLL (Davis-Putnam-Logemann-Loveland, 1962):
- Unit propagation: if a clause has only one unassigned literal, set it to true.
- Pure literal elimination: if a variable appears only positively (or only negatively), set it accordingly.
- Branching: pick an unassigned variable, try true, then try false. Backtrack on contradiction.
CDCL (Conflict-Driven Clause Learning, 1996): CDCL extends DPLL with two key innovations:
- Conflict analysis: when a contradiction is reached, analyze why it occurred. Identify a set of variable assignments that caused the conflict.
- Learned clauses: add a new clause to the formula that prevents the same conflict from recurring. This prunes the search space.
- Non-chronological backtracking: jump back to the decision level that caused the conflict, not just one level up.
CDCL is the basis of all competitive modern SAT solvers (MiniSat, CaDiCaL, Kissat). On industrial benchmarks, CDCL solvers routinely handle formulas with millions of variables and clauses.
SMT Architecture
Modern SMT solvers (Z3, CVC5, Yices) use the DPLL(T) architecture:
- Abstract the formula: replace theory atoms () with Boolean variables ().
- Use a SAT solver (CDCL) on the abstracted Boolean formula.
- When the SAT solver finds a satisfying assignment, pass the corresponding theory atoms to a theory solver.
- If the theory solver says the atoms are inconsistent (e.g., and simultaneously), generate a conflict clause and feed it back to the SAT solver.
- Repeat until SAT with a theory-consistent assignment, or UNSAT.
This separation of concerns keeps the architecture clean: the SAT solver handles Boolean structure, the theory solvers handle domain reasoning, and they communicate through conflict clauses.
Why Solvers Complement LLMs
LLMs and SAT/SMT solvers have complementary strengths:
| Capability | LLM | SAT/SMT solver |
|---|---|---|
| Flexibility | High: handles natural language, ambiguous specifications | Low: requires formal specification |
| Correctness guarantee | None: can hallucinate | Complete: if it says UNSAT, the formula is unsatisfiable |
| Arithmetic | Unreliable for multi-step computation | Exact |
| Search | Beam search, sampling | Exhaustive with pruning |
For AI agents, the natural architecture is: use the LLM to translate informal requirements into formal constraints, then use a solver to check satisfiability or find solutions. The LLM provides flexibility; the solver provides correctness.
Common Confusions
NP-complete does not mean unsolvable
SAT is NP-complete, but modern solvers routinely solve practical instances with millions of variables. NP-completeness is a worst-case classification. Practical instances often have structure (small treewidth, clustering, phase transitions) that solvers exploit. The average-case and structured-case complexity of SAT is much better than the worst case.
SMT is not just SAT with extra steps
SMT solvers do not simply convert everything to SAT. They use specialized decision procedures for each theory (simplex for linear arithmetic, congruence closure for uninterpreted functions, etc.). The DPLL(T) architecture interleaves SAT solving with theory reasoning, which is more efficient than a pure SAT encoding for most theory-heavy problems.
Solvers are not obsolete because of LLMs
LLMs cannot replace SAT/SMT solvers. LLMs have no mechanism for guaranteed correct reasoning over formal constraints. A solver that returns UNSAT has proved unsatisfiability by exhaustive search. No LLM can provide this guarantee. The two tools serve different roles and are most powerful when combined.
Exercises
Problem
Convert the following Boolean formula into CNF and determine if it is satisfiable: .
Problem
Explain why conflict-driven clause learning (CDCL) is more efficient than naive DPLL on structured problems. What information does a learned clause encode?
Problem
Sketch how you would use an SMT solver to verify that a ReLU neural network with one hidden layer always outputs a non-negative value on inputs in .
References
Canonical:
- Cook, "The Complexity of Theorem-Proving Procedures" (1971)
- Biere, Heule, van Maaren, Walsh, Handbook of Satisfiability (2nd ed., 2021)
Solvers:
- de Moura & Bjorner, "Z3: An Efficient SMT Solver" (2008)
- Marques-Silva, Lynce, Malik, "Conflict-Driven Clause Learning SAT Solvers" (2009)
Neural network verification:
- Katz et al., "Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks" (2017)
Next Topics
- Model theory basics: the semantic side of logic (truth in structures vs. provability)
- Proof theory and cut-elimination: the structural theory behind proof search
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- P vs NPLayer 0A
- Sorting AlgorithmsLayer 0A
- Basic Logic and Proof TechniquesLayer 0A