SAT, SMT, and Automated Reasoning

Why This Matters

SAT (Boolean satisfiability) is the canonical NP-complete problem: given a Boolean formula, is there an assignment of variables that makes it true? Despite worst-case exponential complexity, modern SAT solvers handle formulas with millions of variables in practice. SMT (Satisfiability Modulo Theories) extends SAT with domain-specific reasoning about integers, reals, arrays, and bitvectors.

These solvers matter for AI and ML in two ways. First, they are the backbone of formal verification: checking that a neural network satisfies safety properties, verifying that a program meets its specification, or proving that a hardware design is correct. Second, as AI agents become more capable, SAT/SMT solvers serve as reliable "reasoning backends" that can verify intermediate claims, check consistency of plans, and discharge arithmetic obligations. Solvers and LLMs are complementary: LLMs generate plausible reasoning, solvers verify it.

Prerequisites

This page assumes familiarity with P vs NP and the concept of NP-completeness.

Core Definitions

Definition

SAT (Boolean Satisfiability Problem)

Given a Boolean formula $\phi$ over variables $x_1, \ldots, x_n$ (using $\land$ , $\lor$ , $\neg$ ), the SAT problem asks: does there exist an assignment $x_1 = b_1, \ldots, x_n = b_n$ (with $b_i \in \{\text{true}, \text{false}\}$ ) such that $\phi$ evaluates to true?

A formula is satisfiable if and only if such an assignment exists, and unsatisfiable otherwise.

Definition

Conjunctive Normal Form (CNF)

A formula is in CNF if and only if it is a conjunction of clauses, where each clause is a disjunction of literals (variables or their negations):

$\phi = (l_{11} \lor \cdots \lor l_{1k_1}) \land \cdots \land (l_{m1} \lor \cdots \lor l_{mk_m})$

Every Boolean formula can be converted to an equisatisfiable CNF formula (with possible introduction of auxiliary variables). SAT solvers typically operate on CNF.

Tseitin Transformation

Naive distribution of $\land$ over $\lor$ can produce a CNF formula whose size is exponential in the size of the original formula. The Tseitin transformation (Tseitin 1968) avoids this blowup: introduce a fresh auxiliary variable $y_\psi$ for every subformula $\psi$ , add clauses asserting $y_\psi \leftrightarrow \psi$ expressed locally in terms of the children's auxiliary variables, and conjoin $y_\phi$ for the root. The resulting CNF has size linear in the original formula. The transformation preserves satisfiability, not logical equivalence: the new formula has extra variables and does not describe the same set of models over the original alphabet.

Definition

SMT (Satisfiability Modulo Theories)

An SMT problem asks whether a first-order formula is satisfiable with respect to a background theory $T$ . Common theories include:

Linear integer arithmetic (LIA): $2x + 3y \leq 7$
Linear real arithmetic (LRA): $x/2 + y > 0$
Arrays: read/write operations with $\text{select}(a, i)$ and $\text{store}(a, i, v)$
Bitvectors: fixed-width binary arithmetic (32-bit integers, overflow)
Uninterpreted functions: $a = b \implies f(a) = f(b)$ (congruence; the converse is injectivity and is not assumed)

SMT solvers combine SAT solving with theory-specific decision procedures.

Main Theorems

Theorem

Cook-Levin Theorem

Statement

SAT is NP-complete. That is:

SAT is in NP: given a satisfying assignment, it can be verified in polynomial time.
Every problem in NP can be reduced to SAT in polynomial time.

Intuition

The computation of any polynomial-time verifier can be encoded as a Boolean formula. Each step of the computation, each cell of the tape, each state of the machine becomes a Boolean variable. The formula asserts "the machine accepts," and its satisfiability is equivalent to the original NP problem having a solution.

Proof Sketch

Let $L \in \text{NP}$ with verifier $V$ running in time $p(n)$ . Encode $V$ 's computation as a Boolean formula:

Create variables $s_{t,q}$ for "machine is in state $q$ at time $t$ ."
Create variables $c_{t,i,a}$ for "tape cell $i$ contains symbol $a$ at time $t$ ."
Create variables $h_{t,i}$ for "head is at position $i$ at time $t$ ."
Add clauses encoding: (a) the initial configuration matches the input, (b) each step follows the transition function, (c) the machine is in an accepting state at the final step.

The formula has $O(p(n)^2)$ variables and clauses (polynomial in $n$ ), and it is satisfiable if and only if $V$ accepts some certificate. The satisfying assignment encodes the certificate.

Why It Matters

Cook-Levin established the theory of NP-completeness. Once SAT is NP-complete, every new problem can be shown NP-complete by reduction from SAT (or from any other known NP-complete problem). This created a practical methodology: to show a problem is hard, reduce SAT to it.

It also explains why SAT solvers are so broadly useful: any problem in NP can be encoded as SAT. Scheduling, planning, verification, and combinatorial optimization all reduce to SAT.

Failure Mode

NP-completeness is a worst-case notion. SAT is NP-complete in the worst case, but many practical instances have exploitable structure (locality, symmetry, small clause width) that modern solvers exploit. The gap between worst-case theory and practical performance is enormous for SAT.

report a correction →

SAT Solving Algorithms

DPLL (Davis-Putnam-Logemann-Loveland, 1962):

Unit propagation: if a clause has only one unassigned literal, set it to true.
Pure literal elimination: if a variable appears only positively (or only negatively), set it accordingly.
Branching: pick an unassigned variable, try true, then try false. Backtrack on contradiction.

CDCL (Conflict-Driven Clause Learning, 1996): CDCL extends DPLL with two key innovations:

Conflict analysis: when a contradiction is reached, analyze why it occurred. Identify a set of variable assignments that caused the conflict.
Learned clauses: add a new clause to the formula that prevents the same conflict from recurring. This prunes the search space.
Non-chronological backtracking: jump back to the decision level that caused the conflict, not just one level up.

CDCL is the basis of all competitive modern SAT solvers (MiniSat, CaDiCaL, Kissat). On industrial benchmarks, CDCL solvers routinely handle formulas with millions of variables and clauses.

SMT Architecture

Modern SMT solvers (Z3, CVC5, Yices) use the DPLL(T) architecture:

Abstract the formula: replace theory atoms ( $x + y > 3$ ) with Boolean variables ( $b_1$ ).
Use a SAT solver (CDCL) on the abstracted Boolean formula.
When the SAT solver finds a satisfying assignment, pass the corresponding theory atoms to a theory solver.
If the theory solver says the atoms are inconsistent (e.g., $x > 3$ and $x < 2$ simultaneously), generate a conflict clause and feed it back to the SAT solver.
Repeat until SAT with a theory-consistent assignment, or UNSAT.

This separation of concerns keeps the architecture clean: the SAT solver handles Boolean structure, the theory solvers handle domain reasoning, and they communicate through conflict clauses.

Nelson-Oppen Theory Combination

Real SMT queries mix theories: a verification condition might touch linear arithmetic, arrays, and uninterpreted functions in the same formula. The Nelson-Oppen method (Nelson and Oppen 1979) combines decision procedures for disjoint, stably-infinite theories $T_1, \ldots, T_k$ into a decision procedure for their union. The input is first purified: each atom is rewritten so it uses symbols from a single theory, with shared variables at the interfaces. The individual theory solvers then exchange equalities between shared variables; a combined model exists if and only if each theory agrees on the induced partition of the shared variables. Modern DPLL(T) frameworks (Nieuwenhuis, Oliveras, and Tinelli 2006) integrate Nelson-Oppen with lazy SMT, so theory propagation and conflict analysis span multiple theories inside the CDCL loop.

Phase Transitions in Random $k$ -SAT

Random $k$ -SAT instances are generated by sampling $m$ clauses uniformly at random over $n$ variables, each clause a disjunction of $k$ literals. Empirically, the probability that a random 3-SAT formula is satisfiable undergoes a sharp transition as the clause-to-variable ratio $\alpha = m/n$ crosses a critical value near $\alpha_c \approx 4.267$ . For $\alpha \ll \alpha_c$ almost all instances are satisfiable and easy; for $\alpha \gg \alpha_c$ almost all are unsatisfiable and easy to refute. Instances near $\alpha_c$ are the hardest: solver runtime peaks sharply around the threshold. Friedgut (1999) proved that a sharp satisfiability threshold exists for random $k$ -SAT with $k \geq 3$ , but the exact location of the 3-SAT threshold remains an open problem; tight upper and lower bounds have been established and the sequence of physics-inspired and rigorous bounds has steadily narrowed the window. The phase-transition picture guides benchmark generation and explains why average-case complexity is not a useful summary of SAT difficulty.

Why Solvers Complement LLMs

LLMs and SAT/SMT solvers have complementary strengths:

Capability	LLM	SAT/SMT solver
Flexibility	High: handles natural language, ambiguous specifications	Low: requires formal specification
Correctness guarantee	None: can hallucinate	Complete: if it says UNSAT, the formula is unsatisfiable
Arithmetic	Unreliable for multi-step computation	Exact
Search	Beam search, sampling	Exhaustive with pruning

For AI agents, the natural architecture is: use the LLM to translate informal requirements into formal constraints, then use a solver to check satisfiability or find solutions. The LLM provides flexibility; the solver provides correctness.

Common Confusions

Watch Out

NP-complete does not mean unsolvable

SAT is NP-complete, but modern solvers routinely solve practical instances with millions of variables. NP-completeness is a worst-case classification. Practical instances often have structure (small treewidth, clustering, phase transitions) that solvers exploit. The average-case and structured-case complexity of SAT is much better than the worst case.

Watch Out

SMT is not just SAT with extra steps

SMT solvers do not simply convert everything to SAT. They use specialized decision procedures for each theory (simplex for linear arithmetic, congruence closure for uninterpreted functions, etc.). The DPLL(T) architecture interleaves SAT solving with theory reasoning, which is more efficient than a pure SAT encoding for most theory-heavy problems.

Watch Out

Solvers are not obsolete because of LLMs

LLMs cannot replace SAT/SMT solvers. LLMs have no mechanism for guaranteed correct reasoning over formal constraints. A solver that returns UNSAT has proved unsatisfiability by exhaustive search. No LLM can provide this guarantee. The two tools serve different roles and are most powerful when combined.

Exercises

ExerciseCore

Problem

Convert the following Boolean formula into CNF and determine if it is satisfiable: $(x_1 \lor x_2) \land (\neg x_1 \lor x_3) \land (\neg x_2 \lor \neg x_3) \land (x_1 \lor \neg x_2)$ .

ExerciseAdvanced

Problem

Explain why conflict-driven clause learning (CDCL) is more efficient than naive DPLL on structured problems. What information does a learned clause encode?

ExerciseAdvanced

Problem

Sketch how you would use an SMT solver to verify that a ReLU neural network with one hidden layer always outputs a non-negative value on inputs in $[0, 1]^d$ .

Neural Network Verification. State of the Art

Reluplex (Katz, Barrett, Dill, Julian, and Kochenderfer 2017, arXiv:1702.01135) introduced a Simplex-based decision procedure for piecewise-linear ReLU constraints and remains the foundational instance of SMT-style NN verification. It was superseded by Marabou (Katz, Huang, Ibeling, Julian, Lazarus, Lim, Shah, Thakoor, Wu, Zeljić, Dill, Kochenderfer, and Barrett 2019, arXiv:1910.14322), which extends Reluplex with support for arbitrary piecewise-linear activations, richer property specifications, and parallel search. The current state of the art on standard robustness benchmarks is $\alpha,\beta$ -CROWN (Xu, Zhang, Wang, Wang, Jana, Lin, and Hsieh 2021, arXiv:2103.06624, with ongoing extensions through 2024), which combines branch-and-bound with the CROWN family of linear-relaxation bound-propagation techniques and has won successive iterations of the VNN-COMP competition. The practical takeaway is that exact SMT-style verification scales to small networks and local robustness queries, while branch-and-bound with tight linear relaxations now dominates on realistic vision models.

Further directions

VSIDS variable-ordering heuristic in CDCL
Proof production and DRAT unsat certificates
Parallel SAT solving: portfolio solvers and cube-and-conquer
QBF (PSPACE-complete) and beyond
#SAT model counting (harder than decision, ApproxMC)
Craig interpolation from UNSAT proofs
Abstract interpretation connection for program verification
Quiz

References

Canonical:

Cook, "The Complexity of Theorem-Proving Procedures" (1971)
Biere, Heule, van Maaren, Walsh, Handbook of Satisfiability (2nd ed., 2021)
Tseitin, "On the Complexity of Derivation in Propositional Calculus" (1968)

Solvers:

de Moura & Bjorner, "Z3: An Efficient SMT Solver" (2008)
Marques-Silva, Lynce, Malik, "Conflict-Driven Clause Learning SAT Solvers" (2009)

Theory combination and DPLL(T):

Nelson & Oppen, "Simplification by Cooperating Decision Procedures," TOPLAS 1(2):245-257 (1979)
Nieuwenhuis, Oliveras, Tinelli, "Solving SAT and SAT Modulo Theories," JACM 53(6):937-977 (2006)

Phase transitions:

Friedgut, "Sharp Thresholds of Graph Properties, and the $k$ -SAT Problem," JAMS 12:1017-1054 (1999)

Neural network verification:

Katz, Barrett, Dill, Julian, Kochenderfer, "Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks," arXiv:1702.01135 (2017)
Katz, Huang, Ibeling, Julian, Lazarus, Lim, Shah, Thakoor, Wu, Zeljić, Dill, Kochenderfer, Barrett, "The Marabou Framework for Verification and Analysis of Deep Neural Networks," arXiv:1910.14322 (2019)
Xu, Zhang, Wang, Wang, Jana, Lin, Hsieh, "Fast and Complete: Enabling Complete Neural Network Verification with Rapid and Massively Parallel Incomplete Verifiers," arXiv:2103.06624 (2021)

Next Topics

Model theory basics: the semantic side of logic (truth in structures vs. provability)
Proof theory and cut-elimination: the structural theory behind proof search

Why This Matters

Prerequisites

Core Definitions

Tseitin Transformation

Main Theorems

SAT Solving Algorithms

SMT Architecture

Nelson-Oppen Theory Combination

Phase Transitions in Random kkk-SAT

Why Solvers Complement LLMs

Common Confusions

Exercises

Neural Network Verification. State of the Art

Further directions

References

Next Topics

Phase Transitions in Random $k$ -SAT