
Computability · 50 min

Computability Theory

What can be computed? Turing machines, decidability, the Church-Turing thesis, recursive and recursively enumerable sets, reductions, Rice's theorem, and connections to learning theory.

Why This Matters

Before asking whether a problem is efficient, you must ask whether it is solvable at all. Computability theory draws the boundary between problems that any algorithm can solve and problems that no algorithm can solve, regardless of time or space.

This is not abstract philosophy. In ML, certain concept classes are provably not PAC-learnable for computability reasons, not sample complexity reasons. If the hypothesis class involves solving an undecidable problem during evaluation, no learner can succeed. The halting problem is the canonical example of an undecidable problem, and its proof technique (diagonalization) recurs throughout set theory, complexity theory, and learning theory.

Core Definitions

Definition

Turing Machine

A Turing machine is a 7-tuple $M = (Q, \Sigma, \Gamma, \delta, q_0, q_{\text{accept}}, q_{\text{reject}})$ where:

  • $Q$ is a finite set of states
  • $\Sigma$ is a finite input alphabet (not containing the blank symbol $\sqcup$)
  • $\Gamma \supseteq \Sigma \cup \{\sqcup\}$ is the tape alphabet
  • $\delta: Q \times \Gamma \to Q \times \Gamma \times \{L, R\}$ is the transition function
  • $q_0 \in Q$ is the start state
  • $q_{\text{accept}}, q_{\text{reject}} \in Q$ are the accepting and rejecting states

The machine reads from an infinite tape, one cell at a time, writes a symbol, moves left or right, and transitions to a new state. It halts when it enters $q_{\text{accept}}$ or $q_{\text{reject}}$.
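The read-write-move loop above is easy to make concrete. The following is a minimal simulator sketch (not from the text; the machine, state names, and step bound are illustrative choices), running a toy machine that decides whether a binary string ends in 1:

```python
def run_tm(delta, input_str, q0="q0", accept="qacc", reject="qrej",
           blank="_", max_steps=10_000):
    """Simulate a single-tape TM. delta maps (state, symbol) -> (state, symbol, 'L' or 'R')."""
    tape = dict(enumerate(input_str))   # sparse tape: unset cells read as blank
    head, state = 0, q0
    for _ in range(max_steps):          # step bound so the sketch always returns
        if state in (accept, reject):
            return state == accept
        sym = tape.get(head, blank)
        state, write, move = delta[(state, sym)]
        tape[head] = write
        head += 1 if move == "R" else -1
    raise TimeoutError("machine did not halt within max_steps")

# Toy transition function: scan right past the input, then inspect the last symbol.
delta = {
    ("q0", "0"): ("q0", "0", "R"),
    ("q0", "1"): ("q0", "1", "R"),
    ("q0", "_"): ("q1", "_", "L"),    # hit the blank past the input: step back
    ("q1", "1"): ("qacc", "1", "R"),
    ("q1", "0"): ("qrej", "0", "R"),
    ("q1", "_"): ("qrej", "_", "R"),  # empty input rejects
}

print(run_tm(delta, "0101"))  # True
print(run_tm(delta, "0110"))  # False
```

The `max_steps` cutoff is a concession to practicality that a real Turing machine does not have; removing it is exactly what makes the halting problem bite.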

Definition

Decidable Language (Recursive Set)

A language $L \subseteq \Sigma^*$ is decidable (recursive) if and only if there exists a Turing machine $M$ that halts on every input and accepts exactly the strings in $L$. That is, for every $w \in \Sigma^*$, $M$ halts and outputs "accept" if $w \in L$ or "reject" if $w \notin L$.

Definition

Semi-Decidable Language (Recursively Enumerable Set)

A language $L \subseteq \Sigma^*$ is semi-decidable (recursively enumerable, or r.e.) if and only if there exists a Turing machine $M$ that accepts exactly the strings in $L$. If $w \in L$, then $M$ halts and accepts. If $w \notin L$, then $M$ may reject or may run forever.
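The characteristic shape of a semi-decider is an unbounded search that halts when a witness is found and runs forever otherwise. A hedged sketch (this particular set, sums of two squares, happens to be decidable since the search could be bounded by $\sqrt{n}$, but the unbounded-search pattern is the general form):

```python
from itertools import count

def semi_decide_sum_of_two_squares(n):
    """Accept n iff a**2 + b**2 == n has a solution in nonnegative integers.
    Halts and returns True on yes-instances; loops forever on no-instances."""
    for k in count():                   # enumerate all pairs (a, b) with a + b = k
        for a in range(k + 1):
            b = k - a
            if a * a + b * b == n:
                return True             # witness found: halt and accept
        # note: no 'return False' anywhere — a no-instance never returns

print(semi_decide_sum_of_two_squares(25))   # True (3^2 + 4^2); halts
# semi_decide_sum_of_two_squares(3) would never return: 3 is not a sum of two squares
```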

Definition

Many-One Reduction

Language $A$ is many-one reducible to language $B$, written $A \leq_m B$, if and only if there exists a computable function $f: \Sigma^* \to \Sigma^*$ such that for all $w$:

$w \in A \iff f(w) \in B$

If $A \leq_m B$ and $B$ is decidable, then $A$ is decidable. Equivalently, if $A$ is undecidable and $A \leq_m B$, then $B$ is undecidable.

The Church-Turing Thesis

The Church-Turing thesis is not a theorem. It is a claim about the physical world: every function that is "effectively computable" by any mechanical procedure is computable by a Turing machine. Evidence for this thesis includes the fact that every alternative model of computation (lambda calculus, recursive functions, register machines, cellular automata) computes exactly the same class of functions.

The thesis has practical force: if you cannot solve a problem with a Turing machine, you cannot solve it with any physical device. Quantum computers do not change this. They may be faster, but they compute the same set of functions.

Universal Turing Machine

A universal Turing machine $U$ is a single Turing machine that can simulate any other Turing machine given its description. On input $\langle M, w \rangle$ (an encoding of a TM $M$ together with an input string $w$), $U$ produces the same output that $M$ would produce on $w$.

Universality is what makes Turing machines a model of general-purpose computation rather than fixed-function machines. The same idea underlies every modern computer: the program is data, and a single hardware machine executes any program. Turing's 1936 construction is the conceptual ancestor of the stored-program computer.
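Program-as-data can be shown in miniature. The sketch below (an illustrative analogy, not Turing's construction) treats a program as an ordinary string until a "universal" evaluator runs it; the function name `main` is an assumed convention:

```python
def universal(program_source: str, x: int) -> int:
    """Run the function named 'main' defined in program_source on input x."""
    namespace = {}
    exec(program_source, namespace)   # the program is plain data until this line
    return namespace["main"](x)

# The "machine description" is just a string we could store, transmit, or edit.
square = "def main(x):\n    return x * x"
print(universal(square, 7))   # 49
```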

Recursion Theorem and Self-Reference

Theorem

Kleene's Recursion Theorem

Statement

For every total computable function $f$ there exists an index $e$ such that $\phi_e = \phi_{f(e)}$. Equivalently: every computable transformation of program descriptions has a fixed point — a program whose source code is mapped (semantically) to itself.

Intuition

A program can have access to its own source code. Quine programs (programs that print their own source) and self-replicating viruses are concrete instances. The recursion theorem is the meta-level analog of the Y combinator from lambda calculus: both express that self-reference is available "for free" in any sufficiently powerful computational model.
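A concrete instance of the quine idea mentioned above: a two-line Python program whose output is exactly its own source (the comment is not part of the quine proper):

```python
# A classic Python quine: the two lines below print themselves verbatim.
s = 's = %r\nprint(s %% s)'
print(s % s)
```

The trick is self-application: `%r` splices in the `repr` of the template string, and `%%` becomes a literal `%`, so the output reproduces both lines exactly.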

Why It Matters

Many undecidability proofs use the recursion theorem as a one-line shortcut. It also formalizes why programs that introspect their own code are not strictly more powerful than ordinary programs, since self-reference can already be simulated.

The Halting Problem

Theorem

Undecidability of the Halting Problem

Statement

The halting problem $\text{HALT} = \{\langle M, w \rangle : M \text{ is a TM that halts on input } w\}$ is undecidable. There is no Turing machine that, given an arbitrary Turing machine $M$ and input $w$, always correctly determines whether $M$ halts on $w$.

Intuition

If a halting decider existed, you could use it to construct a machine that does the opposite of what it is predicted to do. This self-referential contradiction is the same diagonal argument that Cantor used to prove the reals are uncountable.

Proof Sketch

Assume for contradiction that $H$ decides HALT: $H(\langle M, w \rangle)$ accepts if $M$ halts on $w$ and rejects otherwise. Construct a new machine $D$: on input $\langle M \rangle$, run $H(\langle M, \langle M \rangle \rangle)$. If $H$ accepts (meaning $M$ halts on its own description), then $D$ loops forever. If $H$ rejects (meaning $M$ does not halt on its own description), then $D$ halts.

Now run $D$ on $\langle D \rangle$. If $D$ halts on $\langle D \rangle$, then $H$ accepts, so $D$ loops. If $D$ does not halt on $\langle D \rangle$, then $H$ rejects, so $D$ halts. Both cases yield a contradiction. Therefore $H$ cannot exist.
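The construction of $D$ translates almost word for word into code. Below is a hedged sketch: `halts` is the hypothetical decider (which, by the theorem, cannot actually be implemented, so here it just raises):

```python
def halts(prog, inp):
    """Hypothetical perfect halting decider — cannot exist, by the theorem above."""
    raise NotImplementedError("no such decider exists")

def D(prog):
    if halts(prog, prog):   # predicted to halt on its own source?
        while True:         # ...then do the opposite: loop forever
            pass
    return "halted"         # predicted to loop? then halt immediately

# Feeding D its own source, D(<D>) would have to halt iff it does not halt —
# the contradiction that rules out halts() in the first place.
```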

Why It Matters

The halting problem is the canonical undecidable problem. Most undecidability results in computer science are proved by reducing from HALT. It also shows that general program verification is impossible: no tool can check all programs for all properties. This is why static analysis, type checking, and formal verification must all restrict the class of programs or properties they consider.

Failure Mode

The proof requires the ability of Turing machines to simulate other Turing machines (universality). In restricted computational models (finite automata, pushdown automata), the halting problem is decidable because these models always halt or can be checked for loops in bounded time.

Decidability Hierarchy

The three-level classification:

| Class | Definition | Closure Properties | Example |
| --- | --- | --- | --- |
| Decidable | TM halts on all inputs, accepts $L$ | Union, intersection, complement | $\{0^n 1^n : n \geq 0\}$ |
| Semi-decidable (r.e.) | TM accepts $L$, may loop on $w \notin L$ | Union, intersection | HALT |
| Co-semi-decidable (co-r.e.) | Complement is semi-decidable | Union, intersection | $\overline{\text{HALT}}$ |

A language is decidable if and only if it is both semi-decidable and co-semi-decidable. This gives a useful proof technique: to show a language is decidable, show that both it and its complement are r.e.
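The proof of that equivalence is an interleaving argument: run a semi-decider for $L$ and one for $\overline{L}$ in alternation; exactly one must accept, so the combined procedure always halts. A toy sketch (the "semi-deciders" are stand-in generators that yield `None` while thinking and `True` on acceptance; the even/odd example is illustrative):

```python
def decide(semi_L, semi_coL, w):
    """Dovetail a semi-decider for L with one for its complement: one step each per round."""
    a, b = semi_L(w), semi_coL(w)
    while True:
        if next(a) is True:
            return True     # w is in L
        if next(b) is True:
            return False    # w is in the complement

# Toy instance: L = even numbers. Each generator accepts its own side
# and loops forever on the other — exactly the r.e. behavior.
def semi_even(n):
    while n % 2 != 0:
        yield None          # odd input: "think" forever
    yield True

def semi_odd(n):
    while n % 2 == 0:
        yield None
    yield True

print(decide(semi_even, semi_odd, 4))   # True
print(decide(semi_even, semi_odd, 7))   # False
```

The per-round alternation matters: running either semi-decider to completion on its own could hang, but interleaved steps guarantee termination.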

Rice's Theorem

Theorem

Rice's Theorem

Statement

Let $P$ be any nontrivial property of r.e. languages (i.e., some r.e. languages have the property and some do not). Then the set

$\{\langle M \rangle : L(M) \text{ has property } P\}$

is undecidable.

Intuition

You cannot decide anything nontrivial about what a program computes by inspecting it. "Does this program ever output 1?" is undecidable. "Does this program compute a total function?" is undecidable. "Does this program compute the same function as that program?" is undecidable. The only decidable properties of programs are trivial ones (satisfied by all programs or no programs) or syntactic ones (about the code text, not its behavior).

Proof Sketch

Reduce from HALT. Assume without loss of generality that the empty language does not have property $P$ (otherwise run the argument on the complement of $P$). Let $M_P$ be a machine whose language has property $P$ (one exists by nontriviality). Given input $\langle M, w \rangle$ to the halting problem, construct a machine $M'$ that on input $x$ first simulates $M$ on $w$. If $M$ halts, then $M'$ simulates $M_P$ on $x$. If $M$ does not halt, $M'$ loops, so $L(M') = \emptyset$. Then $L(M')$ has property $P$ if and only if $M$ halts on $w$. A decider for $P$ would therefore decide HALT, contradicting the halting theorem.

Why It Matters

Rice's theorem is a sweeping impossibility result. It explains why fully automatic program analysis is impossible in general: any question about program behavior (not syntax) that has both yes-instances and no-instances is undecidable. Practical tools (linters, type checkers, model checkers) work by restricting the class of programs or by being sound but incomplete (they may reject valid programs or accept invalid ones).

Failure Mode

Rice's theorem applies only to semantic properties (properties of the language L(M)L(M)). Syntactic properties of the machine description, such as "does the machine have exactly 7 states?" or "does the source code contain the string 'hello'?", are decidable because they do not require running the machine.

Reductions and the Undecidability Zoo

Many-one reductions establish a hierarchy of undecidable problems. If $A \leq_m B$, then $B$ is at least as hard as $A$. The standard technique for proving a new problem $B$ undecidable is:

  1. Choose a known undecidable problem $A$ (often HALT).
  2. Construct a computable function $f$ such that $w \in A \iff f(w) \in B$.
  3. Conclude that $B$ is undecidable.
Example

Undecidability of the emptiness problem

The language $E_{\text{TM}} = \{\langle M \rangle : L(M) = \emptyset\}$ is undecidable. This follows immediately from Rice's theorem (emptiness is a nontrivial semantic property). Alternatively, reduce from HALT directly: given $\langle M, w \rangle$, construct $M'$ that ignores its input, simulates $M$ on $w$, and accepts if $M$ halts. Then $L(M') = \emptyset$ iff $M$ does not halt on $w$.
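The construction of $M'$ in that example can be sketched with Python closures standing in for machine descriptions (a hedged illustration: a real reduction transforms machine *encodings*, not live functions, and a non-halting $M$ obviously cannot be exercised here):

```python
def reduce_halt_to_emptiness(M, w):
    """Map <M, w> to <M'> such that L(M') is empty iff M does not halt on w."""
    def M_prime(x):
        M(w)            # ignore x; first simulate M on w (may never return)
        return True     # reached only if M halted: accept every input x
    return M_prime

# Toy check with a machine that certainly halts on w:
halting_M = lambda inp: None
M_prime = reduce_halt_to_emptiness(halting_M, "w")
print(M_prime("anything"))   # True: L(M') is nonempty, because M halts on w
```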

Connection to Learning Theory

Computability imposes hard limits on learnability. PAC-learnability does not require an algorithm that outputs a hypothesis consistent with the target. In realizable PAC, the learner must produce a hypothesis with generalization error at most $\varepsilon$ with probability at least $1-\delta$ (consistent ERM is one common sufficient strategy, but not the only one); in agnostic PAC, the data need not be realizable at all and the goal is to compete with the best hypothesis in the class. The computability obstruction is sharper: if evaluating membership in concepts from $\mathcal{C}$ requires solving an undecidable problem, no learner can even output a usable hypothesis from $\mathcal{C}$, regardless of sample complexity.

More precisely: if the concept class consists of all r.e. sets, then even with infinite data, a learner cannot distinguish between a concept that accepts a given string after a very long computation and one that loops forever on that string. Sample complexity bounds become irrelevant when the hypothesis class itself is not computable.

Ben-David et al. 2019: Learnability Independent of ZFC

Ben-David, Hrubeš, Moran, Shpilka & Yehudayoff (Nature Machine Intelligence 2019, "Learnability can be undecidable") gave a far stronger result. They studied EMX (estimating the maximum), a learning task: given i.i.d. samples from an unknown distribution $P$ over a domain $X$, output a finite set $S \subseteq X$ that, with high probability, satisfies $P(S) \geq \max_{|S'| \leq k} P(S') - \epsilon$ for some bound $k$.

Their main result: there exist parameter choices for which the question "is EMX learnable?" is independent of ZFC — neither provable nor disprovable in standard set theory. The proof connects EMX learnability to the existence of a monotone compression scheme and reduces this to a combinatorial principle equivalent to a statement about cardinality below the continuum, which Cohen-style forcing shows to be independent of ZFC.

Why this matters: PAC learnability had previously been characterized by VC dimension (Blumer-Ehrenfeucht-Haussler-Warmuth 1989). Ben-David et al. show that for some learning models the analogue characterization cannot exist as a ZFC theorem at all: learnability transcends what classical mathematics can decide. This is the modern recursion-theoretic / set-theoretic limit on the foundations of learning theory, in the spirit of Gödel and Cohen but applied to the central object of statistical learning.

Common Confusions

Watch Out

Computable does not mean efficient

A function is computable exactly when some Turing machine computes it, with no bound on running time. A function is efficiently computable exactly when it can be computed in polynomial time. The class of decidable problems is vastly larger than P. Integer factoring is decidable (try all factors). Whether it is in P is unknown. The P vs NP question lives entirely within the decidable world. Computability theory asks what is solvable at all; complexity theory asks what is solvable quickly.
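Trial division makes the factoring claim above concrete: the procedure below always halts, so the problem is decidable, yet its running time is exponential in the bit-length of $n$:

```python
def has_nontrivial_factor(n: int) -> bool:
    """Decide whether n has a factor d with 1 < d < n. Always halts."""
    d = 2
    while d * d <= n:        # ~sqrt(n) trial divisions = exp(bits/2) time
        if n % d == 0:
            return True
        d += 1
    return False             # termination on every input = decidability

print(has_nontrivial_factor(91))   # True (7 * 13)
print(has_nontrivial_factor(97))   # False (97 is prime)
```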

Watch Out

Semi-decidable is not the same as half-decidable

A semi-decidable language has a TM that says "yes" when the answer is yes but may never respond when the answer is no. It does not mean the TM gets the right answer half the time. Semi-decidability is about completeness (all yes-instances are found) without soundness of rejection (no-instances may never be classified).

Watch Out

The Church-Turing thesis is not a theorem

The Church-Turing thesis cannot be proved because "effectively computable" is an informal notion. It is a claim about the physical world, supported by the equivalence of every proposed model of computation. If someone built a physical device that computed a non-Turing-computable function, the thesis would be falsified. No such device is known or believed to exist.

Exercises

ExerciseCore

Problem

Show that the language $A = \{w : w \text{ is a palindrome over } \{0,1\}\}$ is decidable by describing a Turing machine that decides it.

ExerciseCore

Problem

Prove that if $L$ is decidable, then $\overline{L}$ (the complement of $L$) is also decidable.

ExerciseAdvanced

Problem

Prove that the language $\text{EQ}_{\text{TM}} = \{\langle M_1, M_2 \rangle : L(M_1) = L(M_2)\}$ is undecidable using Rice's theorem or by reduction from HALT.

ExerciseAdvanced

Problem

Give an example of a language that is semi-decidable but not decidable, and prove both claims.

ExerciseResearch

Problem

The concept class $\mathcal{C}$ over $\{0,1\}^*$ consists of all decidable languages. Is $\mathcal{C}$ PAC-learnable? Explain why computability, not sample complexity, is the bottleneck.

References

Canonical:

  • Turing, "On Computable Numbers, with an Application to the Entscheidungsproblem" (1936): the original construction
  • Sipser, Introduction to the Theory of Computation (2013), Chapters 3-5
  • Rogers, Theory of Recursive Functions and Effective Computability (1987), Chapters 1-7

Additional:

  • Hopcroft, Motwani, Ullman, Introduction to Automata Theory, Languages, and Computation (2006), Chapter 9
  • Cutland, Computability: An Introduction to Recursive Function Theory (1980), Chapters 1-4
  • Soare, Turing Computability: Theory and Applications (2016): modern reference
  • Odifreddi, Classical Recursion Theory (1989): comprehensive
  • Cooper, Computability Theory (2004): modern textbook

Classical undecidability landmarks:

  • Matiyasevich, "Enumerable sets are Diophantine" (1970): Hilbert's 10th problem
  • Novikov & Boone (1955-1958): undecidability of the word problem for groups

Connection to ML:

  • Shalev-Shwartz and Ben-David, Understanding Machine Learning (2014), Chapter 8 (computational complexity of learning)
  • Ben-David, Hrubeš, Moran, Shpilka & Yehudayoff, "Learnability can be undecidable" (Nature Machine Intelligence 2019)

Next Topics

  • P vs NP: once you know a problem is decidable, the next question is whether it is efficiently decidable
  • Kolmogorov complexity: an alternative lens on computability that measures the information content of individual strings