CAP Theorem
In a distributed system, you cannot simultaneously guarantee consistency, availability, and partition tolerance. This topic covers Brewer's conjecture, the Gilbert-Lynch proof, the PACELC extension, and practical implications for ML infrastructure.
Why This Matters
Any distributed system that stores state (a database, a parameter server, a feature store, a model registry) must choose how to behave when network partitions occur. The CAP theorem proves that no system can simultaneously provide strong consistency, high availability, and partition tolerance. This is not a design limitation; it is a mathematical impossibility result.
For ML practitioners, CAP matters because modern ML infrastructure is distributed. Training clusters use parameter servers or all-reduce protocols. Feature stores serve real-time features across data centers. Model serving systems replicate models for low latency. In each case, the system designer must decide: when a network partition occurs, do you serve stale data (sacrifice consistency) or refuse to serve (sacrifice availability)?
Formal Definitions
The CAP theorem requires precise definitions of its three properties. Informal statements ("pick two of three") obscure the actual result.
Consistency (Linearizability)
A distributed system provides consistency (in the CAP sense) if it is linearizable: every read operation returns the value of the most recent completed write, as if all operations were executed on a single copy of the data in some sequential order consistent with real-time ordering.
Formally: there exists a total order on all operations such that (1) the order is consistent with real-time precedence (if operation A completes before operation B starts, then A precedes B in the order), and (2) every read of an object returns the value written by the most recent preceding write to that object in this order.
This is the strongest standard consistency model. It is strictly stronger than sequential consistency, causal consistency, or eventual consistency.
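To make the real-time condition concrete, here is a minimal sketch (not a full linearizability checker) that flags the simplest violation: a read that starts after a write to the same object has completed, yet returns a different, older value. The dict shape for operations is illustrative, not from any real tool.

```python
# Each operation records its real-time start/end and the value it
# wrote or observed. If the write finished before the read began,
# linearizability requires the read to see the written value.

def violates_linearizability(write, read):
    """write/read: dicts with 'start', 'end', 'value' (hypothetical shape)."""
    write_completed_first = write["end"] < read["start"]
    return write_completed_first and read["value"] != write["value"]

w = {"start": 0.0, "end": 1.0, "value": "v1"}   # write v1, done at t=1
r = {"start": 2.0, "end": 3.0, "value": "v0"}   # read at t=2 still sees v0

print(violates_linearizability(w, r))  # True: stale read after completed write
```

Weaker models such as eventual consistency explicitly permit the history above; linearizability forbids it.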
Availability
A distributed system provides availability if every request received by a non-failing node eventually receives a response, with no bound on how long that response may take.
In the Gilbert-Lynch formalization: every request to a non-failing node must result in a response. A system that responds with an error message or timeout is not available under this definition. A system that queues requests indefinitely is not available either.
Partition Tolerance
A distributed system provides partition tolerance if it continues to operate correctly (satisfying its consistency and availability guarantees) despite arbitrary message loss or delay between nodes.
A network partition divides the nodes into two or more groups that cannot communicate with each other. Partition tolerance means the system must handle this scenario; it cannot assume the network is reliable.
Partition tolerance is not optional in real systems
In a distributed system, network partitions will occur. Hardware fails, switches malfunction, cables get cut, cloud availability zones lose connectivity. A system that does not tolerate partitions is a system that runs on a single node (or assumes a perfectly reliable network). The practical version of CAP is not "pick 2 of 3" but rather "given that P is required, choose between C and A during partitions."
The Impossibility Result
CAP Impossibility Theorem
Statement
In an asynchronous distributed system, it is impossible to implement a read/write data object that simultaneously guarantees all three of the following properties:
- Consistency (linearizability)
- Availability (every request to a non-failing node receives a response)
- Partition tolerance (the system operates correctly despite arbitrary network partitions)
Any distributed system providing partition tolerance must sacrifice either consistency or availability during a network partition.
Intuition
Suppose two nodes G1 and G2 each hold a copy of a variable v, initially v0. A network partition separates them. A client writes v1 to G1. Another client then reads v from G2.
Node G2 cannot communicate with G1 (partition), so it does not know about the write. It has two choices:
- Return the stale value v0. This violates consistency (the read should return v1).
- Refuse to respond until the partition heals. This violates availability.
There is no third option. The partition makes it impossible to both return the correct value and return any value at all.
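The forced choice can be sketched as a toy two-node simulation: during a partition, the node that missed the write must either return the stale value (giving up consistency) or refuse to answer (giving up availability). The class, node names, and the `mode` flag are illustrative, not from any real system.

```python
# Toy model of the CAP dilemma: G1 and G2 replicate a value; a
# partition blocks replication, so G2 never learns about the write.

class Node:
    def __init__(self, name, value="v0"):
        self.name, self.value, self.partitioned = name, value, False

    def write(self, value, peer):
        self.value = value
        if not self.partitioned:          # replicate only if the link is up
            peer.value = value

    def read(self, mode):
        if self.partitioned and mode == "CP":
            raise TimeoutError(f"{self.name}: unavailable during partition")
        return self.value                 # AP mode: may be stale

g1, g2 = Node("G1"), Node("G2")
g1.partitioned = g2.partitioned = True    # the network partitions
g1.write("v1", peer=g2)                   # write lands only on G1

print(g2.read("AP"))                      # 'v0' — stale read, C sacrificed
try:
    g2.read("CP")                         # refuses — A sacrificed
except TimeoutError as e:
    print(e)
```

There is no branch in `read` that returns v1: the information simply is not on G2's side of the partition.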
Proof Sketch
(Following Gilbert and Lynch, 2002.)
Assume for contradiction that a system provides all three properties: consistency (linearizability), availability, and partition tolerance.
Consider two nodes G1 and G2 connected by a network that may partition. Initialize a shared variable v = v0 at both nodes.
Step 1. Partition the network so that G1 and G2 cannot communicate.
Step 2. Client C1 sends a write request v := v1 to G1. By availability, G1 must acknowledge the write. Since the partition prevents communication, G2 does not learn about this write.
Step 3. Client C2 sends a read request for v to G2. By availability, G2 must respond. The only value G2 knows is v0 (the partition prevents it from learning v1). So G2 returns v0.
Step 4. The write of v1 completed before the read started (in real time). By linearizability, the read must return v1. But G2 returned v0. Contradiction.
Therefore, no system can provide all three properties simultaneously.
For the partially synchronous model (bounded message delay), Gilbert and Lynch show a similar result: if the partition lasts longer than the message delay bound, the same impossibility holds.
Why It Matters
The CAP theorem is one of the few impossibility results in distributed systems with direct engineering consequences. It forces every distributed storage system to make an explicit choice:
- CP systems (consistency + partition tolerance): during a partition, some requests are rejected. Examples: ZooKeeper, etcd, HBase in strict mode. These systems are suitable when correctness matters more than uptime (configuration stores, lock services, leader election).
- AP systems (availability + partition tolerance): during a partition, all nodes respond but may return stale data. Examples: Cassandra, DynamoDB (default settings), CouchDB. These systems are suitable when uptime matters more than perfect consistency (shopping carts, social media feeds, caching layers).
- CA systems (consistency + availability, no partition tolerance): only possible on a single node or with a perfectly reliable network. Traditional single-node relational databases are CA. In practice, any system deployed across a network must handle partitions, so CA is not achievable in a distributed setting.
Failure Mode
The theorem says nothing about what happens when there is no partition. During normal operation (no partition), a system can provide both consistency and availability. The trade-off only activates during partitions. Many practitioners incorrectly interpret CAP as "you can only ever have two properties," when the correct reading is "during a partition, you must choose between C and A."
The theorem also uses the strongest form of consistency (linearizability). Weaker consistency models (causal consistency, eventual consistency) are achievable alongside availability and partition tolerance. The theorem does not say all consistency is impossible under partitions; it says the strongest form is.
The PACELC Extension
CAP only describes behavior during partitions. Abadi (2012) proposed PACELC to also characterize the trade-off during normal operation.
PACELC
The PACELC framework extends CAP:
- If there is a Partition, the system trades off between Availability and Consistency (this is the CAP trade-off).
- Else (during normal operation, no partition), the system trades off between Latency and Consistency.
Even without partitions, enforcing strong consistency requires coordination between replicas (synchronous replication, consensus protocols), which adds latency. A system can serve reads from local replicas with low latency but risk staleness, or coordinate with the primary to ensure freshness at the cost of higher latency.
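The "else" branch of PACELC can be made concrete with assumed latencies: a local-replica read is fast but possibly stale, while coordinating with a remote primary guarantees freshness at the cost of a round trip. The latency numbers below are invented for illustration.

```python
# Sketch of the latency/consistency trade-off during normal operation.
LOCAL_READ_MS = 1        # read served by the nearby replica
CROSS_DC_RTT_MS = 80     # round trip to the primary in another region

def read_latency_ms(consistent: bool) -> int:
    # Consistent read: confirm freshness with the primary first.
    # Fast read: answer from the local replica, risking staleness.
    return CROSS_DC_RTT_MS + LOCAL_READ_MS if consistent else LOCAL_READ_MS

print(read_latency_ms(consistent=False))  # 1  — EL: fast, possibly stale
print(read_latency_ms(consistent=True))   # 81 — EC: fresh, slower
```

The 80x gap is why many systems default to local reads and reserve coordinated reads for operations that truly need freshness.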
PACELC classifications of common systems:
| System | During Partition | Normal Operation |
|---|---|---|
| ZooKeeper | PC (refuse some requests) | EC (pay latency for consistency) |
| Cassandra | PA (serve stale data) | EL (low latency, eventual consistency) |
| DynamoDB (default) | PA | EL |
| Spanner | PC | EC (uses TrueTime for consistency) |
| PostgreSQL (single-node) | CA (no partition tolerance) | EC |
Practical Trade-offs
Eventual Consistency
Most AP systems provide eventual consistency: if no new writes occur, all replicas will eventually converge to the same value. This is formalized as:
Eventual Consistency
A replicated data store is eventually consistent if, whenever no new updates are made to a given data item, all replicas of that item will eventually return the same value. There is no bound on how long convergence takes.
Formally: for every execution, if there exists a time after which no further writes to the item occur, then there exists a later time after which all reads of the item return the same value.
Eventual consistency is weak. It says nothing about what happens during the convergence period. Reads may return arbitrarily old values, different replicas may return different values simultaneously, and there is no bound on convergence time. Stronger models (causal consistency, read-your-writes, monotonic reads) provide better guarantees while remaining achievable under partitions.
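A toy model makes convergence tangible: replicas accept writes locally and tag them with timestamps; an anti-entropy (gossip) pass exchanges state and keeps the newer value (last-writer-wins). Once writes stop and replicas sync, all reads agree. This is a sketch, not a real replication protocol.

```python
# Three replicas, each holding a (timestamp, value) pair.
replicas = [{"ts": 0, "value": None} for _ in range(3)]

def local_write(replica, ts, value):
    if ts > replica["ts"]:                    # last-writer-wins
        replica["ts"], replica["value"] = ts, value

def anti_entropy(a, b):
    newer = a if a["ts"] >= b["ts"] else b    # pick the fresher state
    a.update(dict(newer)); b.update(dict(newer))

local_write(replicas[0], ts=1, value="x")     # write lands on replica 0 only
local_write(replicas[2], ts=2, value="y")     # concurrent write on replica 2

print([r["value"] for r in replicas])         # divergent: ['x', None, 'y']

for i in range(3):                            # gossip all pairs until quiescent
    for j in range(3):
        anti_entropy(replicas[i], replicas[j])

print([r["value"] for r in replicas])         # converged: ['y', 'y', 'y']
```

Note what the model does not promise: between the two prints, different replicas return different values, and nothing bounds how many gossip rounds convergence takes.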
Consensus Protocols
CP systems typically use distributed consensus protocols (Paxos, Raft, ZAB) to maintain consistency across replicas. These protocols guarantee that a majority of nodes agree on every write before it is committed. During a partition, the minority partition cannot form a majority and therefore cannot serve writes (sacrificing availability for the minority side).
The cost: every write requires communication with a majority of replicas. For a system with 2f + 1 replicas tolerating f failures, each write requires f + 1 acknowledgments. This adds latency proportional to the round-trip time to the slowest node in the responding majority.
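The quorum arithmetic is worth spelling out: with n = 2f + 1 replicas, a majority is f + 1, and any two majorities of n nodes must overlap in at least one node, which is what lets consensus protocols keep replicas consistent across leader changes.

```python
# Majority-quorum arithmetic for n = 2f + 1 replicas.
def majority(n: int) -> int:
    return n // 2 + 1

for f in range(1, 4):
    n = 2 * f + 1
    print(f"n={n} replicas: majority={majority(n)}, tolerates f={f} failures")

# Intersection property: two majorities always share at least one node.
n = 5
assert majority(n) + majority(n) > n   # 3 + 3 > 5, so quorums overlap
```

During a partition, only the side that can still assemble f + 1 nodes keeps committing writes; the minority side goes unavailable, which is exactly the CP choice.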
Relevance to ML Infrastructure
Distributed training. AllReduce-based training requires all workers to synchronize gradients at each step. This is a CP-style design: if any worker becomes unreachable (partition), training stalls (sacrifices availability). Asynchronous SGD is an AP-style design: workers use stale gradients (sacrifices consistency) but training continues. The Hogwild! result shows that for sparse problems, asynchronous updates with stale gradients still converge, but the convergence rate degrades with staleness.
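The synchronous/asynchronous contrast above can be sketched in a few lines: a synchronous step blocks until every worker's gradient arrives (stalling if one is partitioned), while an asynchronous step applies whatever has arrived, accepting staleness. This is purely illustrative; the learning rate and gradient values are made up.

```python
# Toy gradient aggregation: None marks a worker cut off by a partition.
def sync_step(param, worker_grads):
    if any(g is None for g in worker_grads):   # a worker is unreachable
        raise TimeoutError("training stalls: waiting on partitioned worker")
    return param - 0.1 * sum(worker_grads) / len(worker_grads)

def async_step(param, worker_grads):
    arrived = [g for g in worker_grads if g is not None]
    return param - 0.1 * sum(arrived) / len(arrived)  # uses the partial set

grads = [1.0, 2.0, None]            # third worker partitioned away
print(async_step(10.0, grads))      # ~9.85 — progresses with partial info
try:
    sync_step(10.0, grads)          # stalls — availability sacrificed
except TimeoutError as e:
    print(e)
```

The asynchronous update here is biased toward the reachable workers, which is one concrete form of the consistency cost the Hogwild! analysis quantifies.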
Feature stores. A feature store serving real-time features for model inference faces CAP directly. If the feature store is partitioned from its source of truth, it can serve cached (stale) features (AP) or return errors (CP). Most production feature stores choose AP with bounded staleness: serve cached features if the cache is less than T seconds old, and fail otherwise.
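A bounded-staleness read policy can be sketched as a cache that serves a feature only while it is fresher than a bound T, and refuses (CP-style) once it gets older. The class name, key, and bound are illustrative, not from any particular feature store.

```python
import time

class FeatureCache:
    """Serve cached features only while they are fresher than a bound."""

    def __init__(self, max_staleness_s: float):
        self.max_staleness_s = max_staleness_s
        self.store = {}                      # key -> (value, cached_at)

    def put(self, key, value, now=None):
        self.store[key] = (value, now if now is not None else time.time())

    def get(self, key, now=None):
        now = now if now is not None else time.time()
        value, cached_at = self.store[key]
        if now - cached_at > self.max_staleness_s:
            raise LookupError(f"{key}: cached feature older than bound")
        return value                         # stale, but boundedly so

cache = FeatureCache(max_staleness_s=30.0)
cache.put("user_42_ctr", 0.031, now=100.0)   # hypothetical feature key
print(cache.get("user_42_ctr", now=120.0))   # 20 s old, within bound: 0.031
try:
    cache.get("user_42_ctr", now=200.0)      # 100 s old: refuse to serve
except LookupError as e:
    print(e)
```

This sits between pure AP (always serve, however stale) and pure CP (never serve without contacting the source of truth): staleness is allowed, but capped.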
Model serving. Replicated model servers are typically AP: each replica serves its local copy of the model. During a model update, some replicas may briefly serve the old version while others serve the new version. This is usually acceptable because model updates are infrequent and the old model is not "wrong," just slightly outdated.
Parameter servers. The original parameter server architecture (Li et al., 2014) supports both synchronous (CP-like) and asynchronous (AP-like) updates. The choice depends on the model: large sparse models (recommendation systems) tolerate asynchronous updates well; dense models (language models) are more sensitive to staleness.
Common Confusions
CAP does not mean pick any two and get a working system
The "pick 2 of 3" framing suggests three equally viable combinations. In practice, partition tolerance is required for any distributed system (partitions happen). The real choice is between CP and AP during partitions. A "CA" system is a single-node system or one that crashes on partition, which is not useful for distributed deployments.
CAP says nothing about performance
The theorem is about possibility, not performance. A CP system might have terrible latency even without partitions (due to consensus overhead). An AP system might have poor consistency even without partitions (due to replication lag). CAP describes the fundamental trade-off boundary, not the engineering quality of a specific system. PACELC addresses this gap.
Eventual consistency is not the same as no consistency
An eventually consistent system guarantees convergence in the absence of new writes. This is much stronger than "anything goes." Conflict resolution mechanisms (last-writer-wins, vector clocks, CRDTs) ensure that convergence happens deterministically. The weakness is that there is no time bound on convergence and no guarantee about intermediate states.
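Deterministic convergence is easiest to see in a CRDT. Below is a minimal grow-only counter (G-counter): each replica tracks per-replica counts, merge takes the element-wise maximum, and because merge is commutative, associative, and idempotent, replicas agree regardless of the order in which updates arrive.

```python
# G-counter CRDT sketch: state maps replica_id -> count for that replica.
def increment(state, replica_id, amount=1):
    state = dict(state)                      # treat state as immutable
    state[replica_id] = state.get(replica_id, 0) + amount
    return state

def merge(a, b):
    # Element-wise max: commutative, associative, idempotent.
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def value(state):
    return sum(state.values())

r1 = increment({}, "r1")                     # replica 1 counts one event
r2 = increment(increment({}, "r2"), "r2")    # replica 2 counts two events

assert merge(r1, r2) == merge(r2, r1)        # merge order does not matter
print(value(merge(r1, r2)))                  # 3
```

Last-writer-wins and vector clocks resolve conflicts by different rules, but the property being illustrated is the same: conflict resolution is a deterministic function of the states, not of message arrival order.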
Key Takeaways
- CAP theorem: no distributed system can provide linearizability, availability, and partition tolerance simultaneously
- The proof is by contradiction: during a partition, a node must either return stale data (violating C) or not respond (violating A)
- In practice, partition tolerance is required, so the choice is CP or AP during partitions
- PACELC extends CAP to also characterize latency vs. consistency trade-offs during normal operation
- Eventual consistency guarantees convergence but provides no time bound or intermediate-state guarantees
- CP systems (ZooKeeper, etcd) use consensus protocols; AP systems (Cassandra, DynamoDB) use eventual consistency
- ML infrastructure (parameter servers, feature stores, model serving) faces these trade-offs directly
Exercises
Problem
A distributed key-value store has three nodes: N1, N2, N3. A network partition separates N3 from N1 and N2. The system uses majority quorum for writes (a write succeeds if acknowledged by at least 2 of 3 nodes).
(a) Can a client connected to N1 successfully write a value? Why or why not?
(b) Can a client connected to N3 successfully write a value? Why or why not?
(c) Is this system CP or AP? Explain.
Problem
Classify each of the following systems as CP, AP, or CA under the PACELC framework, and state the normal-operation trade-off (EL or EC):
(a) A single PostgreSQL server with no replication.
(b) A Cassandra cluster with replication factor 3 and consistency level ONE (reads and writes require only 1 replica).
(c) A ZooKeeper ensemble with 5 nodes.
Problem
The CAP theorem assumes the strongest consistency model (linearizability). Show that if we weaken consistency to causal consistency, the impossibility disappears: it is possible to build a system that is causally consistent, available, and partition-tolerant.
Describe the key idea of how such a system works (you do not need a full protocol, but describe the mechanism that avoids the CAP impossibility argument).
References
Canonical:
- Brewer, "Towards Robust Distributed Systems" (PODC 2000 keynote), the original CAP conjecture
- Gilbert & Lynch, "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services" (2002), the formal proof, Sections 2-4
- Gilbert & Lynch, "Perspectives on the CAP Theorem" (2012), clarifications and extensions
Extensions and Corrections:
- Abadi, "Consistency Tradeoffs in Modern Distributed Database System Design" (2012), the PACELC framework
- Brewer, "CAP Twelve Years Later: How the 'Rules' Have Changed" (2012), Brewer's own retrospective and corrections to common misinterpretations
Distributed Systems Context:
- Kleppmann, Designing Data-Intensive Applications (2017), Chapter 9, consistency and consensus with practical examples
- Lynch, Distributed Algorithms (1996), Chapters 13, 17, formal models of distributed computing and impossibility results
Next Topics
- Basic logic and proof techniques: the proof-by-contradiction pattern used in the CAP impossibility argument
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Basic Logic and Proof Techniques (Layer 0A)