Hello, and welcome back to The Coder Cafe! Today, let’s explore one of the most fundamental concepts in the area of distributed systems: the CAP theorem.
The CAP theorem is part of the fundamental concepts that help us understand the trade-offs we should make when we design distributed systems or select which database to use in our project.
First, let’s define the three properties related to the CAP theorem:
Consistency: Every read request receives either the most recent write or an error. In the context of the CAP theorem, consistency refers to a different concept than in what we discussed yesterday in ACID; here it specifically refers to linearizability1.
NOTE: In a few weeks, we will explore the theme of consistency models. During that week, we will cover what linearizability means. For now, let’s say that linearizability is a very strong consistency model that essentially provides the illusion of a single-node database.
Availability: Every request receives a non-error response, even if it may not contain the most up-to-date data.
Partition tolerance: The system continues to operate correctly even if communication between some nodes is lost due to network issues (e.g., dropped or delayed messages2).
The CAP theorem states that among consistency (C), availability (A), and partition tolerance (P), we can only pick any two. Said differently, a system can only be:
CP (Consistent and Partition-tolerant): The system ensures that all reads return the most recent write, even during network partitions. However, it sacrifices availability because some requests may fail during a partition.
AP (Available and Partition-tolerant): The system remains operational and available during network partitions, but it may return outdated data, breaking consistency.
CA (Consistent and available): The system guarantees both consistency and availability. In practice, partition tolerance is essential because network instability is inevitable in distributed systems. Therefore, CA systems are not viable3.
If we phrase the theorem differently:
If there’s no network partition, a system can be both consistent and available (CA).
If there are network partitions, a system must choose between consistency (CP) or availability (AP).
Here’s why: during network partition, meaning some node(s) may no longer be able to communicate with other(s). In such a case, the system has two choices:
Prioritize consistency: If the system chooses to prioritize consistency, nodes that are partitioned from the rest of the systems cannot respond to requests until the partition is over. This delay means the system is not available during the partition.
Prioritize availability: If the system chooses to prioritize availability, all nodes must respond to requests even during partitions. Yet, because some nodes can’t communicate with others, they may return outdated data, breaking consistency.
The CAP theorem forces us to confront the reality that no distributed systems can fully achieve consistency, availability, and partition tolerance at the same time. Understanding these trade-offs is important for making informed decisions when designing distributed systems.
The CAP theorem has faced criticism for several reasons, one being that the terms C, A, and P are confusing and ambiguous. Tomorrow, we will explore an alternative to the CAP theorem, but in a future issue, we will explore in more detail some of these criticisms and delve into alternatives that give better reasoning about most systems and their trade-offs.
So, consistency has a different meaning in ACID, eventual consistency, and the CAP theorem. If you have ever been confused by the very meaning of consistency, you’re not alone 😉.
Note that we only consider network failures as part of this definition. For example, the failure of a node isn’t considered as a partition in the context of the CAP theorem.
Or this is something I’m not aware of.
I didn't know that node failures were not considered as a partition in the context of CAP...