Hello! Today, let’s discuss a concrete system design problem and the use of a randomized algorithm.
Imagine the following scenario. You’re a software engineer working on a new social media platform, and you are responsible for implementing the new Like feature for user posts.
The company uses a relational database—let’s say PostgreSQL. So what you do is create a new Likes
table that includes a foreign key to the post ID and a counter for likes:
CREATE TABLE Likes (
    post_id BIGINT PRIMARY KEY,
    counter INT DEFAULT 0
    -- Foreign key constraint on post_id omitted for brevity
);
When a user likes a post, the client calls your new API, /like/{post_id}, which performs the following SQL query to increment the like count:
UPDATE Likes
SET counter = counter + 1
WHERE post_id = ?;
You deploy your service, and everything works great.
A few months pass, and the platform starts to become more and more popular. You realize the p90 latency[1] of the Like API has become a significant concern. For popular posts, the latency can reach several seconds, negatively impacting user satisfaction. So, what’s the problem with this implementation?
In most relational databases, when the SQL query above is executed, the affected row is locked. This means that while one transaction is executing, other transactions trying to increment the counter for the same post must wait until the lock is released.
This problem is called a hot row, and it occurs when a specific row in the database is modified frequently by multiple concurrent transactions.
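To make the contention visible, here is a small Go sketch (my own illustration, not part of the original service) that fires many concurrent UPDATEs at the same row. It assumes a reachable PostgreSQL instance and the github.com/lib/pq driver; the connection string is a placeholder.

package main

import (
    "database/sql"
    "fmt"
    "sync"
    "time"

    _ "github.com/lib/pq" // PostgreSQL driver (assumption for this sketch)
)

func main() {
    // Placeholder connection string; adjust for your environment.
    db, err := sql.Open("postgres", "postgres://user:pass@localhost/social?sslmode=disable")
    if err != nil {
        panic(err)
    }
    defer db.Close()

    const concurrentLikes = 100
    start := time.Now()

    var wg sync.WaitGroup
    for i := 0; i < concurrentLikes; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            // Every goroutine targets the same post_id, so PostgreSQL
            // serializes the updates on that single row's lock.
            _, _ = db.Exec("UPDATE Likes SET counter = counter + 1 WHERE post_id = $1", 42)
        }()
    }
    wg.Wait()

    fmt.Printf("%d concurrent likes on one row took %v\n", concurrentLikes, time.Since(start))
}

All goroutines queue up behind the same row lock, so the total time grows roughly linearly with the number of concurrent likes.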
How would you approach the problem? Please take a moment to think about how you would design a solution for that.
We can consider multiple strategies:
Batching updates: For popular posts, instead of immediately updating the database for every like, we could maintain a local counter. Then, a background thread would periodically update the database (see the sketch after this list).
Pros: Reduces database load and improves API performance.
Cons: The main drawback is the complexity of the background task. Think about how to implement it. With one large query? Then, how do we handle errors? Also, would locking many rows at once really reduce database contention? Should we implement partitioning so that each service instance is responsible for its own set of rows? I can guarantee that this solution is far from simple and would require significant effort to be truly reliable.
Buffer database: We could introduce a new write-optimized database to temporarily store likes and implement a process that periodically fetches the data to write it to our main relational database[2].
Pros: Same benefits as the batching updates solution.
Cons: Increases operational complexity as it requires an additional database and introduces latency to reflect the actual like counts.
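For reference, here is a minimal sketch of the batching idea, assuming an in-memory map flushed by a background goroutine; the type names and flush interval are hypothetical, and error handling, retries, and graceful shutdown are deliberately left out.

package likes

import (
    "database/sql"
    "sync"
    "time"
)

// LikeBatcher accumulates likes in memory and flushes them periodically.
type LikeBatcher struct {
    mu      sync.Mutex
    pending map[int64]int // post_id -> likes received since the last flush
    db      *sql.DB
}

func NewLikeBatcher(db *sql.DB, interval time.Duration) *LikeBatcher {
    b := &LikeBatcher{pending: make(map[int64]int), db: db}
    go func() {
        for range time.Tick(interval) {
            b.flush()
        }
    }()
    return b
}

// Like is what the /like/{post_id} handler would call; it only touches memory.
func (b *LikeBatcher) Like(postID int64) {
    b.mu.Lock()
    b.pending[postID]++
    b.mu.Unlock()
}

// flush writes the accumulated counters to the database in one pass.
func (b *LikeBatcher) flush() {
    b.mu.Lock()
    batch := b.pending
    b.pending = make(map[int64]int)
    b.mu.Unlock()

    for postID, n := range batch {
        // A real implementation needs retries: if this fails, the likes are lost.
        _, _ = b.db.Exec("UPDATE Likes SET counter = counter + $1 WHERE post_id = $2", n, postID)
    }
}

Even this toy version hints at the cons listed above: a crash between two flushes silently loses likes, and every instance of the service keeps its own pending map.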
Neither of these solutions is perfect, so let’s think about another approach. First, let’s get back to the core problem, hot rows:
If a post is not popular, it won’t face the hot rows problem.
If a post is popular, it may face it.
In the latter case, neither the users nor the rest of the system (e.g., a recommendation service consuming the like count) really cares whether the number of likes is 27,524 or 27,531. Also, platforms such as Twitter already display an approximate like count once it passes a certain threshold.
Thanks to this condition—for popular posts, a good approximation is enough—we can implement a probabilistic solution. The idea is incredibly simple, and it requires two lines of code. In the implementation of the like endpoint, we switch from:
// /like/{post_id} handler
func like(postID string) {
    increment(postID, 1) // The function that makes the SQL query
}
To:
// /like/{post_id} handler
func like(postID string) {
    if randomInt(N) == 0 {
        increment(postID, N)
    }
}
N is a constant value, let's say 10 for the sake of the example. We generate a random integer between 0 and N - 1, that is, from 0 to 9:
If the random value is different from 0 (probability of 90%), we do nothing.
If the random value is equal to 0 (probability of 10%), we increment the likes counter not by 1 but by 10.
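Concretely, a minimal Go version of this handler could look like the sketch below; the increment helper, the global db handle, and the SQL placeholders are my assumptions about the surrounding code, not part of the original post.

package likes

import (
    "database/sql"
    "math/rand"
)

const N = 10 // the increment probability is 1/N

var db *sql.DB // initialized elsewhere

// like implements the /like/{post_id} handler body.
func like(postID string) {
    // rand.Intn(N) returns a value in [0, N), so this branch runs with probability 1/N.
    if rand.Intn(N) == 0 {
        increment(postID, N)
    }
}

// increment executes the SQL query, adding `by` to the post's counter.
func increment(postID string, by int) {
    _, _ = db.Exec("UPDATE Likes SET counter = counter + $1 WHERE post_id = $2", by, postID)
}

Note that since Go 1.20, the global rand source is automatically seeded, so no explicit seeding is needed.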
This approach leverages the law of large numbers: as the size of a sample increases, its average tends to get closer to the expected value. When we flip a coin 10 times, we might get 7 heads and 3 tails. However, if we keep flipping the coin, say 1,000,000 times, we expect the results to be very close to 50% heads and 50% tails.
Applied to our solution, as the probability of incrementing by N is 1/N, the more likes we receive, the more accurately our counter will reflect the true number of likes over time. This approach, besides being extremely simple to implement, allows us to tackle the latency problem and reduce database contention, as only 1/N of the calls end up executing an SQL query.
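To convince yourself, here is a quick, self-contained simulation (my own sketch, not from the post) comparing the probabilistic counter with the true number of likes for different volumes:

package main

import (
    "fmt"
    "math/rand"
)

func main() {
    const n = 10
    for _, trueLikes := range []int{100, 10_000, 1_000_000} {
        counter := 0
        for i := 0; i < trueLikes; i++ {
            // Same logic as the handler: increment by n with probability 1/n.
            if rand.Intn(n) == 0 {
                counter += n
            }
        }
        errPct := 100 * float64(counter-trueLikes) / float64(trueLikes)
        fmt.Printf("true=%d approx=%d relative error=%.2f%%\n", trueLikes, counter, errPct)
    }
}

On a typical run, the relative error can be sizable for a hundred likes but drops well below 1% by a million likes, which is exactly why this technique should be reserved for popular posts.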
A few things to consider, though:
We should keep this approach only for popular posts; otherwise, non-popular posts may end up with odd-looking like counts (for example, jumping straight from 0 to 10). But how do we know that a post is popular? Is that based solely on the number of likes? Could we even anticipate that a post will become popular?
The value of N should be chosen carefully. We could also adjust N dynamically based on the current like count, increasing it as more likes are received (a sketch follows this list).
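As an illustration of that last point, here is a hedged sketch of picking N from the current like count; the thresholds are arbitrary placeholders, not recommendations from the post.

package likes

// dynamicN picks the increment step from the current approximate like count.
// The thresholds below are placeholders, not tuned values.
func dynamicN(currentLikes int) int {
    switch {
    case currentLikes < 1_000:
        return 1 // exact counting while the post is not popular
    case currentLikes < 100_000:
        return 10
    default:
        return 100
    }
}

Reading the current count on every request would itself hit the database, so in practice this value would come from a cache or a periodically refreshed estimate.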
This solution, known as probabilistic increment, elegantly leverages randomized algorithms to address the challenge of hot rows. By trading precision for scalability, it significantly reduces database contention and latency while maintaining an accurate approximation of like counts over time.
Randomized algorithms like this are particularly valuable in distributed systems at scale, where exact counts are often less critical than performance and system reliability.
Edit
30 Jan 2025: Apparently, this technique is referred to as the CVM algorithm. Here’s the white paper that discusses it.
Would you have thought of such a solution? If not, how would you have approached the problem? Let me know in the comments.
[1] The 90th percentile of latency. For example, if the p90 latency of an endpoint is 300 ms, it means that 90% of the requests have a latency less than or equal to 300 ms.
[2] This is how Docker Hub handles its pull count: pulls are incremented in Redis before being synchronized with the main database by a batch process.