Adaptive LIFO

A Strategy for Graceful Degradation in High Traffic

Nov 06, 2024

Hello! Let’s keep delving into the concept of graceful degradation with a relatively unknown approach: adaptive LIFO.

Yesterday, we introduced the concept of graceful degradation: when a service encounters unexpected events, it’s often better to give some response than no response at all. We also introduced the most common approach to graceful degradation: load shedding. Yet, this is only one possible method, and today, we will explore another one introduced by Facebook: adaptive LIFO (Last-In, First-Out).

Most services consume requests or messages in FIFO (First-In, First-Out) order. For example, when a service processes HTTP requests, it typically handles them in the order they arrive. We often assume this stream of requests is unbounded, meaning the system can hold any number of pending requests as long as the service handles them eventually. However, this isn’t the case.

The number of incoming HTTP requests can be limited at various layers, including the operating system (OS). For example, in Linux, SOMAXCONN (exposed as the net.core.somaxconn kernel parameter) caps the backlog of pending TCP connections on a listening socket. If the connection queue reaches this limit, the OS stops taking in additional connections, dropping or refusing them. So, we can think of incoming HTTP requests as a bounded queue.
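To make the queue picture concrete, here is a minimal Python sketch of a bounded FIFO queue. It only models the idea and says nothing about how the kernel manages its backlog; the class name BoundedRequestQueue, its methods, and the capacity of 4 are assumptions made up for this illustration.

```python
from collections import deque
from typing import Deque, Optional


class BoundedRequestQueue:
    """Hypothetical FIFO queue that turns away requests once it is full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: Deque[str] = deque()

    def offer(self, request: str) -> bool:
        """Enqueue a request; return False if the queue is already full."""
        if len(self._items) >= self.capacity:
            return False  # comparable to the OS turning a connection away
        self._items.append(request)
        return True

    def poll(self) -> Optional[str]:
        """Dequeue the oldest request (FIFO), or None if the queue is empty."""
        return self._items.popleft() if self._items else None

    def fill_ratio(self) -> float:
        """Fraction of the capacity currently in use (0.0 means empty)."""
        return len(self._items) / self.capacity


queue = BoundedRequestQueue(capacity=4)
# Six requests arrive, but only the first four fit.
accepted = [queue.offer(f"req-{i}") for i in range(6)]
ratio_when_full = queue.fill_ratio()
oldest = queue.poll()  # FIFO: the first request that arrived comes out first
```

Here the last two calls to offer return False, which is the model’s stand-in for connections that never make it into the queue.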

This queue can be empty, or it can contain pending requests that our service has not yet processed, for example when the queue is 25% full.

Again, requests are processed as FIFO, meaning they are processed in the order they arrive. Now, imagine that our service faces an unexpected load and the request queue is almost full. Consider the latest request to join the queue.

In this situation, the latest request in the queue will sit there for a while before it is processed. By the time our service gets to it, the user may have already waited too long and cancelled the request. This can lead to a vicious cycle: as more requests pile up in an overloaded queue, the chance that users give up on (or, even worse, retry) their request increases.
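A quick back-of-the-envelope calculation shows how fast this gets bad. The sketch below assumes a single worker, a fixed service time per request, and users who all give up after the same delay; the function names and the numbers (100 queued requests, 50 ms per request, 2 seconds of patience) are invented for the example.

```python
def fifo_wait_ms(position: int, service_time_ms: int) -> int:
    """Wait before a single worker starts on a request.

    `position` is the number of requests ahead of it in the FIFO queue.
    """
    return position * service_time_ms


def stale_requests(queue_length: int, service_time_ms: int, patience_ms: int) -> int:
    """Count queued requests whose wait exceeds the user's patience."""
    return sum(
        1
        for position in range(queue_length)
        if fifo_wait_ms(position, service_time_ms) > patience_ms
    )


queue_length = 100
service_time_ms = 50
patience_ms = 2_000

# The request at the back has 99 requests ahead of it: 99 * 50 = 4950 ms.
last_wait_ms = fifo_wait_ms(queue_length - 1, service_time_ms)

# Positions 41 to 99 wait longer than 2000 ms, so 59 requests are stale.
stale = stale_requests(queue_length, service_time_ms, patience_ms)
```

Under these assumptions, more than half of the queued work would be spent on requests nobody is waiting for anymore.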

Adaptive LIFO offers a solution to this problem:

Under normal conditions: Requests are handled in FIFO order.
Under heavy load: Requests are handled in LIFO order.

The idea behind adaptive LIFO is the following:

Under heavy load, users who have just made a request are more likely to still be waiting for a response. By processing these fresh requests first, the system maximizes the chance of successfully completing them rather than wasting resources on old requests that may no longer be relevant. It follows the principle of graceful degradation that we described: giving some response back is better than no response back.
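Here is one way the switch between the two modes could look in Python. This is a sketch under stated assumptions, not Facebook’s implementation: the class name AdaptiveLifoQueue, the lifo_above parameter, and the choice to define heavy load as a queue-length threshold are all hypothetical. A real system might use a different signal, such as how long requests have been waiting.

```python
from collections import deque
from typing import Deque, Optional


class AdaptiveLifoQueue:
    """Hypothetical queue: FIFO normally, LIFO once the backlog is large."""

    def __init__(self, capacity: int, lifo_above: int) -> None:
        self.capacity = capacity
        # Assumed load signal: switch to LIFO at this many pending requests.
        self.lifo_above = lifo_above
        self._items: Deque[str] = deque()

    def offer(self, request: str) -> bool:
        """Enqueue a request; return False if the queue is full."""
        if len(self._items) >= self.capacity:
            return False
        self._items.append(request)
        return True

    def under_heavy_load(self) -> bool:
        return len(self._items) >= self.lifo_above

    def poll(self) -> Optional[str]:
        """Dequeue the next request according to the current mode."""
        if not self._items:
            return None
        if self.under_heavy_load():
            return self._items.pop()  # LIFO: newest request first
        return self._items.popleft()  # FIFO: oldest request first


queue = AdaptiveLifoQueue(capacity=10, lifo_above=8)

# Normal conditions: 3 pending requests, so the oldest one is served.
for i in range(3):
    queue.offer(f"req-{i}")
first = queue.poll()

# Heavy load: 9 pending requests, so the newest one is served.
for i in range(3, 10):
    queue.offer(f"req-{i}")
second = queue.poll()
```

In this run, first is req-0 (FIFO) and second is req-9 (LIFO). Using a deque keeps both ends cheap to pop from, which is why it suits a queue that changes direction.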

Of course, adaptive LIFO isn’t fair. Imagine you have been waiting in a queue at Starbucks for 20 minutes, and someone who has just arrived overtakes everyone and gets served before you. Adaptive LIFO is similar: the concept is unfair, yet it improves the rate of successfully processed requests.
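A toy simulation can illustrate the trade-off. Everything in it is an assumption chosen for simplicity: time moves in ticks, 3 requests arrive per tick while the service can only handle 2, users give up after 5 ticks, and the service still spends effort on requests that were already abandoned. Because the overload is constant, the adaptive queue would stay in LIFO mode, so the sketch simply compares pure FIFO with pure LIFO.

```python
from collections import deque


def simulate(policy: str, ticks: int = 100, arrivals: int = 3,
             served: int = 2, patience: int = 5) -> int:
    """Return how many requests are served while the user is still waiting.

    `policy` is "fifo" or "lifo". The queue stores each request's arrival tick.
    """
    queue = deque()
    successes = 0
    for now in range(ticks):
        # New requests join the back of the queue.
        for _ in range(arrivals):
            queue.append(now)
        # The service handles up to `served` requests per tick.
        for _ in range(served):
            if not queue:
                break
            arrived = queue.pop() if policy == "lifo" else queue.popleft()
            # The work only counts if the user has not given up yet.
            if now - arrived <= patience:
                successes += 1
    return successes


fifo_successes = simulate("fifo")
lifo_successes = simulate("lifo")
```

With LIFO, every request the service picks up arrived in the current tick, so all 200 served requests are useful. With FIFO, the backlog grows by one request per tick, and after the first ticks the service only works on requests that users have already abandoned. The price is the unfairness described above: under LIFO, the oldest requests are never served while the overload lasts.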

Adaptive LIFO is a nice illustration of how we can be creative when it comes to handling graceful degradation. It’s important to remember that graceful degradation isn’t limited to load shedding. By thinking outside the box, we can develop systems that degrade gracefully and maintain a decent level of service under abnormal conditions.

Tomorrow, we will explore what makes a system resilient, fault-tolerant, robust, or reliable and how these characteristics differ from one another.
