Tail latency in distributed systems

Tail latency is that tiny percentage of responses from a system that are the slowest in comparison to most of the responses. They are often called as the 98th or 99th percentile response times. This may seem insignificant at first but for large applications like LinkedIn, this has noticeable effects. This could mean that for a page having a million views per day 10,000 of those page views would experience the delay. Read how LinkedIn deals with longtail network latencies.

There can be multiple causes of tail latency: increasing load on the system, complex and distributed systems, application bottlenecks, slow network, slow disk access and more. Read more on it.

RobinHood: Tail latency-aware caching

RobinHood is a research caching system for application servers in large distributed systems having diverse backends. The cache system dynamically partitions the cache space between different backend services and continuously optimizes the partition sizes.

Microsoft research has a talk on getting rid of long-tail latencies.

Your subscription could not be saved. Please try again.
Please check your inbox to confirm the subscription.

Subscribe to my newsletter

If you like the content. Get updates on the new content published on the blog by joining my newsletter