Distributed Systems and Scalability Feed #1 – Heroku Client Rate Throttling, Tail Latency and more
What does 100 million users on a Google service mean? 10 billion requests/day, 100k requests/second (average), 200k requests/second (peak), 2 million disk seeks per second (IOPS). How are these served? 100 million users need (at peak) 2 million IOPS. At 100 IOPS per disk, that’s 20k disk drives at…
Zero to Software/Application Architect – Learning Track
Zero to Software/Application Architect learning track is a series of four courses that I am writing (2 courses published) with an aim to educate you, step by step, on the domain of software architecture & distributed system design. The learning track takes you right from…
Java Full Stack Developer – The Complete Roadmap – Part 2 – Let’s Talk
This write-up is part 2 of the discussion on how to become a full-stack Java developer. If you haven’t read the first part, here you go. Time to talk about the database component… The resources stated in this article contain affiliate links. That…
Java Full Stack Developer – The Complete Roadmap – Part 1 – Let’s Talk
Hello readers, this write-up takes a deep dive into full-stack Java development. We will look at what it is. What are the job requirements of a full-stack Java developer? How do I become one? What should be the salary expectations of the devs…
Best (Handpicked) Resources To Learn Software Architecture, Distributed Systems and System Design
In this article, I’ve put together a list of resources (online courses + books) that I believe are super helpful in building a solid foundation in software architecture and designing large-scale distributed systems like Facebook, YouTube, Gmail, Uber & so on. I’ll start with…
Distributed Systems and Scalability Feed
Facebook photo storage architecture
Facebook built Haystack, an object storage system designed for storing photos at large scale. The platform stores over 260 billion images, which amounts to over 20 petabytes of data. One billion new photos are uploaded each week, which is approximately 60 terabytes of data. At peak, the platform serves over one million images per second.
In the original NAS-based photo storage architecture, Facebook faced throughput and latency issues because reading the photos and looking up their associated metadata on the NAS caused excessive disk operations, up to ten just to retrieve a single image.
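To get a feel for these numbers, here is a rough back-of-the-envelope sketch in Python. It only uses the figures quoted above. The decimal units (1 PB = 10^15 bytes) and the worst-case assumption that every image served at peak costs ten disk operations are mine, not Facebook's, so treat the results as orders of magnitude rather than measurements.

```python
# Back-of-the-envelope arithmetic using the figures quoted in the text.
# Assumptions (not from the source): decimal units, and a worst case in
# which every image read at peak costs the full ten disk operations.

TOTAL_IMAGES = 260 * 10**9          # "over 260 billion images"
TOTAL_BYTES = 20 * 10**15           # "over 20 petabytes"
WEEKLY_UPLOADS = 10**9              # "one billion new photos each week"
WEEKLY_BYTES = 60 * 10**12          # "approximately 60 terabytes"
PEAK_IMAGES_PER_SEC = 10**6         # "over one million images per second"
NAS_DISK_OPS_PER_IMAGE = 10         # "up to ten" disk operations per image
SECONDS_PER_WEEK = 7 * 24 * 60 * 60

# Average size of a stored image and of a newly uploaded photo, in KB.
avg_stored_kb = TOTAL_BYTES / TOTAL_IMAGES / 1000
avg_upload_kb = WEEKLY_BYTES / WEEKLY_UPLOADS / 1000

# Average upload rate over a week.
uploads_per_sec = WEEKLY_UPLOADS / SECONDS_PER_WEEK

# Disk operations per second at peak: NAS worst case vs. one op per image.
peak_ops_nas = PEAK_IMAGES_PER_SEC * NAS_DISK_OPS_PER_IMAGE
peak_ops_single = PEAK_IMAGES_PER_SEC * 1

print(f"average stored image:  ~{avg_stored_kb:.0f} KB")
print(f"average uploaded photo: ~{avg_upload_kb:.0f} KB")
print(f"average upload rate:    ~{uploads_per_sec:.0f} photos/second")
print(f"peak disk ops (10 per image): {peak_ops_nas:,}")
print(f"peak disk ops (1 per image):  {peak_ops_single:,}")
```

The point of the comparison in the last two lines is that the number of disk operations per image multiplies directly into the disk capacity needed at peak, which is why cutting metadata lookups matters so much at this scale.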

Tail latency in distributed systems
Tail latency refers to the small percentage of responses from a system that are the slowest compared to the rest. It is usually reported as the 98th or 99th percentile response time. This may seem insignificant at first, but for large applications like LinkedIn it has noticeable effects. For a page with a million views per day, the slowest 1% (everything beyond the 99th percentile) amounts to 10,000 page views that would experience the delay. Read how LinkedIn deals with longtail network latencies.
There can be multiple causes of tail latency: increasing load on the system, the complexity of distributed systems, application bottlenecks, slow networks, slow disk access and more. Read more on it.
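As a small illustration of why the tail hides behind averages, here is a minimal Python sketch that computes percentiles with the nearest-rank method. The latency samples are made up for the example, and the helper names `percentile` and `requests_beyond` are hypothetical, not taken from any monitoring tool.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def requests_beyond(p, daily_requests):
    """How many requests per day fall beyond the p-th percentile."""
    return daily_requests * (100 - p) // 100


# Hypothetical response times in milliseconds: 98 fast, 2 very slow.
latencies_ms = [20] * 98 + [900, 1500]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean: {mean_ms} ms, p50: {p50_ms} ms, p99: {p99_ms} ms")
print("page views beyond p99 per day:", requests_beyond(99, 1_000_000))
```

With these made-up samples the median stays at the fast value and the mean moves only modestly, while the 99th percentile exposes the slow responses. The last line reproduces the figure from the text: 1% of a million daily page views is 10,000.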
RobinHood: Tail latency-aware caching
RobinHood is a research caching system for application servers in large distributed systems with diverse backends. The cache system dynamically partitions the cache space between the different backend services and continuously optimizes the partition sizes.
Microsoft Research has a talk on getting rid of long-tail latencies.
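To make "dynamically partitions the cache space" a bit more concrete, here is a simplified Python sketch of one rebalancing step. It is not the published RobinHood algorithm, only an illustration of the general idea under my own assumptions: every backend gives up a small fraction of its cache, and the reclaimed space goes to backends in proportion to how often they were blamed for slow requests. The function name `rebalance`, the `tax_rate` parameter, the backend names and all the numbers are hypothetical.

```python
def rebalance(partitions, blocking_counts, tax_rate=0.01):
    """One illustrative rebalancing step (a sketch, not RobinHood itself).

    partitions:      backend name -> cache space currently assigned (MB)
    blocking_counts: backend name -> how often that backend was the
                     slowest part of a slow request in the last interval
    tax_rate:        fraction of cache reclaimed from every backend
    """
    total_blocking = sum(blocking_counts.get(b, 0) for b in partitions)
    if total_blocking == 0:
        # Nothing was slow, so leave the partition sizes untouched.
        return dict(partitions)

    # Reclaim a small share of cache from every backend.
    taxed = {b: size * (1 - tax_rate) for b, size in partitions.items()}
    pool = sum(partitions.values()) - sum(taxed.values())

    # Hand the reclaimed space to the backends that caused slow requests.
    return {
        b: taxed[b] + pool * blocking_counts.get(b, 0) / total_blocking
        for b in partitions
    }


# Hypothetical backends and measurements.
partitions = {"users": 400.0, "ads": 400.0, "search": 200.0}
blocking_counts = {"users": 0, "ads": 10, "search": 90}

updated = rebalance(partitions, blocking_counts)
for backend, size in updated.items():
    print(f"{backend}: {size:.1f} MB")
```

Running a step like this repeatedly shifts cache space toward whichever backends currently dominate the slow requests, while the total cache size stays constant.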
> Spotify Engineering: From Live to Recording
> Ingesting LIVE video streams at a global scale at Twitch
> ConvertKit spent $64,944 on AWS in August to support 25,000 customers.
> Read how Storytel engineering computes customer consumption of books transitioning from batch processing to streaming bookmarks data with Apache Beam and Google Cloud.
> How Pokemon Go scales to millions of requests per second?
> Insight into how Grab built a high-performance ad server.