How Hotstar Scaled to 10.3 Million Concurrent Users – An Architectural Insight
Hotstar is the leading streaming media & video-on-demand service in India with a user base of approx. 200 million users.
It lets users stream popular shows in several languages & genres over the web. But its primary and most popular feature is the streaming of live cricket matches. This is the real deal.
In the latest edition of the IPL T20 cricket tournament, the streaming platform received a whopping record traffic of 10.3 million concurrent users, smashing its previous record of 8.26 million concurrent users.
To my knowledge, this is the most intense concurrent traffic load on any service, surpassing Fortnite, which clocked a load of 8.3 million concurrent players in the recent past.
1. Technical Insights
All traffic is served by EC2 instances, & the S3 object store is used as the data store.
The services use a mix of on-demand & spot instances to keep the costs optimized.
Running machine learning & data analytics algorithms on spot instances helps the business cut costs considerably.
Tens of terabytes of data are generated on any regular day and processed by AWS EMR clusters.
AWS EMR is a managed Hadoop framework for processing massive amounts of data across the EC2 instances. Other popular frameworks like Apache Spark, Presto, HBase etc. can also be used with AWS EMR.
2. Infrastructure Setup for Load Testing the Platform for the Big Event
The infrastructure for load testing the platform consisted of 500+ AWS compute instances, either C4.4xlarge or C4.8xlarge, running at 75% utilization.
C4 instances are built for highly CPU-intensive operations at a low price-to-compute ratio. They provide high networking performance & increased storage performance at no additional cost.
C4.4xlarge instances have 30 GB of RAM & C4.8xlarge instances have 60 GB of RAM.
The entire setup has 16 TB of RAM & 8000 CPU cores, and at peak the data transfer was approx. 32 Gbps.
The engineering team makes intelligent use of spot & on-demand instances to keep the costs low.
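As a rough sanity check on the figures above, the totals can be recomputed from the published specs of the two instance types (16 vCPUs and 30 GiB for c4.4xlarge, 36 vCPUs and 60 GiB for c4.8xlarge). The fleet composition below is a hypothetical assumption, since the actual mix of the two types is not stated; it only shows that a fleet of roughly 500 c4.4xlarge instances lands close to the quoted totals.

```python
# Published specs for the two instance types: (vCPUs, RAM in GiB).
C4_SPECS = {"c4.4xlarge": (16, 30), "c4.8xlarge": (36, 60)}


def fleet_totals(counts):
    """Sum vCPUs and RAM (GiB) for a fleet given as {instance_type: count}."""
    vcpus = sum(C4_SPECS[kind][0] * n for kind, n in counts.items())
    ram_gib = sum(C4_SPECS[kind][1] * n for kind, n in counts.items())
    return vcpus, ram_gib


# Hypothetical fleet: exactly 500 c4.4xlarge instances (the real mix is unknown).
vcpus, ram_gib = fleet_totals({"c4.4xlarge": 500})
print(vcpus, "vCPUs,", ram_gib, "GiB of RAM")
```

With this assumed fleet the result is 8000 vCPUs and 15000 GiB of RAM, in line with the "8000 CPU cores" and "16 TB" figures for a 500+ instance setup.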
3. Simulating Traffic
The traffic simulation had three major components: the traffic model, the simulation script & the load generation infrastructure. I’ve already talked about the infrastructure part.
3.1 Preparing the Traffic Model
The engineering team started with a very basic traffic model, hitting the API endpoints with a certain ratio of requests, and performed some initial runs.
There are typically two types of user interactions with the system: users who have already signed up & are logging in, & users visiting the platform for the first time.
Keeping all the different user navigation scenarios in mind the team prepared a traffic model of the system.
The next step was to introduce spikes in the model over the entire testing phase.
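A traffic model of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the endpoint names, the request ratios, the base load and the spike schedule are all hypothetical, not Hotstar's actual values.

```python
import random

# Hypothetical request mix: share of requests per API endpoint.
# Endpoint names and ratios are illustrative assumptions only.
REQUEST_MIX = {
    "/login": 0.10,     # returning users logging in
    "/signup": 0.05,    # first-time visitors
    "/home": 0.35,
    "/playback": 0.50,
}

# Hypothetical spikes: (start_minute, end_minute, load multiplier).
SPIKES = [(10, 12, 3.0), (30, 31, 5.0)]


def target_rps(minute, base_rps=1000):
    """Return the requests per second the model asks for at a given minute."""
    multiplier = 1.0
    for start, end, factor in SPIKES:
        if start <= minute < end:
            multiplier = max(multiplier, factor)
    return int(base_rps * multiplier)


def sample_endpoints(n, rng=None):
    """Draw n endpoints at random according to REQUEST_MIX."""
    rng = rng or random.Random()
    endpoints = list(REQUEST_MIX)
    weights = list(REQUEST_MIX.values())
    return rng.choices(endpoints, weights=weights, k=n)
```

A load generator would call `target_rps(minute)` to decide how many requests to send and `sample_endpoints(...)` to decide where to send them. Separating the mix from the spike schedule makes it easy to tune either one as real match data comes in.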
3.2 Writing the Simulation Script
The tools used to write and run the simulation were Gatling & Flood.io.
Gatling is an open-source load & performance testing tool for web applications, built on Scala, Akka & Netty.
It was preferred over JMeter, which is a pretty popular testing tool, due to its capability of simulating more concurrent users at one point in time.
Gatling sessions were used to maintain the user state. Simulation scripts were designed for several workflows throughout the system, for instance, user login flow, streaming flow etc.
The simulation ran on an AWS distributed cluster & simulated 20 million users out of which 4 million were concurrent.
Another cloud-based testing tool Flood.io was used with Gatling for load testing.
Flood.io is a load testing platform that runs distributed performance tests with open source tools like JMeter, Gatling, Selenium etc.
This whole test setup matured with the data obtained as the real cricket matches took place.
4. Strategies to Scale
4.1 Traffic & Ladder Based Scaling
There are primarily two triggers to scale: traffic-based & ladder-based.
Traffic-based scaling simply means adding new infrastructure to the pool as the number of requests being processed by the system increases.
This works for unpredictable workloads & for services that expose detailed stats, such as the number of threads created, requests processed etc.
Ladder-based scaling is preferred for services which do not give much detail on their processes. To tackle this, the team has pre-defined ladders per million concurrent users.
As the number of concurrent users climbs from one ladder step to the next, new infrastructure is added to the pool. This technique works well for a predictable load.
The team keeps a concurrency buffer of 2 million concurrent users in place, as adding new infrastructure to the pool takes around 90 seconds & the container and application start takes around 75 seconds.
To beat this delay, the team has a pre-provisioned buffer. This buffer approach is preferred in contrast to auto-scaling. It helps the team handle unexpected traffic spikes without adding unnecessary latency to the response.
They have developed an internal app called the Infradashboard which helps them take these scaling decisions smoothly & quite ahead of time.
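The ladder-plus-buffer idea can be sketched as follows. The 2 million user buffer, the per-million ladder and the two startup delays come from the figures above; the number of instances per ladder step is a hypothetical placeholder, and treating the two delays as sequential is also an assumption.

```python
import math

BUFFER_USERS = 2_000_000        # concurrency buffer quoted above
LADDER_STEP_USERS = 1_000_000   # ladders are defined per million concurrent users
INSTANCES_PER_STEP = 50         # hypothetical: the real figure is not stated

# Assumption: infrastructure provisioning (~90 s) and container/application
# start (~75 s) happen one after the other.
PROVISION_LEAD_SECONDS = 90 + 75


def ladder_step(concurrent_users):
    """Ladder step to provision for: current load plus the buffer, rounded up."""
    return math.ceil((concurrent_users + BUFFER_USERS) / LADDER_STEP_USERS)


def desired_instances(concurrent_users):
    """Instances that should already be running at this concurrency."""
    return ladder_step(concurrent_users) * INSTANCES_PER_STEP
```

For example, at 3.2 million concurrent users this sketch provisions for step 6, that is capacity for 6 million users. Because the capacity is already running, a sudden jump does not have to wait out the provisioning lead time.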
4.2 Intelligent Client
Protocols are in place which come into effect when the backend is overwhelmed with requests & the application client experiences increased latency in response.
In these scenarios, the client avoids burdening the backend servers even more by increasing the time between subsequent requests.
Caching & intelligent protocols are in place to enhance the user experience.
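One common way for a client to increase the time between requests is exponential backoff with jitter. The scheme Hotstar's client actually uses is not described here, so the sketch below is only an illustration, and the base interval, cap and latency threshold are all hypothetical values.

```python
import random

BASE_DELAY_S = 1.0          # hypothetical normal interval between requests
MAX_DELAY_S = 60.0          # hypothetical upper bound on the interval
LATENCY_THRESHOLD_S = 2.0   # hypothetical signal that the backend is struggling


def next_delay(current_delay, observed_latency, rng=random):
    """Return how long the client should wait before its next request.

    While responses are slow, the delay is doubled (up to a cap) and
    randomized; once latency is healthy again, it resets to the base interval.
    """
    if observed_latency <= LATENCY_THRESHOLD_S:
        return BASE_DELAY_S
    doubled = min(current_delay * 2, MAX_DELAY_S)
    # Jitter spreads clients out so they do not all retry in lockstep.
    return rng.uniform(doubled / 2, doubled)
```

A client would feed the latency of each response into `next_delay` and sleep for the returned number of seconds. The jitter matters at this scale: without it, millions of clients that slowed down together would also come back together.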
4.3 Dealing with Nefarious Traffic
Popularity attracts unwanted attention. Using a combination of whitelisting and industry best practices, the application drives away nefarious traffic as soon as it is discovered, sparing the servers the extra burden.
4.4 Do Not Do Real-time If It’s Not Business Critical
Recently Hotstar introduced a play-along feature, where users can interact with their folks, play trivia & so on, on the same platform while watching cricket. Being real-time, this feature spiked the concurrent traffic load on their backend servers.
Doing things in real time has associated costs. Real-time features spike the concurrent traffic considerably, and managing that concurrent traffic requires powerful hardware, which can be avoided if the feature isn’t real-time.
All these experiences are pretty insightful & help us plan, design & execute in a better & informed way.
I am Shivang, the author of this article on scaleyourapp.com.