Scaling APIs to 10 million rpm with P99 latency of 100ms : Disney+ Hotstar
Peak concurrency Disney+ Hotstar has seen is 25.3 million
The API at Hotstar which is responsible to fetch all the Metadata of any Content has seen a Scale of greater than 10 million RPM, while maintaining P99 latency of less than 100ms.
This can be viewed over network tab, Response is like this :
Response contains all the Metadata for any Content
Let’s discuss how an Ideal designed Architecture would look like :
Ideally Designed Architecture
API Gateway : Forwards the requests to respective Service
User Facing Service : This is responsible to get data for a particular content ID, as the Database is highly De-Normalized, The queries can have extremely Complex joins which can require querying upto 17 tables and Aggregation over that.
Highly De-Normalized Schema, complex joins are required to get Entity Data
Now the query time and creating a normalized view according to agreed upon response Schema will take significant Time and be reflected as increased latency
Content Management Service : All Create, Update and Delete Operations are done by this Service, Devs and Ops only are having access to this.
How Scaling is done : There exist basically two models to ensure seamless scalability: Traffic based and Ladder based.
Traffic-based : New nodes are added/deleted according to change in the metrics like :
- CPU Utilization
- Primary Memory Usage
- Number of DB Connections
- Network In/Out
Traffic Based Scaling
Bottlenceks, Challenges with this :
- Query Execution time is high.
- Checking the existence of key in Cache adds latency.
- Aggregations and computing the view in response Schema takes time
- If concurrency jumps from 10 million to 11 million under a minute, such sharp rise can’t be handled, as it takes some time to add new infrastructure to the pool, and the container and the application takes few seconds to start.
How Hotstar solved it?
How we solved all the Challenges :
- Not using Primary DB for Reads.
- Anytime a create, update or delete operation is there on primary DB, a view is created and indexed in Downstream Database, fine tuned for providing latencies in Microseconds(μs).
- Using Cache as the only Data Source, user facing service interacts with.
- Maintaining a Response Cache at Reverse Proxy Level
- Switching to ladder based Scaling
Optimized for Scale
DynamoDB Accelerator (DAX) is used an In-Memory Cache over DynamoDB, to provide read latencies in Microseconds(μs).
An Inline Write-Through Caching Strategy is used, helping in solving inconsistency problem and high throughput for writes as well
How is Autoscaling done ?
Ladder Based Scaling
Disney+Hotstar has pre-defined ladders per million concurrent users.
As more requests are processed by the system adds on, new infrastructure in terms of ladders is added.
As of now, the Disney+Hotstar has a concurrency buffer of 2 million concurrent users, which is optimally utilized during the peak events such as World Cup matches or IPL tournaments.
In case the number of users goes beyond this concurrency level, then it takes 90 seconds to add new infrastructure to the pool, and the container and the application take 74 seconds to start.
In order to handle this time lag, there is a pre-provisioned buffer, which is the opposite of auto-scaling and has proven to be a better option.
An example of Infra Ladder
Load Testing is done and Match day simulation is run to figure out how many K8 nodes and underlying DB Nodes are required for a specific service for a specific region is required to serve x million concurrent users. To handle the time lag of adding new infra and start application, there is a pre-provisioned buffer
Connect with me on Linkedin :)