Expanding Globally to 50 Countries, Moving from Multi Tenant DB Architecture to Single DB: Disney+…
In later half of the 2021, Disney+ Hotstar decided to launch in Middle Eastern and Northern African Countries, around 19 countries.
The Content Platform team manages Database for Content Management System(CMS).
CMS is built on RDBMS Architecture and constitutes of more than 200 tables storing humongous amount of MetaData, hierarchy, mapper tables for all the playable and non-playable assets like : movie, shows, seasons, sport matches, images, playback urls, menu, page, trays, etc.
Each Tenant( Country ), has it’s own database hosted at the nearest available region. So for every country we decide to expand into, these things happen:
- We need to maintain another Database, monitor all the metrics like CPU utilization, read write latencies, network metrics, IOPS, etc..
- Decide and monitor individual scaling.
- As most of the content is same and only few fields are localised, huge amount of Data Replication is there, the Ops team needs to replicate and maintain sanity at multiple tenant level.
- For RDBMS systems, to reduce data redundancy, inconsistency and achieve Integrity high degree of Normalization is maintained, which makes Data retrieval slower and multiple join queries and complex DAOs and DTOs inevitable.
- Due to high degree of Normalization, code repository becomes unnecessarily large and accommodation of new changes and De-bugging becomes time taking and tough.
For going live in 50 countries, we need to setup and maintain 50 Databases, 200 replica clusters and 10k tables with RDBMS architecture
How did we solve this?
Database choice and design
Schema full vs Schema less
RDBMS vs No Sql Comparison
We Decided to go with Schema less on this
Types of NoSql Databases
We Benchmarked using Key Value Storage( ex: DynamoDB), GraphDB( ex: Neo4J) ,DocumentDB( ex: MongoDB) vs Column Oriented on following aspects :
- support for ACID Transactions
- Aggregation and group by queries
- Join Queries Performance
- Queries complexity
- Schema Structure
- Horizontal scalability
- Secondary indices support
- Audit and integration support for external services
- Supports Change Data Capture (CDC)
We Decided to go with DocumentDB on this, for following reasons :
Data Model agility : A document based approach also allows us to embed small arrays without affecting the performance. This allows us to move away from many number of mapper tables and also eliminate the need for a cache layer for base entities together this leads to simpler data layout and app implementation
Horizontal Scalability : MongoDB supports horizontal scalability with auto sharding and multi region deployment as well.
Evolving Business requirements (Growth in terms of complexity of requirements) : Flexibility in data model allows the schema to evolve through code. We can have sparse indexes, attributes being added to newer documents, configurable and tunable schema validators
We Chose AWS DocumentDB over MongoDB Atlas, because we wanted to be under same Ecosystem of AWS and not setting and maintaining separate infra. and avoiding setting up dedicated instance for VPC peering.
Now comes the part where we need to figure out Collection Patterns and do Database Modelling.
What do we want to achieve while defining these collection patterns :
- Achieve high Degree of De-Normalization
- Remove mapper tables and maintain Referential Integrity
- Customizable DB Reference Objects for keeping data of all other table’s data, whose foreign id was in sql table
Example : How Data looks like in SQL :
De-Normalized Data Model in Document DB would be something like:
Document in DocumentDB
Note : Shown schema is dummy, and consist of very few tables and columns of the actual schema
Considering 50 Countries, there would have been 10,000 tables storing all the data, moving to De-Normalized Data Model, the number of Collections is limited to 47, and doesn’t grow as Disney+ Hotstar expands to more number of Countries
Connect with me on Linkedin:)