DB Dilemmas : What Tradeoffs to Consider? Part 1

ยท

8 min read

DB Dilemmas  : What Tradeoffs to Consider?  Part 1
Play this article

Choosing the right database for a project feels a lot like standing at a crossroads. One path might lead to a smooth journey, while another could be fraught with obstacles, detours, and unforeseen challenges. Here's my take on it, from my own experiences and countless hours spent troubleshooting and strategizing.

Picture this: you're setting up a restaurant. The database is like your kitchen. If it's too small, you'll struggle when the crowd pours in. Too big? You're wasting money on space you don't need. Use outdated equipment? Dishes take longer, and quality suffers. Go ultra-modern without proper training? Mistakes happen, and things can go up in flames. Your choice impacts efficiency, cost, customer satisfaction, and ultimately, your restaurant's reputation.

The right database can set you up for success. It's scalable, cost-efficient, and reliable. But choose incorrectly, and you're looking at potential data losses, slow services, security breaches, and a whole host of other problems.


Where could a wrong decision lead you towards?

Imagine you're launching an e-commerce platform.

Given your familiarity with MySQL, you decide to use it as the backbone for your platform. It seems like a solid choice: it's relational, reliable, and offers ACID compliance. But as your platform starts to grow, challenges begin to arise.

What Potential Challenges could be there?

  1. Inventory Management: With a diverse range of products, each having multiple attributes and complex relationships, querying became increasingly slower in MySQL.

  2. Scalability Concerns: The relational nature of MySQL is primarily designed for vertical scaling. surge in user traffic, as the number of simultaneous transactions increased significantly. This led to performance bottlenecks, especially during peak sales periods.

  3. Search Functionality: Search queries in MySQL began to lag. Implementing features like auto-suggest or filter-based searches became more challenging.

  4. Real-time Recommendations: MySQL isn't inherently optimized for such operations, leading to delays in generating and displaying relevant product suggestions.

  5. Cost Overheads: To mitigate the performance challenges, there was a need to optimize queries, add more indexes, and invest in robust server hardware. These solutions, though effective to some extent, increased operational costs.

  6. Migration Considerations: Recognizing that a different database system might be more suited for the platform's needs, the idea of migrating was considered. However, transitioning to a new database is a complex process, fraught with potential data integrity issues and downtime risks.

  7. Database Locks: MySQL, when dealing with high transaction volumes, can encounter locking issues. Multiple transactions trying to access the same data led to lock waits, further escalating the latency problems and occasionally causing transaction timeouts.

  8. Schema Migration Difficulties: E-commerce platforms evolve rapidly, necessitating frequent changes to the database schema. With MySQL, schema migrations, especially on large tables, can be slow and disruptive, affecting availability.

  9. Rigidity in Altering Schemas: Making alterations to existing schemas in MySQL can be a daunting task. For instance, adding a single column to a highly normalized table often required extensive changes and downtime, impacting both the development velocity and the platform's availability.

  10. Normalization Overheads: The platform, having opted for a high degree of normalization, faced challenges when even minor schema changes were required. A simple addition of a column, given the normalization, necessitated multiple code changes, increasing the development and testing effort.

  11. Scaling Globally: With the aim to cater to a global audience, the need for data localization and ensuring compliance with regional data laws meant setting up separate instances for different regions. This not only increased operational overhead but also introduced complexities in maintaining data consistency across instances.

What Could've been a Better Solution?
Continue Reading to find out ๐Ÿ˜‰


That's why I've spent time creating a guide that demystifies the process of choosing a database. It's not about telling you which is the best, but about giving you the knowledge to decide which is best for you. It's a culmination of lessons learned, sometimes the hard way, and insights gleaned from various projects.

In this journey, there's no room for guesswork.
The stakes are high, and the impact of your choice will be felt for years.
A database isn't just a place to store data; it's the heart of your application, and it dictates the pulse of your operations.

Let's deep dive into this


High Priority

1. General Information:

AttributePossible Values/Buckets
Type of DatabaseRelational, NoSQL (Document, Key-Value, Column-store, Graph, Object Oriented, XML, Time Series, Multi-Model, Vector), NewSQL, Time-Series DB
LicenseOpen Source, Commercial, Hybrid
Community and SupportHigh, Moderate, Low
CostFree, Low, Medium, High
Ease of UseEasy, Moderate, Complex
Cloud Managed SolutionsAWS, Azure, GCP, Self-Managed
Database Infrastructure Management OverheadSelf-managed, Managed (e.g., AWS RDS), Serverless (e.g., DynamoDB)
Compatibility Across CloudsHigh, Moderate, Low (e.g., DynamoDB 0%, SQL 100%)

2. Data Model:

AttributePossible Values/Buckets
Data Structure FlexibilityJSON, Nested Document with Constraints, BLOB, Text, XML, Binary, Custom
SchemaFixed, Dynamic, Schema-less
Complex Data RelationshipsHierarchical, Relational, None
ORM/ODM/OGMSupported, Not Supported
Capability of Querying JSON & Free Flow AttributesSupported, Not Supported for SQL; Default for NoSQL

3. Transactions:

AttributePossible Values/Buckets
ACID SupportFull, Partial, None
Consistency LevelStrong, Eventual, Causal, None
Multi-Document TransactionsSupported, Not Supported
Isolation LevelsRead uncommitted, Read committed, Repeatable read, Serializable
Atomicity and RollbackSupported, Not Supported

4. Performance:

AttributePossible Values/Buckets
Read/Write ThroughputHigh, Medium, Low
Indexing FlexibilitySecondary, Partial, Composite, Cost of Indexing (Low, Medium, High)
CachingBuilt-in, Requires external, Not available
Partitioning and ShardingSupported, Not Supported
ReplicationSingle-node, Master-slave, Multi-master
Load BalancingSupported, Requires external, Not available

5. Reliability and Fault Tolerance:

AttributePossible Values/Buckets
DurabilityHigh, Medium, Low
Backup and RecoveryAutomatic, Manual, Not Supported
High AvailabilitySupported, Not Supported
Disaster RecoverySupported, Not Supported

6. Scalability:

AttributePossible Values/Buckets
Horizontal ScalingEasy, Moderate, Complex
Vertical ScalingEasy, Moderate, Complex
Distributed Systems SupportNative, Through extensions, Not Supported
CAP Theorem AdherenceCP, AP, CA, None

7. Security:

AttributePossible Values/Buckets
Authentication and AuthorizationBuilt-in, Requires external, Not available
EncryptionAt rest, In transit, Both, None
Audit LoggingSupported, Not Supported

Medium Priority

8. Operational:

AttributePossible Values/Buckets
MaintenanceEasy, Moderate, Complex
Monitoring and AlertsBuilt-in, Requires external, Not available
Upgrade PathSeamless, Requires downtime, Complex
Garbage CollectionAutomatic, Manual, Not Supported

9. Special Features:

AttributePossible Values/Buckets
Full-Text SearchNative, Through extensions, Not Supported
Geospatial SupportNative, Through extensions, Not Supported
Time-Series SupportNative, Through extensions, Not Supported
Use Case SpecializationGeneral, Log search, Time-series, Graph processing, etc.
API and IntegrationRESTful, GraphQL, RPC, Proprietary
ETL ProcessSupported, Not Supported

10. Pricing:

AttributePossible Values/Buckets
Pricing ModelPay-per-use, Subscription, Tiered
Cost per 1000 Calls (Read/Write)$X.xx (specific to the database and tier)
Infrastructure Hosting CostSelf-hosted, Cloud (AWS, GCP, Azure, etc.), Hybrid
Additional Features CostFree, Premium, Enterprise

11. Compliance:

AttributePossible Values/Buckets
Regulatory ComplianceGDPR, HIPAA, CCPA, etc.
CertificationsISO 27001, SOC 2, etc.

12. Vendor Specific:

AttributePossible Values/Buckets
Vendor ReputationHigh, Moderate, Low
Vendor Support24/7, Business hours, None
Product RoadmapClear & Detailed, Vague, Not Available

Low Priority

13. Others:

AttributePossible Values/Buckets
Referential Integrity & Cascade DeletesSupported, Not Supported
Race Condition HandlingSupported, Not Supported
LockingOptimistic, Pessimistic, None
CompatibilityHigh, Moderate, Low
Vendor Lock-inHigh, Moderate, Low

14. Additional Considerations:

AttributePossible Values/Buckets
Community and EcosystemActive, Moderate, Inactive
Training and ResourcesAbundant, Limited, None
Migration SupportComprehensive, Basic, None
Integration with Other ToolsHigh, Moderate, Low
Native cloud Solutions OfferedAWS, Azure, GCP, ....

15. Fine-Tuning and Configuration:

AttributePossible Values/Buckets
Index TuningGranular, Basic, None
Query OptimizationAutomatic, Manual, None
Storage ConfigurationGranular, Pre-set, Not Configurable
Replication ConfigurationGranular, Pre-set, Not Configurable

Now this brings us to the question, how do we Articulate our requirements for our Use Case, hop on to Part 2 : https://aayushgupta.tech/db-dilemmas-how-to-define-requirements-part-2

ย