View Categories

SQL vs. NoSQL Decision Matrix

4 min read

The “Flexible Schema” Trap #

The biggest lie in database selection is: “We are choosing MongoDB because we don’t know our data structure yet.”

That is not “agility.” That is laziness.

If you don’t know your data structure, you don’t understand your business domain. Choosing NoSQL just to dump JSON blobs is a technical loan with a high interest rate. Eventually, you will need to report on that data, and you will find yourself writing complex application-level joins because your database can’t do them for you.

The Logic: It’s About Access Patterns, Not Just Scale #

The decision shouldn’t be based on “how much data” you have (Postgres can handle terabytes easily). It should be based on Access Patterns.

SQL (Relational): Optimized for Questions.

  • Strengths: Complex joins, ad-hoc queries, data integrity (ACID), and rigidly defined relationships.
  • The Logic: If your data is highly interconnected (e.g., A “User” places an “Order” containing “Items” that have “Prices”), SQL is the mathematical standard.

NoSQL (Non-Relational): Optimized for Ingestion & Retrieval.

  • Strengths: High write throughput, horizontal scaling (sharding), and hierarchical data structures.
  • The Logic: If your data is an aggregate (e.g., a “Product Catalog” page where you want to grab the title, price, and specs in one single key-lookup), NoSQL wins.

The Decision Matrix: The “Logic” Checklist #

Do not use NoSQL unless you have a specific architectural reason. Default to SQL (Postgres) for 90% of use cases.

Feature RequirementThe Logic ChoiceWhy?
Financial TransactionsSQLACID Compliance. You cannot afford “Eventual Consistency” when moving money. If the debit succeeds but the credit fails, you are fired.
Social GraphGraph DB (Neo4j) / SQLRecursive joins (Friends of Friends) kill performance in document stores.
IoT Sensor StreamsNoSQL (Time Series)Write speed is king. You need to ingest 100k events/sec. Update consistency doesn’t matter (sensors don’t change past values).
User Session DataNoSQL (Key-Value)Simple lookup. get(session_id). You never need to join a session with a billing table in real-time.
Dynamic AttributesNoSQL (Document)E-commerce catalogs where a “Laptop” has CPU/RAM but a “Shirt” has Size/Color.

Real-World Architecture: Polyglot Persistence #

Sophisticated architectures rarely choose just one. They use “Polyglot Persistence”—using the right tool for the specific job within the same platform.

Example: An E-Commerce Platform

  1. Product Catalog: Stored in MongoDB (NoSQL). Why? Fast reads. The frontend requests product_id: 123 and gets the whole JSON object (images, specs, variants) in 5ms. No joins required.
  2. Orders & Billing: Stored in Postgres (SQL). Why? Transactional integrity. We need to ensure inventory is deducted exactly when payment is confirmed.
  3. Search: Stored in Elasticsearch. Why? Fuzzy text matching that neither SQL nor Mongo does well.

Architecture Diagram: The Hybrid Approach #

Here is how these databases logically sit behind your API Gateway.

graph TD
    User["User / Client"] --> API["API Gateway"]

    subgraph ServiceLayer ["The Service Layer"]
        OrderService["Order Service"]
        CatalogService["Catalog Service"]
        AnalyticsService["Analytics Service"]
    end

    API --> OrderService
    API --> CatalogService
    API --> AnalyticsService

    subgraph DataLayer ["The Data Layer (Polyglot)"]
        SQL[("PostgreSQL")]
        NoSQL[("MongoDB")]
        TSDB[("InfluxDB")]
    end

    OrderService -- "ACID Transactions" --> SQL
    CatalogService -- "Fast Retrieval" --> NoSQL
    AnalyticsService -- "High Write Volume" --> TSDB

    %% Note Node defined here
    NoteNode["Note: This is 'Polyglot Persistence'. <br/>Using the right tool for the specific access pattern."]

    %% Connect the note to the bottom layer so it stays centered below
    NoSQL -.-> NoteNode

Case Study: Discord’s Migration #

Discord is a prime example of “Logic over Hype.”

  • Original State: They used MongoDB for storing billions of chat messages.
  • The Problem: MongoDB stores data contiguously on disk. As users chatted randomly across channels, the data became fragmented. Reading widespread messages caused massive disk I/O latency.
  • The Logic Shift: They realized their access pattern was “Write Heavy, Read Recent.”
  • The Solution: They migrated to Cassandra (and later ScyllaDB).
    • Cassandra uses an LSM Tree structure (Log-Structured Merge-tree) which is optimized for appending writes sequentially.
    • Result: They now handle trillions of messages with consistent low latency. They didn’t choose NoSQL because it was “cool”; they chose a specific type of NoSQL (Column-family) because the underlying data structure matched their read/write physics.

Conclusion #

Your database is your gravity. It is the hardest thing to change later.

  • Start with Postgres. It supports JSONB columns now, so you can actually do “NoSQL” inside SQL if you really need a few dynamic fields.
  • Only split to a dedicated NoSQL engine when your Write Throughput exceeds what a vertical SQL server can handle, or your data shape is truly non-relational (like a graph).