Databases
Explore database fundamentals, data storage patterns, and trade-offs between common backend database choices.
Why Databases Exist
Applications need a reliable way to store and retrieve data that persists beyond a single program execution.
App
Database
Data created in memory (temporary)
- Databases provide durable storage so data survives restarts and failures.
- They enable efficient querying to retrieve specific data quickly.
- They ensure consistency and support concurrent access from multiple users or services.
Details
In real-world systems, applications constantly generate and rely on data such as user accounts, transactions, messages, and logs. Storing this data directly in application memory is not sufficient, because memory is temporary and disappears when the program stops or crashes.
Databases solve this by providing durable storage. Data is written to disk in a way that survives restarts and system failures. This is what allows systems like banking platforms, social networks, and e-commerce sites to maintain long-term state.
Beyond storage, databases provide structured ways to access data. Instead of manually scanning through files, developers can use queries to retrieve exactly what they need. For example, fetching a user by email or retrieving all orders from the past week.
Databases also enforce consistency rules, ensuring that data remains valid even when multiple operations occur simultaneously. Without this, systems would easily corrupt data under concurrent access.
At a high level, the database acts as the system of record. The application processes requests, but the database is responsible for storing and managing the data those requests depend on.
SQL Databases
SQL databases organize data into structured tables with defined schemas and relationships, enabling reliable and consistent data management.
Structured tables organize data into rows and columns
- Data is stored in tables with rows and columns, forming a structured format.
- Schemas enforce consistency by defining exactly how data must be stored.
- Relational queries allow combining data across multiple tables.
Details
SQL databases follow the relational model, where data is organized into tables. Each table represents an entity (such as users or orders), and each row represents a record within that entity.
A key feature is the schema. Before storing data, you define the structure, including column types and constraints. This ensures that all data follows a consistent format, which reduces errors and maintains integrity.
Relationships between tables are central to SQL systems. For example, a user table can be linked to an orders table through a user_id, allowing the system to connect related data efficiently.
SQL databases also support powerful query capabilities. Using joins, filters, and aggregations, developers can retrieve complex datasets across multiple tables in a single query.
This structured approach provides strong consistency and reliability, which is why SQL databases like PostgreSQL and MySQL are widely used in systems where correctness and data integrity are critical.
NoSQL Databases
NoSQL databases prioritize flexibility and scalability by allowing different data models instead of a fixed relational structure.
Flexible schema — records don’t need identical structure
Different models — not limited to tables
Direct lookup — simple key → value access
Horizontal scaling — data spread across multiple machines
- NoSQL databases use flexible schemas, allowing data to evolve without strict structure.
- They support multiple data models such as document, key-value, wide-column, and graph.
- They are designed for high scalability and handling large, distributed datasets.
Details
Unlike SQL databases, NoSQL systems do not require a predefined schema. This means each record can have a different structure, making them useful when data is unstructured or frequently changing.
Different NoSQL databases optimize for different use cases. Document databases like MongoDB store JSON-like objects, key-value stores like Redis provide extremely fast lookups, wide-column databases like Cassandra handle massive distributed datasets, and graph databases model relationships directly.
This flexibility makes NoSQL systems easier to scale horizontally across many servers, which is critical for large-scale applications handling high traffic.
The tradeoff is that many NoSQL systems sacrifice strict consistency or complex relational queries in favor of performance and scalability. Choosing between SQL and NoSQL depends on the structure of the data and the requirements of the system.
Indexes
Indexes speed up data retrieval by allowing the database to locate rows without scanning the entire table.
- Without an index, the database scans every row (full table scan).
- With an index, the database performs a fast lookup to find matching data.
- Indexes improve read performance but add overhead to writes and storage.
Details
When a query runs without an index, the database must check each row one by one to find matching results. This is called a full table scan, and it becomes extremely slow as the dataset grows.
An index acts like a lookup structure (similar to a book index). Instead of scanning everything, the database can jump directly to the location of the data. This significantly reduces query time, especially for large tables.
For example:
CREATE INDEX idx_users_email ON users(email);
This creates an index on the email column, allowing queries like "find user by email" to execute much faster.
However, indexes are not free. Every time data is inserted, updated, or deleted, the index must also be updated. This makes writes slightly slower and increases storage usage.
Effective database design involves choosing the right columns to index—typically those frequently used in search conditions, filters, or joins.
Transactions
Transactions group multiple operations into a single unit so that either all changes succeed or none are applied.
- Transactions ensure multiple related operations are treated as one atomic unit.
- If all steps succeed, changes are committed; if any fail, everything is rolled back.
- They prevent inconsistent data during complex updates.
Details
In many systems, operations are not isolated. For example, transferring money between two accounts requires updating both balances. If one update succeeds and the other fails, the system becomes inconsistent.
Transactions solve this by grouping operations together:
BEGIN UPDATE account A UPDATE account B COMMIT
If any step fails during execution, the database performs a rollback, undoing all previous changes in the transaction.
This guarantees that partial updates never occur. Either the entire operation completes successfully, or the system returns to its previous state.
Transactions are critical in systems where correctness matters, such as financial systems, inventory tracking, and booking systems. Without them, failures or concurrency issues would easily corrupt data.
ACID Properties
ACID properties define the guarantees that make database transactions reliable and safe.
- Atomicity ensures all operations in a transaction either fully complete or fully fail.
- Consistency guarantees the database always remains in a valid state.
- Isolation and durability ensure transactions do not interfere and committed data is permanent.
Details
ACID is a set of four properties that relational databases enforce to ensure correctness during transactions.
Atomicity means a transaction is indivisible. If any step fails, the entire transaction is rolled back, preventing partial updates.
Consistency ensures that after a transaction completes, the database adheres to all rules such as constraints, relationships, and data validity. Invalid states are never allowed.
Isolation ensures that multiple transactions running at the same time do not interfere with each other. Each transaction behaves as if it is running alone, even in a concurrent system.
Durability guarantees that once a transaction is committed, the data is permanently stored, even if the system crashes immediately after.
Together, these properties allow databases to handle failures, concurrency, and complex operations without corrupting data. They are foundational to building reliable systems where correctness matters.
Query Optimization
Databases use query optimization to determine the most efficient way to execute a query.
- A query planner analyzes the query and chooses an efficient execution strategy.
- It decides how to access data, such as using indexes instead of full table scans.
- Optimization reduces latency and resource usage for large datasets.
Details
When you send a query to a database, it does not blindly execute it step by step. Instead, the database first analyzes the query using a component called the query planner.
The planner evaluates multiple possible ways to execute the query and selects the most efficient one. This includes decisions like whether to use an index, how to join tables, and the order of operations.
For example, if a query searches for a user by email and an index exists on that column, the database will use the index rather than scanning every row. This can reduce execution time from seconds to milliseconds on large datasets.
The result of this process is an execution plan, which defines exactly how the query will run internally.
Query optimization becomes critical as data grows. Poorly optimized queries can cause slow responses, high CPU usage, and system bottlenecks, while well-optimized queries keep systems fast and scalable.
Replication
Replication improves reliability and scalability by copying data across multiple database instances.
- Data is copied from a primary database to one or more replicas.
- Writes typically go to the primary, while reads can be distributed across replicas.
- Replication improves availability, fault tolerance, and read performance.
Details
Replication is used to keep multiple copies of the same data on different database servers. The most common setup is a primary-replica model.
In this model, the primary database handles all write operations. Any changes made are then propagated to replica databases. These replicas maintain copies of the data and can serve read requests.
This setup provides several benefits. If the primary database fails, a replica can take over, improving system availability. It also allows systems to scale read traffic by distributing queries across multiple replicas instead of overloading a single database.
However, replication introduces complexity. There can be slight delays between when data is written to the primary and when it appears on replicas (replication lag). This means replicas may not always have the most up-to-date data.
Despite this tradeoff, replication is a core technique in building scalable and fault-tolerant backend systems.
Sharding
Sharding scales databases by splitting data across multiple servers instead of relying on a single system.
- Data is partitioned into shards, each storing a subset of the total dataset.
- Sharding enables horizontal scaling by adding more database servers.
- It distributes load across machines to handle large datasets and high traffic.
Details
As data grows, a single database server eventually becomes a bottleneck. Sharding solves this by splitting data into smaller pieces, with each shard stored on a different server.
For example:
Shard 1 → users 1–1M
Shard 2 → users 1M–2M
Shard 3 → users 2M–3M
Requests are routed to the correct shard based on the data being accessed. This allows systems to scale horizontally and handle much larger workloads.
The tradeoff is added complexity, especially when queries need data from multiple shards.
Question Section
1 / 5
This track is locked
Buy this track once to unlock all of its lessons.