Caching

Why Caching Exists

Caching reduces system latency and database load by storing frequently accessed data in faster storage.

[Diagram: App → Cache (fast) → Database. Requests check the cache first; hits return immediately, while misses go to the database and add to its load.]

Databases are relatively slow and expensive to query repeatedly.
Caching stores frequently accessed data in faster storage like memory.
If data is found in the cache, the database can be skipped entirely.

Details

In backend systems, many requests ask for the same data repeatedly—such as user profiles, product listings, or configuration settings. Querying the database every time introduces unnecessary latency and increases load on the system.

Caching solves this by storing frequently accessed data in a faster layer, typically in memory. When a request comes in, the system first checks the cache. If the data is already there, it can return the result immediately without touching the database.

This significantly improves response time and reduces pressure on the database, which is usually one of the most expensive and limited resources in a system.

At a high level, caching acts as a shortcut. Instead of recomputing or re-fetching the same data repeatedly, the system reuses previously retrieved results.
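Python's built-in `functools.lru_cache` shows the shortcut in action by keeping results in process memory. This is a minimal sketch: `query_database` and `get_product` are hypothetical names, and the "database" is simulated with a counter instead of a real query.

```python
from functools import lru_cache

# Counts how many times the simulated database is actually queried.
db_queries = 0


def query_database(product_id: int) -> dict:
    """Hypothetical stand-in for a slow database query."""
    global db_queries
    db_queries += 1
    return {"id": product_id, "name": f"Product {product_id}"}


@lru_cache(maxsize=1024)
def get_product(product_id: int) -> dict:
    # The first call for an id runs the query; later calls reuse the stored result.
    return query_database(product_id)


for _ in range(100):
    get_product(42)

print(db_queries)                # 1
print(get_product.cache_info())  # CacheInfo(hits=99, misses=1, maxsize=1024, currsize=1)
```

`lru_cache` lives inside a single process, so each server keeps its own copy; shared caches such as Redis, covered below, let many servers reuse the same entries.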

Cache Hit vs Cache Miss

Every cache lookup results in either a hit (fast) or a miss (requires database access).

[Diagram: Cache Hit (fast) vs Cache Miss (slow). A higher hit rate means fewer database queries and faster responses.]

A cache hit returns data immediately from the cache.
A cache miss requires querying the database and then storing the result.
System performance depends heavily on achieving a high cache hit rate.

Details

When a request arrives, the system first checks whether the requested data exists in the cache.

If the data is found, this is called a cache hit. The response is returned quickly because it avoids a database query, which is much slower.

If the data is not found, this is called a cache miss. The system must query the database, retrieve the data, store it in the cache, and then return the response.

This pattern creates two distinct paths:

Cache → Hit → fast response
Cache → Miss → database → store → response

The goal of caching is to maximize cache hits. The higher the hit rate, the faster the system and the lower the load on the database.
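The two paths and the hit rate can be sketched with a small cache that counts hits and misses. This assumes a plain dictionary as the cache and a hypothetical `load_from_db` callback in place of a real query; the class name `CountingCache` is illustrative, not from any library.

```python
from typing import Callable, Dict


class CountingCache:
    """Hypothetical in-memory cache that records hits and misses."""

    def __init__(self, load_from_db: Callable[[str], str]) -> None:
        self._store: Dict[str, str] = {}
        self._load_from_db = load_from_db
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        if key in self._store:
            # Cache → Hit → fast response
            self.hits += 1
            return self._store[key]
        # Cache → Miss → database → store → response
        self.misses += 1
        value = self._load_from_db(key)
        self._store[key] = value
        return value

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = CountingCache(load_from_db=lambda key: f"row for {key}")
for key in ["a", "b", "a", "a", "c", "b"]:
    cache.get(key)

print(cache.hits, cache.misses)  # 3 3
print(f"{cache.hit_rate:.0%}")   # 50%
```

Every key misses once before it can hit, so the hit rate rises as the same keys are requested repeatedly.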

Time-To-Live (TTL)

TTL controls how long data stays in the cache before it is automatically removed to prevent staleness.

[Diagram: each cache entry's TTL counts down over time; when it reaches 0, the entry is removed and must be fetched again.]

TTL defines the lifetime of a cached entry.
Once TTL expires, the data is removed from the cache.
Proper TTL balances freshness of data with performance benefits.

Details

Cached data cannot stay forever because the underlying data may change. If the cache is not updated, the system may return outdated or incorrect information.

Time-To-Live (TTL) solves this by attaching an expiration time to each cached entry. After the TTL duration passes, the cache automatically removes the data.

For example, a product catalog might be cached for 5 minutes, while a frequently changing API response might only be cached for 60 seconds.

Choosing the right TTL is a tradeoff. A longer TTL improves performance by increasing cache hits, but risks serving stale data. A shorter TTL keeps data fresh but increases database load.

TTL is one of the simplest and most widely used mechanisms for maintaining cache correctness.
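A minimal TTL cache sketch, assuming expiry is checked lazily when an entry is read (a simplification; production caches may also purge expired entries in the background). The `TTLCache` class and its injectable `clock` parameter are hypothetical; the usage swaps in a fake clock so expiry can be shown without sleeping.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Minimal in-memory cache where every entry expires after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def set(self, key: str, value: str) -> None:
        self._entries[key] = (value, self._clock() + self.ttl_seconds)

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # TTL reached 0: drop the entry so the caller refetches from the database.
            del self._entries[key]
            return None
        return value


now = [0.0]  # fake clock, in seconds
catalog = TTLCache(ttl_seconds=300, clock=lambda: now[0])  # 5 minutes, as in the catalog example
catalog.set("catalog:page:1", "cached catalog page")

now[0] = 299.0
before_expiry = catalog.get("catalog:page:1")  # still served from the cache
now[0] = 300.0
after_expiry = catalog.get("catalog:page:1")   # None: expired, must be refetched
```

Changing `ttl_seconds` is the freshness/performance dial described above: a larger value keeps entries longer, a smaller one sends more reads back to the database.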

Cache Invalidation

Cache invalidation ensures that cached data stays consistent with the source of truth when underlying data changes.

[Diagram: a client reads "User: Alice" from the cache, a fast read.]

When data changes in the database, related cache entries must be updated or removed.
Incorrect invalidation leads to stale or inconsistent data being served.
Common strategies include deleting or refreshing cache entries after updates.

Details

Caching introduces a fundamental problem: the cache can become outdated when the underlying data changes. If the system continues serving old cached data, users may see incorrect information.

Cache invalidation solves this by ensuring that when data is updated in the database, the corresponding cache entries are either removed or refreshed.

A common approach is to delete the cache entry immediately after a database update. The next request will result in a cache miss, fetch fresh data from the database, and store the updated value in the cache.

Another approach is to proactively update the cache at the same time as the database change, keeping both in sync.

The difficulty lies in identifying exactly which cache entries need to be invalidated. Mistakes in this process are a common source of bugs in distributed systems.
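The delete-after-update approach can be sketched with two dictionaries standing in for the database and the cache (both hypothetical). The sketch also shows the stale read that occurs when an update skips invalidation.

```python
from typing import Dict

db: Dict[str, str] = {"user:1": "Alice"}  # hypothetical source of truth
cache: Dict[str, str] = {}                # hypothetical in-memory cache


def read_user(key: str) -> str:
    if key in cache:
        return cache[key]
    value = db[key]
    cache[key] = value
    return value


def update_user_without_invalidation(key: str, name: str) -> None:
    db[key] = name  # the cache is not touched, so it keeps serving the old value


def update_user(key: str, name: str) -> None:
    db[key] = name
    # Delete the entry after the database write; the next read misses and reloads.
    cache.pop(key, None)


read_user("user:1")  # caches "Alice"
update_user_without_invalidation("user:1", "Alicia")
stale = read_user("user:1")   # "Alice": stale
update_user("user:1", "Alicia")
fresh = read_user("user:1")   # "Alicia": fresh
```

Replacing `cache.pop(key, None)` with `cache[key] = name` turns this into the second approach, updating the cache alongside the database. This single-threaded sketch ignores concurrency; with many writers and readers, the ordering of the database write and the cache update matters.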

Cache-Aside Pattern

The cache-aside pattern lets the application control when data is loaded into the cache.

[Diagram: each incoming request splits into two paths: a cache hit returns immediately, while a cache miss fetches from the database and then updates the cache.]

The application first checks the cache before querying the database.
On a cache miss, data is fetched from the database and stored in the cache.
This gives the application full control over caching behavior.

Details

In the cache-aside pattern, the application is responsible for interacting with both the cache and the database.

When a request comes in, the application first checks if the data exists in the cache. If it does (cache hit), the data is returned immediately.

If the data is not in the cache (cache miss), the application queries the database, retrieves the result, stores it in the cache, and then returns the response.

This approach is widely used because it is simple and flexible. The application can decide what to cache, when to cache it, and how long it should live.

However, it also means the application must handle cache invalidation and consistency carefully, since the cache is not automatically kept in sync with the database.
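A cache-aside sketch in which the application performs both steps itself. `InMemoryCache` is a hypothetical stand-in for a cache client such as Redis or Memcached; its `get`/`set` methods are illustrative and are not any library's actual API. The application chooses the TTL on each call, which is the flexibility described above.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryCache:
    """Hypothetical cache client with get/set and a per-entry TTL."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None or time.monotonic() >= item[1]:
            return None
        return item[0]

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._data[key] = (value, time.monotonic() + ttl_seconds)


def get_with_cache_aside(cache: InMemoryCache, key: str,
                         load_from_db: Callable[[str], str],
                         ttl_seconds: float) -> str:
    cached = cache.get(key)
    if cached is not None:
        return cached                      # cache hit: the database is skipped
    value = load_from_db(key)              # cache miss: the application queries the database
    cache.set(key, value, ttl_seconds)     # then stores the result with a TTL it chooses
    return value


db_calls = []


def load_profile(key: str) -> str:
    db_calls.append(key)  # record each simulated database query
    return f"profile for {key}"


cache = InMemoryCache()
first = get_with_cache_aside(cache, "user:7", load_profile, ttl_seconds=60)
second = get_with_cache_aside(cache, "user:7", load_profile, ttl_seconds=60)
print(len(db_calls))  # 1
```

Because the cache knows nothing about the database here, keeping the two consistent on writes is left to the application, as the paragraph above notes.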

Write-Through Caching

Write-through caching keeps the cache and database synchronized by updating both on every write.

[Diagram: App → Cache → Database. A write must propagate to both layers before it completes, which makes writes slower.]

Every write updates both the cache and the database.
This keeps the cache up to date with every write that goes through it.
The tradeoff is increased write latency.

Details

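In write-through caching, whenever data is written, the system updates the cache and then synchronously writes the same data to the database; the write is reported as complete only after both steps succeed. As long as both steps succeed, the two layers hold the same value. If the database write fails, the cache update has to be undone, or the cache will hold a value the database never stored.

Because the cache is updated on every write, subsequent reads of that data can be served from the cache with the most recent value, without checking the database. This simplifies read logic and reduces stale-data issues for data written through this path.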

The tradeoff is that write operations become slower, since each write must be performed twice—once to the cache and once to the database. This increases latency and can reduce throughput under heavy write workloads.

Additionally, write-through caching can consume more resources because both storage layers are actively maintained. Despite this, it is useful in systems where consistency is critical and predictable read behavior is required.
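A minimal write-through sketch, assuming dictionaries for both layers and a hypothetical `_write_db` method for the database call. It updates the cache first and then the database; rolling the cache back when the database write fails is an added design choice so the two layers do not diverge.

```python
from typing import Dict, Optional


class WriteThroughStore:
    """Sketch of write-through caching: a write returns only after cache and DB are both updated."""

    def __init__(self) -> None:
        self.cache: Dict[str, str] = {}
        self.db: Dict[str, str] = {}  # hypothetical stand-in for the database

    def _write_db(self, key: str, value: str) -> None:
        # Hypothetical database write; a real one could raise on failure.
        self.db[key] = value

    def write(self, key: str, value: str) -> None:
        previous = self.cache.get(key)
        self.cache[key] = value  # 1. update the cache
        try:
            self._write_db(key, value)  # 2. write the same value to the database
        except Exception:
            # Roll back the cache so it never holds a value the database did not store.
            if previous is None:
                self.cache.pop(key, None)
            else:
                self.cache[key] = previous
            raise
        # Reaching this point means both layers hold `value`.

    def read(self, key: str) -> Optional[str]:
        if key in self.cache:
            return self.cache[key]  # written values are served from the cache
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value


store = WriteThroughStore()
store.write("price:42", "19.99")
print(store.cache == store.db)  # True
print(store.read("price:42"))   # 19.99
```

Each `write` performs two updates before returning, which is where the extra write latency comes from.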

Cache Technologies

Different caching technologies operate at different layers to reduce latency and offload backend systems.

[Diagram: User → CDN → Redis or Memcached → Server.]

Redis is a fast in-memory store with TTL support and advanced data structures.
Memcached is a lightweight cache optimized for high-throughput key-value access.
CDNs cache content closer to users to reduce global network latency.

Details

Caching is implemented using different technologies depending on where it sits in the system.

Redis is one of the most widely used caching systems. It stores data in memory, making it extremely fast, and supports features like TTL, lists, sets, and other data structures. It is commonly used for application-level caching, sessions, and real-time data.

Memcached is another in-memory cache, but simpler than Redis. It focuses purely on fast key-value storage and is optimized for high throughput, making it effective for basic caching use cases.

Closer to the user, CDNs (Content Delivery Networks) cache static content such as images, videos, and HTML on servers distributed around the world. This reduces the distance data must travel, improving performance for global users.

In many systems, these technologies are combined. A request may first hit a CDN, then an application cache like Redis, and only reach the database if needed.

Where Caching Is Used

Caching is applied at multiple layers in a system to reduce latency and minimize load on deeper components.

[Diagram: Incoming request → CDN → App → DB Cache. Each layer acts as a shield, so most requests never reach the database.]

Caching exists at the edge (CDN), application layer, and database layer.
Each layer serves faster responses by preventing unnecessary deeper requests.
Stacking caches improves performance and scalability across the system.

Details

Caching is not limited to a single place—it is used across multiple layers of a backend system.

At the edge, CDNs cache static content close to users, reducing network latency. At the application layer, tools like Redis or Memcached cache frequently accessed data such as user sessions or API responses.

Some databases also include query caching, storing results of frequent queries to avoid recomputation.

These layers form a hierarchy:

Client → CDN → Application Cache → Database

Each layer absorbs part of the request load, ensuring that the database is only accessed when necessary. This layered approach is critical for scaling systems efficiently.
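The hierarchy can be modeled as an ordered list of cache layers checked front to back, assuming each layer is a plain dictionary. `layered_get` is a hypothetical helper; real CDNs are filled from HTTP responses rather than by application code, so this only illustrates how each layer absorbs load before the database is reached.

```python
from typing import Callable, Dict, List, Tuple


def layered_get(key: str, layers: List[Tuple[str, Dict[str, str]]],
                load_from_db: Callable[[str], str]) -> Tuple[str, str]:
    """Check each cache layer in order; fall through to the database on a full miss.

    Returns (value, name of the layer that answered).
    """
    for i, (name, store) in enumerate(layers):
        if key in store:
            value = store[key]
            # Fill the faster layers in front of this one so the next request stops earlier.
            for _, upper in layers[:i]:
                upper[key] = value
            return value, name
    value = load_from_db(key)
    for _, store in layers:
        store[key] = value
    return value, "database"


cdn: Dict[str, str] = {}
app_cache: Dict[str, str] = {}
layers = [("cdn", cdn), ("app cache", app_cache)]
load = lambda k: f"page {k}"

r1 = layered_get("/home", layers, load)  # ('page /home', 'database')
r2 = layered_get("/home", layers, load)  # ('page /home', 'cdn')
cdn.clear()                              # e.g. the CDN entry was evicted
r3 = layered_get("/home", layers, load)  # ('page /home', 'app cache'), CDN refilled
```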

Cache Tradeoffs

Caching improves performance significantly, but it introduces complexity around data consistency and correctness.

[Diagram: Performance vs Correctness. While the cache is fresh, responses are both fast and correct.]

Caching reduces latency and database load, improving system scalability.
It introduces risks such as stale data and consistency issues.
Managing cache invalidation becomes a key challenge in system design.

Details

Caching is one of the most effective ways to improve system performance. By serving data from faster storage, systems can respond more quickly and handle significantly more traffic with less load on the database.

However, this performance gain comes with tradeoffs. The biggest issue is stale data—cached values may no longer reflect the latest state of the database.

Ensuring consistency between the cache and the database introduces additional complexity. Developers must carefully design invalidation strategies to keep data accurate.

In practice, caching is a balance. Systems trade some level of consistency and complexity in exchange for major improvements in speed and scalability.

