Caching
Learn caching fundamentals to reduce latency, lower load, and improve backend performance at scale.
Why Caching Exists
Caching reduces system latency and database load by storing frequently accessed data in faster storage.
- Databases are relatively slow and expensive to query repeatedly.
- Caching stores frequently accessed data in faster storage like memory.
- If data is found in the cache, the database can be skipped entirely.
Details
In backend systems, many requests ask for the same data repeatedly, such as user profiles, product listings, or configuration settings. Querying the database every time adds avoidable latency and increases load on the system.
Caching solves this by storing frequently accessed data in a faster layer, typically in memory. When a request comes in, the system first checks the cache. If the data is already there, it can return the result immediately without touching the database.
This significantly improves response time and reduces pressure on the database, which is usually one of the most expensive and limited resources in a system.
At a high level, caching acts as a shortcut. Instead of recomputing or re-fetching the same data repeatedly, the system reuses previously retrieved results.
Cache Hit vs Cache Miss
Every cache lookup results in either a hit (fast) or a miss (requires database access).
- A cache hit returns data immediately from the cache.
- A cache miss requires querying the database and then storing the result.
- System performance depends heavily on achieving a high cache hit rate.
Details
When a request arrives, the system first checks whether the requested data exists in the cache.
If the data is found, this is called a cache hit. The response is returned quickly because it avoids a database query, which is much slower.
If the data is not found, this is called a cache miss. The system must query the database, retrieve the data, store it in the cache, and then return the response.
This pattern creates two distinct paths:
Cache → Hit → fast response
Cache → Miss → database → store → response
The goal of caching is to maximize cache hits. The higher the hit rate, the faster the system and the lower the load on the database.
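The two paths and the resulting hit rate can be sketched in a few lines. This is a minimal illustration rather than a production cache: `CountingCache` and `fake_db_lookup` are hypothetical names, and a plain dictionary stands in for the cache.

```python
class CountingCache:
    """Dictionary-backed cache that records hits and misses."""

    def __init__(self, load_from_db):
        self._store = {}
        self._load_from_db = load_from_db  # hypothetical stand-in for a database query
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._store:
            # Cache hit: return immediately, no database access.
            self.hits += 1
            return self._store[key]
        # Cache miss: query the database, store the result, then return it.
        self.misses += 1
        value = self._load_from_db(key)
        self._store[key] = value
        return value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def fake_db_lookup(key):
    # Placeholder for a slow database query.
    return f"row-for-{key}"


counting_cache = CountingCache(fake_db_lookup)
for key in ["a", "b", "a", "a"]:
    counting_cache.get(key)

# Two misses (first "a", first "b") and two hits (repeated "a").
print(counting_cache.hits, counting_cache.misses, counting_cache.hit_rate())
```

Tracking hits and misses like this is a common way to check whether a cache is actually absorbing load.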
Time-To-Live (TTL)
TTL controls how long data stays in the cache before it expires, which limits how stale a cached value can become.
- TTL defines the lifetime of a cached entry.
- Once the TTL expires, the entry is no longer served and is removed from the cache.
- Proper TTL balances freshness of data with performance benefits.
Details
Cached data cannot stay forever because the underlying data may change. If the cache is not updated, the system may return outdated or incorrect information.
Time-To-Live (TTL) solves this by attaching an expiration time to each cached entry. After the TTL duration passes, the cache automatically removes the data.
For example, a product catalog might be cached for 5 minutes, while a frequently changing API response might only be cached for 60 seconds.
Choosing the right TTL is a tradeoff. A longer TTL improves performance by increasing cache hits, but risks serving stale data. A shorter TTL keeps data fresh but increases database load.
TTL is one of the simplest and most widely used mechanisms for limiting staleness. It does not guarantee that cached data is current, but it bounds how long an outdated value can be served.
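A minimal sketch of TTL expiry, assuming an in-process dictionary cache. `TTLCache` is a hypothetical class, and the injectable `clock` parameter is an assumption added so that expiry can be demonstrated without waiting. Expired entries are removed lazily, on the next read.

```python
import time


class TTLCache:
    """In-memory cache whose entries expire after a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the example can simulate time passing
        self._store = {}     # key -> (value, expires_at)

    def set(self, key, value):
        self._store[key] = (value, self._clock() + self._ttl)

    def get(self, key):
        """Return the cached value, or None if it is missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # lazy removal on read
            return None
        return value


# Simulated clock: a one-element list that the lambda reads from.
now = [0.0]
catalog_cache = TTLCache(ttl_seconds=300, clock=lambda: now[0])  # 5-minute TTL
catalog_cache.set("catalog", ["widget", "gadget"])

now[0] = 299.0
fresh = catalog_cache.get("catalog")    # still inside the TTL window

now[0] = 300.0
expired = catalog_cache.get("catalog")  # TTL elapsed, treated as a miss
```

Because `None` signals a miss here, this sketch cannot cache `None` itself; a real implementation would typically use a dedicated sentinel.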
Cache Invalidation
Cache invalidation ensures that cached data stays consistent with the source of truth when underlying data changes.
- When data changes in the database, related cache entries must be updated or removed.
- Incorrect invalidation leads to stale or inconsistent data being served.
- Common strategies include deleting or refreshing cache entries after updates.
Details
Caching introduces a fundamental problem: the cache can become outdated when the underlying data changes. If the system continues serving old cached data, users may see incorrect information.
Cache invalidation solves this by ensuring that when data is updated in the database, the corresponding cache entries are either removed or refreshed.
A common approach is to delete the cache entry immediately after a database update. The next request will result in a cache miss, fetch fresh data from the database, and store the updated value in the cache.
Another approach is to proactively update the cache at the same time as the database change, keeping both in sync.
The difficulty lies in identifying exactly which cache entries need to be invalidated. Mistakes in this process are a common source of bugs in distributed systems.
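The delete-after-update approach might look like the sketch below. The `database` and `cache` dictionaries and the `read_user` / `update_user` functions are hypothetical stand-ins, not a specific library's API.

```python
database = {"user:1": {"name": "Ada"}}  # stand-in for the source of truth
cache = {}                              # stand-in for an in-memory cache


def read_user(key):
    # Serve from the cache when possible; otherwise load and store.
    if key in cache:
        return cache[key]
    value = database[key]
    cache[key] = value
    return value


def update_user(key, new_value):
    # Write to the source of truth first, then drop the cached copy
    # so the next read is a miss and fetches the fresh value.
    database[key] = new_value
    cache.pop(key, None)


before = read_user("user:1")               # miss, then cached
update_user("user:1", {"name": "Grace"})   # invalidates the entry
in_cache_after_update = "user:1" in cache
after = read_user("user:1")                # miss again, reloads fresh data
```

Deleting rather than overwriting keeps the write path simple: the cache is repopulated only if the key is read again.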
Cache-Aside Pattern
The cache-aside pattern lets the application control when data is loaded into the cache.
- The application first checks the cache before querying the database.
- On a cache miss, data is fetched from the database and stored in the cache.
- This approach gives full control over caching behavior.
Details
In the cache-aside pattern, the application is responsible for interacting with both the cache and the database.
When a request comes in, the application first checks if the data exists in the cache. If it does (cache hit), the data is returned immediately.
If the data is not in the cache (cache miss), the application queries the database, retrieves the result, stores it in the cache, and then returns the response.
This approach is widely used because it is simple and flexible. The application can decide what to cache, when to cache it, and how long it should live.
However, it also means the application must handle cache invalidation and consistency carefully, since the cache is not automatically kept in sync with the database.
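As a rough sketch of cache-aside, the application code below owns both the cache lookup and the database call. `query_database`, `profile_cache`, and `get_profile` are hypothetical names, and `db_calls` exists only to show how often the database is reached.

```python
db_calls = []  # records every simulated database query


def query_database(user_id):
    # Placeholder for a real database query.
    db_calls.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}


profile_cache = {}  # the application decides what goes in here


def get_profile(user_id):
    key = f"profile:{user_id}"
    cached = profile_cache.get(key)
    if cached is not None:
        return cached                    # cache hit
    profile = query_database(user_id)    # cache miss: go to the database
    profile_cache[key] = profile         # the application populates the cache
    return profile


first = get_profile(7)   # miss: one database call
second = get_profile(7)  # hit: no further database call
```

Neither the cache nor the database knows about the other; all coordination lives in `get_profile`, which is what gives the application its control and also its responsibility for invalidation.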
Write-Through Caching
Write-through caching keeps the cache and database synchronized by updating both on every write.
- Every write updates both the cache and the database.
- This keeps cached data in step with the database for every key written through the cache.
- The tradeoff is increased write latency.
Details
In write-through caching, whenever data is written, the system updates the cache first and then writes the same data to the database. As long as both writes succeed, the two layers contain the same value. If the database write fails, the cache entry must be rolled back or removed, otherwise the layers diverge.
Because the cache is updated immediately, a subsequent read of that key returns the most recent data without needing to check the database. This simplifies read logic and avoids stale data issues.
The tradeoff is that write operations become slower, since each write must be performed twice: once to the cache and once to the database. This increases latency and can reduce throughput under heavy write workloads.
Additionally, write-through caching can consume more resources because both storage layers are actively maintained. Despite this, it is useful in systems where consistency is critical and predictable read behavior is required.
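A minimal write-through sketch, assuming two dictionaries stand in for the cache and the database. `WriteThroughStore` is a hypothetical class, and error handling for a failed database write is deliberately left out.

```python
class WriteThroughStore:
    """Every write goes to the cache and the database in the same call."""

    def __init__(self):
        self.cache = {}     # stand-in for an in-memory cache
        self.database = {}  # stand-in for durable storage

    def write(self, key, value):
        # Order follows the description above: cache first, then database.
        # A real system would also need to undo the cache update if the
        # database write failed; that handling is omitted here.
        self.cache[key] = value
        self.database[key] = value

    def read(self, key):
        # Written keys are always in the cache, so reads rarely reach the database.
        if key in self.cache:
            return self.cache[key]
        value = self.database[key]
        self.cache[key] = value
        return value


store = WriteThroughStore()
store.write("price:42", 19.99)
store.write("price:42", 17.49)  # the second write updates both layers again
latest = store.read("price:42")
```

Each call to `write` touches both layers, which is where the extra write latency comes from.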
Cache Technologies
Different caching technologies operate at different layers to reduce latency and offload backend systems.
- Redis is a fast in-memory store with TTL support and advanced data structures.
- Memcached is a lightweight cache optimized for high-throughput key-value access.
- CDNs cache content closer to users to reduce global network latency.
Details
Caching is implemented using different technologies depending on where it sits in the system.
Redis is one of the most widely used caching systems. It stores data in memory, making it extremely fast, and supports features like TTL, lists, sets, and other data structures. It is commonly used for application-level caching, sessions, and real-time data.
Memcached is another in-memory cache, but simpler than Redis. It focuses purely on fast key-value storage and is optimized for high throughput, making it effective for basic caching use cases.
At the network edge, CDNs (Content Delivery Networks) cache static content such as images, videos, and HTML at servers distributed around the world. This reduces the distance data must travel, improving performance for global users.
In many systems, these technologies are combined. A request may first hit a CDN, then an application cache like Redis, and only reach the database if needed.
Where Caching Is Used
Caching is applied at multiple layers in a system to reduce latency and minimize load on deeper components.
- Caching exists at the edge (CDN), application layer, and database layer.
- Each layer serves faster responses by preventing unnecessary deeper requests.
- Stacking caches improves performance and scalability across the system.
Details
Caching is not limited to a single place; it is used across multiple layers of a backend system.
At the edge, CDNs cache static content close to users, reducing network latency. At the application layer, tools like Redis or Memcached cache frequently accessed data such as user sessions or API responses.
Some databases also include query caching, storing results of frequent queries to avoid recomputation.
These layers form a hierarchy:
Client → CDN → Application Cache → Database
Each layer absorbs part of the request load, ensuring that the database is only accessed when necessary. This layered approach is critical for scaling systems efficiently.
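The hierarchy can be modeled as an ordered list of caches that are checked in turn. This is a simplified sketch: `layered_get`, `cdn`, `app_cache`, and `database_lookup` are hypothetical names, plain dictionaries stand in for each layer, and backfilling the closer layers on a deeper hit is an assumption about how the layers cooperate.

```python
def layered_get(key, layers, origin):
    """Check each cache layer in order; fall back to the origin on a full miss.

    Returns (value, name_of_layer_that_served_it).
    """
    for index, (name, layer) in enumerate(layers):
        if key in layer:
            value = layer[key]
            # Backfill the layers closer to the client that missed.
            for _, closer in layers[:index]:
                closer[key] = value
            return value, name
    # Every layer missed: query the database and populate all layers.
    value = origin(key)
    for _, layer in layers:
        layer[key] = value
    return value, "database"


def database_lookup(key):
    # Placeholder for a real database query.
    return f"value-for-{key}"


cdn = {}
app_cache = {}
layers = [("cdn", cdn), ("app_cache", app_cache)]

first = layered_get("page", layers, database_lookup)   # served by the database
second = layered_get("page", layers, database_lookup)  # served by the CDN layer
del cdn["page"]                                        # simulate a CDN eviction
third = layered_get("page", layers, database_lookup)   # served by the application cache
```

Only the first request reaches `database_lookup`; later requests stop at whichever layer still holds the value.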
Cache Tradeoffs
Caching improves performance significantly, but it introduces complexity around data consistency and correctness.
- Caching reduces latency and database load, improving system scalability.
- It introduces risks such as stale data and consistency issues.
- Managing cache invalidation becomes a key challenge in system design.
Details
Caching is one of the most effective ways to improve system performance. By serving data from faster storage, systems can respond more quickly and handle significantly more traffic with less load on the database.
However, this performance gain comes with tradeoffs. The biggest issue is stale data: cached values may no longer reflect the latest state of the database.
Ensuring consistency between the cache and the database introduces additional complexity. Developers must carefully design invalidation strategies to keep data accurate.
In practice, caching is a balance. Systems trade some level of consistency and complexity in exchange for major improvements in speed and scalability.
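To make the performance side of this tradeoff concrete, average read latency can be estimated from the hit rate. The function below is a back-of-the-envelope model, and the 1 ms and 50 ms figures are illustrative assumptions, not measurements.

```python
def expected_latency_ms(hit_rate, cache_ms, db_ms):
    """Average read latency when every request checks the cache first."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Hits pay only the cache lookup; misses pay the lookup plus the database query.
    return hit_rate * cache_ms + (1.0 - hit_rate) * (cache_ms + db_ms)


# Illustrative numbers only: 1 ms cache lookup, 50 ms database query.
for rate in (0.0, 0.5, 0.9):
    print(rate, round(expected_latency_ms(rate, 1.0, 50.0), 2))
```

Under these assumed numbers, a 50% hit rate roughly halves the average latency and a 90% hit rate brings it down to about 6 ms, which is why tuning for a higher hit rate is usually worth some added invalidation complexity.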