Concurrency and Async Processing

Why Concurrency Matters in Backend Systems

Modern backend systems must handle many incoming requests at the same time; concurrency is what makes this possible.

[Figure: Sequence Bottleneck. REQ 1, REQ 2, and REQ 3 laid out along a time axis; legend: CPU Compute, Network/Disk Wait, Database/Logic.]

Servers receive requests continuously from multiple users and systems.
Processing one request at a time would create severe delays and bottlenecks.
Concurrency allows systems to make progress on multiple requests without waiting for each one to fully complete.

Details

In real systems, requests arrive continuously and often in large volumes, not one at a time. A backend service may receive hundreds or thousands of requests per second, each requiring computation, database access, or external API calls.

If the server processed these requests strictly in sequence, every new request would have to wait for the previous one to finish. This creates a queue that grows quickly and leads to unacceptable latency for users.

The problem becomes worse because many operations involve waiting, such as network calls or disk access. During this waiting time, the CPU is idle, which is an inefficient use of resources.

Concurrency addresses this by allowing the system to overlap work. While one request is waiting on I/O, another request can be processed. This keeps the system active and improves overall throughput.

The result is a backend system that can handle many users at once while maintaining responsiveness, which is a fundamental requirement for modern applications.
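A back-of-the-envelope model makes the bottleneck concrete. The sketch below assumes that every request needs a fixed slice of CPU time followed by a fixed I/O wait, and that a single CPU does the compute while waits are allowed to overlap. The function names and the 10 ms / 90 ms figures are hypothetical.

```python
def sequential_finish_times(n, cpu_ms, wait_ms):
    """Finish time of each request when requests are handled strictly one at a time."""
    # Request i cannot start until requests 0..i-1 have fully finished (compute + wait).
    return [(i + 1) * (cpu_ms + wait_ms) for i in range(n)]


def overlapped_finish_times(n, cpu_ms, wait_ms):
    """Finish time of each request when I/O waits overlap (one CPU, concurrent waits)."""
    # The CPU computes request i right after request i-1's compute, then moves on;
    # request i finishes when its own wait ends, independent of the other waits.
    return [(i + 1) * cpu_ms + wait_ms for i in range(n)]


# Hypothetical workload: 3 requests, 10 ms of compute and 90 ms of I/O wait each.
print(sequential_finish_times(3, 10, 90))  # [100, 200, 300]
print(overlapped_finish_times(3, 10, 90))  # [100, 110, 120]
```

Under these assumptions, the third request waits 300 ms in the sequential case even though it needs only 100 ms of its own work; overlapping the waits brings it down to 120 ms.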

Concurrency vs Parallelism

Concurrency is about managing multiple tasks in progress, while parallelism is about executing multiple tasks at the same time.

[Figure: Concurrency vs Parallelism. Concurrency: one worker switches among Task A, Task B, and Task C, like one chef juggling dishes. Parallelism: Core 1, Core 2, and Core 3 run together, like many chefs cooking at once.]

Concurrency allows tasks to make progress by switching between them.
Parallelism uses multiple CPU cores to run tasks simultaneously.

Details

Concurrency focuses on how a system structures and manages multiple tasks. A server may start processing one request, pause it while waiting for I/O, and then switch to another request. Tasks take turns progressing, which keeps the system efficient even with limited resources.

Parallelism, in contrast, depends on hardware. If a machine has multiple CPU cores, it can execute multiple tasks at the exact same time. Each core runs its own task independently, increasing total processing capacity.

The key distinction is that concurrency is about coordination and scheduling, while parallelism is about simultaneous execution. They solve different problems but are often used together.

For example, a backend server might manage thousands of concurrent requests using asynchronous techniques, while also distributing work across multiple CPU cores to achieve parallel execution.

Understanding this difference is important when reasoning about system performance, scalability, and how different backend frameworks operate under load.
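The two ideas can be contrasted in a short Python sketch using only the standard library. The asyncio part shows concurrency: three tasks take turns on one thread. The ProcessPoolExecutor part shows parallelism: CPU-bound work spread across separate processes, which the OS can place on separate cores. The function names are hypothetical.

```python
import asyncio
import threading
from concurrent.futures import ProcessPoolExecutor


async def step_through(name, steps, log):
    """Concurrency: record each step, then yield so other tasks can progress."""
    for i in range(1, steps + 1):
        log.append((name, i, threading.get_ident()))
        await asyncio.sleep(0)  # hand control back to the event loop


async def run_concurrently():
    log = []
    await asyncio.gather(*(step_through(name, 2, log) for name in "ABC"))
    return log


def cpu_work(n):
    """CPU-bound work with no waiting; only more cores make this faster."""
    return sum(i * i for i in range(n))


def run_in_parallel(sizes):
    """Parallelism: items may run in separate processes on separate cores."""
    with ProcessPoolExecutor() as pool:
        return list(pool.map(cpu_work, sizes))
```

Running asyncio.run(run_concurrently()) records the steps in the interleaved order A1, B1, C1, A2, B2, C2, all with the same thread id: the tasks make progress together without ever running simultaneously. run_in_parallel should be called from under an if __name__ == "__main__": guard, because some platforms start worker processes by re-importing the main module.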

Threads

A thread is a unit of execution within a process, and traditional backend servers use multiple threads to handle requests concurrently.

[Figure: Thread 1 to Thread 4 drawn as separate lanes; labels: Independent flow, Parallel handling, One request per thread.]

Each lane runs its own execution path. More threads allow more requests to be in progress at once, until CPU, memory, or context-switching overhead becomes the limit.

Each thread represents an independent flow of execution within the server.
Multiple threads allow the server to handle multiple requests at the same time.
Thread-based models are widely used in backend frameworks like Java Spring Boot.

Details

A process is a running instance of a program, and within that process, threads are the units that actually perform work. A single process can contain multiple threads, each executing independently while sharing the same memory space.

In traditional backend systems, incoming requests are assigned to threads. For example, when a request arrives, the server may allocate a thread from a thread pool to handle it. That thread processes the request from start to finish, including business logic, database queries, and response generation.

This model is straightforward and easy to reason about. Each request gets its own thread and call stack, so developers can write code in a mostly sequential style. The threads still share the process's memory, however, so any state they share needs coordination.

However, threads are not free. Each thread consumes memory and adds overhead to the system. As the number of concurrent requests grows, creating too many threads can lead to performance degradation due to context switching and resource exhaustion.

Because of these limitations, modern systems often combine thread-based approaches with asynchronous techniques, but understanding threads remains essential since many production systems still rely on this model.
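The thread-pool, thread-per-request pattern can be sketched with Python's standard concurrent.futures.ThreadPoolExecutor. This is not how Spring Boot is implemented, only the same pattern in miniature. handle_request and serve are hypothetical names, and time.sleep stands in for a blocking database call.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id):
    """Runs start to finish on one pool thread, written in plain sequential style."""
    time.sleep(0.05)  # stand-in for a blocking database query
    worker = threading.current_thread().name
    return f"response {request_id} from {worker}"


def serve(request_ids, max_workers=4):
    # A bounded pool caps the number of threads (and their memory and
    # context-switching cost) no matter how many requests arrive.
    with ThreadPoolExecutor(max_workers=max_workers, thread_name_prefix="worker") as pool:
        return list(pool.map(handle_request, request_ids))


for line in serve(range(6), max_workers=3):
    print(line)
```

When more requests arrive than there are workers, the extra ones wait in the pool's internal queue until a thread frees up. This is the trade-off described above: bounded resource use in exchange for queueing under load.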

Blocking vs Non-Blocking Operations

Blocking operations make a thread wait until a task completes, while non-blocking operations allow the thread to continue doing other work.

[Figure: Blocking (execution halted) vs non-blocking (other work continues).]

When a task waits on I/O, a blocking model stalls the worker, while a non-blocking system keeps the lane active during the wait.

Blocking operations pause execution until the result is ready.
Non-blocking operations allow the system to continue processing other tasks.

Details

In a blocking operation, a thread starts a task and then waits until that task is fully completed before moving on. For example, when a thread sends a database query, it may sit idle until the database responds. During this time, the thread cannot perform any other useful work.

This becomes a major limitation in systems handling many requests. If each thread spends a significant amount of time waiting on I/O, the server needs more threads to maintain throughput, which increases resource usage and overhead.

Non-blocking operations take a different approach. Instead of waiting, the thread initiates the task and immediately continues executing other work. When the result is ready, the system is notified, and the original task can resume.

This allows a single thread to manage multiple tasks efficiently, especially in I/O-heavy workloads where waiting time dominates execution time.

Understanding the difference between blocking and non-blocking behavior is critical because it directly influences how backend systems are designed for performance, responsiveness, and scalability.
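At the operating-system level, the difference comes down to one socket flag. The sketch below assumes a local socket pair as a stand-in for a database connection; try_read, wait_until_readable, and demo are hypothetical helpers built on Python's standard socket and selectors modules. A non-blocking read returns immediately when no data is ready, and a selector is used to be notified once data arrives.

```python
import selectors
import socket


def try_read(sock):
    """Non-blocking read: return available data, or None immediately if there is none."""
    try:
        return sock.recv(1024)
    except BlockingIOError:
        return None  # a blocking socket would have paused the thread here instead


def wait_until_readable(sock, timeout=1.0):
    """Let the OS notify us when data arrives instead of repeatedly polling."""
    with selectors.DefaultSelector() as sel:
        sel.register(sock, selectors.EVENT_READ)
        return bool(sel.select(timeout))


def demo():
    reader, writer = socket.socketpair()
    reader.setblocking(False)
    try:
        first = try_read(reader)          # nothing sent yet: returns None at once
        other_work = sum(range(1000))     # the thread stays free for useful work
        writer.sendall(b"query result")   # the "I/O" completes later
        wait_until_readable(reader)
        second = try_read(reader)
        return first, other_work, second
    finally:
        reader.close()
        writer.close()


print(demo())  # (None, 499500, b'query result')
```

Async frameworks wrap this same pattern (non-blocking sockets plus an OS readiness notification) so application code does not have to manage it directly.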

Async I/O

Async I/O allows servers to continue processing other tasks while waiting for slow operations like database queries or network calls.

[Figure: Blocking vs Async. In the blocking lane, Requests A, B, and C run one after another and the thread sits idle during each I/O wait. In the async lane, the thread does other work during each wait. Legend: active, idle, async work.]

Async avoids idle time, which leads to higher throughput.

Many backend operations involve waiting on external systems like databases or APIs.
Async I/O prevents threads from sitting idle during these waits.
This approach improves throughput and resource efficiency.

Details

In backend systems, many operations are not CPU-bound but I/O-bound. Tasks such as querying a database, calling an external API, or reading from disk can take significantly longer than in-memory computation.

If these operations are handled in a blocking way, the thread remains idle while waiting for the result, which wastes system resources and limits how many requests the server can handle.

Async I/O solves this by allowing the server to initiate an operation and then move on to other work instead of waiting. The system keeps track of the pending operation and resumes processing once the result is available.

This enables a single thread or a small number of threads to manage many concurrent requests efficiently, making async I/O a core technique in high-performance backend systems.
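A minimal asyncio sketch of the idea, assuming a hypothetical fake_db_query coroutine in place of a real async database driver (asyncio.sleep simulates the network latency). All five queries start together on one thread, and each request resumes when its result is ready.

```python
import asyncio
import time


async def fake_db_query(query_id, latency=0.1):
    """Hypothetical stand-in for an async database driver call."""
    await asyncio.sleep(latency)  # the loop is free to run other tasks during this wait
    return f"rows for query {query_id}"


async def handle_requests(n):
    start = time.perf_counter()
    # Start all queries, then resume as the results become available.
    results = await asyncio.gather(*(fake_db_query(i) for i in range(n)))
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(handle_requests(5))
print(results[0])  # rows for query 0
print(f"{elapsed:.2f}s for 5 queries")  # the waits overlap, so about one latency, not five
```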

Event Loop Model

The event loop model uses a small number of threads to handle many requests by processing tasks asynchronously instead of dedicating a thread per request.

[Figure: Execution models compared in three views (THREAD, ASYNC, HYBRID). The THREAD view shows workers T0 to T5, with each request occupying its own worker.]

A single event loop continuously processes tasks from a queue.
Non-blocking operations allow the system to avoid waiting.
Callbacks are executed when asynchronous operations complete.

Details

In the event loop model, the system does not assign a separate thread to each request. Instead, a central loop continuously checks for new tasks and executes them one at a time. When a request arrives, the event loop begins processing it and initiates any required I/O operations in a non-blocking way.

Rather than waiting for these operations to complete, the event loop moves on to handle other incoming requests. This keeps the system busy and avoids wasting time on idle waiting, which is common in thread-based models.

When an asynchronous operation finishes, its result is placed into a callback queue. The event loop eventually picks up this callback and completes the remaining work for that request.
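To see the callback-queue mechanics without any real I/O, here is a toy single-threaded loop with a simulated clock. TinyEventLoop and demo are hypothetical; call_soon and call_later borrow names that asyncio's loop also uses. This is a teaching sketch, not how Node.js or its underlying libuv library is implemented.

```python
import heapq
import itertools
from collections import deque


class TinyEventLoop:
    """Toy single-threaded event loop: a ready queue of callbacks plus pending timers."""

    def __init__(self):
        self.ready = deque()           # callbacks waiting to run, in FIFO order
        self.pending = []              # heap of (due_time, seq, callback) for in-flight "I/O"
        self.now = 0                   # simulated clock, in milliseconds
        self._seq = itertools.count()  # tie-breaker so callbacks are never compared

    def call_soon(self, callback):
        self.ready.append(callback)

    def call_later(self, delay, callback):
        heapq.heappush(self.pending, (self.now + delay, next(self._seq), callback))

    def run(self):
        while self.ready or self.pending:
            if not self.ready:
                # Nothing runnable: advance the clock to the next completed operation
                # and move its callback into the queue.
                due, _, callback = heapq.heappop(self.pending)
                self.now = due
                self.ready.append(callback)
            # Run one callback at a time; a slow callback here delays everything else.
            self.ready.popleft()()


def demo():
    loop = TinyEventLoop()
    log = []

    def start_request(name, io_latency):
        log.append(f"{name} started at t={loop.now}")
        # Initiate the "I/O" without waiting; its callback is queued when it completes.
        loop.call_later(io_latency, lambda: log.append(f"{name} finished at t={loop.now}"))

    loop.call_soon(lambda: start_request("A", 30))
    loop.call_soon(lambda: start_request("B", 10))
    loop.run()
    return log


for entry in demo():
    print(entry)
# A started at t=0
# B started at t=0
# B finished at t=10
# A finished at t=30
```

Both requests start before either finishes, and B completes first because its simulated I/O is shorter, even though A started earlier.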

A well-known example of this model is Node.js. It runs JavaScript on a single-threaded event loop and pairs it with async I/O (provided by its libuv library) to handle thousands of concurrent connections efficiently, using callbacks and promises to manage execution flow.

This approach is highly efficient for I/O-heavy workloads, but it requires careful design to avoid blocking the event loop, since a single slow task can delay all others.
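The risk of blocking the loop can be measured directly. The sketch below, assuming Python 3.9+ for asyncio.to_thread, runs a hypothetical heartbeat task that tries to tick every 10 ms and reports the longest gap between ticks. A handler that calls time.sleep freezes the loop for the full duration, while one that offloads the same call to a worker thread leaves the loop responsive.

```python
import asyncio
import time


async def heartbeat(ticks, stop):
    """Another task on the same loop: tries to record a tick about every 10 ms."""
    while not stop.is_set():
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.01)


async def longest_stall(handler):
    """Run handler alongside the heartbeat; return the largest gap between ticks."""
    ticks, stop = [], asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0)     # let the heartbeat record its first tick
    await handler()
    await asyncio.sleep(0.03)  # give the heartbeat a few more chances to tick
    stop.set()
    await hb
    return max(later - earlier for earlier, later in zip(ticks, ticks[1:]))


async def blocking_handler():
    time.sleep(0.2)  # blocking call: freezes the loop thread, so nothing else runs


async def offloaded_handler():
    await asyncio.to_thread(time.sleep, 0.2)  # runs in a worker thread; the loop stays free


print(f"blocking:  {asyncio.run(longest_stall(blocking_handler)):.3f}s stall")
print(f"offloaded: {asyncio.run(longest_stall(offloaded_handler)):.3f}s stall")
```

The blocking handler produces a stall of at least 0.2 s, during which every other task on the loop is stuck. The offloaded version keeps gaps near the 10 ms tick interval.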

Race Conditions

A race condition occurs when multiple threads access the same shared data concurrently, at least one of them modifies it, and the result depends on the unpredictable timing of their operations.

[Figure: A single shared variable starts at 0. Thread A and Thread B each READ it, add 1, and WRITE the result back; when their steps interleave, a race condition occurs.]

Race conditions happen when multiple threads operate on shared data without proper coordination.
The final result depends on the timing and order of execution.
They are a major source of bugs in concurrent systems.

Details

A race condition arises when two or more threads read and update the same piece of data at the same time. Because these operations are not synchronized, the outcome depends on which thread executes first, making the system behavior unpredictable.

For example, if two threads read the same account balance and both attempt to update it, one update can overwrite the other. This results in incorrect data, even though each individual operation appears correct in isolation.

These issues are difficult to detect because they may not occur consistently. Small timing differences can change the execution order, causing bugs that are hard to reproduce and debug.

To prevent race conditions, systems use synchronization techniques such as locks, atomic operations, and database transactions. These mechanisms ensure that only one thread can modify shared data at a time or that operations are executed safely without interference.
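The lost-update scenario from the account example, and its fix with a lock, can be reproduced in Python. Account, deposit_unsafe, deposit_safe, and run_threads are hypothetical names. In real code the bad timing happens at random; this sketch uses a threading.Barrier to force both threads to read before either writes, so the bug shows up on every run.

```python
import threading


class Account:
    def __init__(self):
        self.balance = 0
        self.lock = threading.Lock()


def deposit_unsafe(account, amount, barrier):
    current = account.balance           # READ
    barrier.wait()                      # force the bad timing: both threads read first
    account.balance = current + amount  # WRITE: overwrites the other thread's update


def deposit_safe(account, amount):
    with account.lock:                  # one thread at a time runs read-modify-write
        current = account.balance
        account.balance = current + amount


def run_threads(target, args_per_thread):
    threads = [threading.Thread(target=target, args=args) for args in args_per_thread]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


unsafe = Account()
barrier = threading.Barrier(2)
run_threads(deposit_unsafe, [(unsafe, 1, barrier), (unsafe, 1, barrier)])

safe = Account()
run_threads(deposit_safe, [(safe, 1), (safe, 1)])

print(unsafe.balance, safe.balance)  # 1 2  (two deposits of 1; the unsafe version lost one)
```

Each unsafe deposit looks correct on its own, yet one update is lost. The lock makes the read and the write a single indivisible step from the other thread's point of view.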

