Designing Systems That Hold Under Load

Resilience Engineering for Real Traffic

Introduction

Most systems don’t break because of a single large spike. They fail when multiple small inefficiencies combine under pressure. What works at low traffic often collapses under load because requests are accepted without any bound, services are tightly coupled, and failures are not handled.

Designing systems that hold under load is less about scaling up infrastructure and more about controlling how work flows through the system.

The Problem

Under load, systems often show predictable failure patterns:

  • Requests pile up faster than they can be processed
  • Downstream services become bottlenecks
  • Retries amplify traffic instead of stabilizing it
  • Failures cascade across services

The core issue is a lack of control over concurrency and resource usage: nothing bounds how much work the system accepts at once.
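To make the retry problem concrete, here is a small worst-case calculation. It assumes a chain of services in which every layer retries a failed call independently; the function name and the numbers are illustrative and not taken from any particular system.

```javascript
// Worst-case number of attempts that reach the deepest service when every
// layer in a call chain retries independently.
// `layers` and `retriesPerLayer` are illustrative parameters.
function worstCaseAttempts(layers, retriesPerLayer) {
  // Each layer makes 1 original attempt plus its retries.
  const attemptsPerLayer = 1 + retriesPerLayer;
  return Math.pow(attemptsPerLayer, layers);
}

console.log(worstCaseAttempts(1, 3)); // 4
console.log(worstCaseAttempts(3, 3)); // 64
```

With three layers and three retries each, a single user request can turn into as many as 64 calls against a dependency that is already struggling.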

System Design / Approach

A resilient system under load follows a few key principles:

  • Limit incoming traffic using rate limiting
  • Buffer work using queues
  • Fail fast instead of blocking resources
  • Degrade gracefully when capacity is exceeded

The goal is not to eliminate load, but to shape it into something the system can handle.
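As one illustration of the "fail fast" principle, a call to a slow dependency can be given a deadline. This is a minimal sketch: `withTimeout` is a hypothetical helper name, and the approach assumes the caller would rather get an error quickly than hold a connection open.

```javascript
// Reject if `promise` does not settle within `ms` milliseconds.
// `withTimeout` is a hypothetical helper, not part of any library.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards so it
  // cannot keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage (hypothetical `fetchUser`):
// const user = await withTimeout(fetchUser(id), 200);
```

Note that this only stops waiting; it does not cancel the underlying work. Cancelling it as well would need something like an `AbortController` passed to the operation.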

Implementation

Step 1: Rate Limiting

Limit how many requests a user or client can make in a given time window.


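// `requests` is this client's request count in the current window;
// `limit` is the maximum allowed per window.
if (requests > limit) {
  return res.status(429).json({ error: "Too many requests" });
}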

This prevents a sudden spike from one client from overwhelming the system. The 429 status also tells well-behaved clients to slow down and retry later.
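The check above leaves open where the request count comes from. One possible answer is a fixed-window counter kept in process memory, sketched below. It assumes a single server process and that `clientId` (for example an API key or IP address) identifies the caller; `createRateLimiter` and its options are illustrative names.

```javascript
// Fixed-window rate limiter kept in process memory.
// Assumptions: a single server process, and `clientId` identifies the caller.
function createRateLimiter({ limit, windowMs, now = Date.now }) {
  const windows = new Map(); // clientId -> { start, count }

  return function allow(clientId) {
    const t = now();
    const entry = windows.get(clientId);
    // Start a fresh window on first sight or once the old window has expired.
    if (!entry || t - entry.start >= windowMs) {
      windows.set(clientId, { start: t, count: 1 });
      return true;
    }
    if (entry.count >= limit) {
      return false; // over the limit: the caller should respond with 429
    }
    entry.count += 1;
    return true;
  };
}

// Usage inside an Express-style handler (hypothetical wiring):
// const allow = createRateLimiter({ limit: 100, windowMs: 60_000 });
// if (!allow(req.ip)) return res.status(429).json({ error: "Too many requests" });
```

Two limitations of this sketch: the map is never pruned, and counts are not shared between server instances. A multi-instance deployment would need a shared store for the counters.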

Step 2: Queue-Based Processing

Offload heavy tasks to background workers using a queue.


// Enqueue the job and return; a background worker picks it up later.
await queue.add("process-task", data);

Queues absorb bursts: the request handler returns as soon as the job is enqueued, and workers process jobs at a rate the system can sustain.
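To show what sits behind a call like `queue.add`, here is a minimal in-memory queue with a fixed number of concurrent workers. It is a sketch under stated assumptions: everything runs in one process, jobs are lost on restart, and `createQueue`, `concurrency`, and `handler` are illustrative names rather than the API of any specific queue library.

```javascript
// In-memory work queue with a fixed number of concurrent workers.
// Assumption: a single process; a production setup would likely use a
// durable queue so that jobs survive restarts.
function createQueue({ concurrency, handler }) {
  const pending = []; // jobs waiting for a free worker slot
  let active = 0;     // jobs currently running

  function drain() {
    while (active < concurrency && pending.length > 0) {
      const job = pending.shift();
      active += 1;
      Promise.resolve()
        .then(() => handler(job.name, job.data))
        .then(job.resolve, job.reject)
        .finally(() => {
          active -= 1;
          drain(); // a slot freed up: start the next waiting job
        });
    }
  }

  return {
    // Enqueue a job; the returned promise settles with the handler's result.
    add(name, data) {
      return new Promise((resolve, reject) => {
        pending.push({ name, data, resolve, reject });
        drain();
      });
    },
    // Number of jobs still waiting for a worker slot.
    get length() {
      return pending.length;
    },
  };
}
```

However many jobs arrive at once, at most `concurrency` of them run at the same time, which is what keeps a burst from turning into a burst against the database or a downstream service.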

Step 3: Backpressure

Reject or delay requests when the system is overloaded.


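// `queue.length` is the number of jobs still waiting;
// `threshold` is the backlog the workers can absorb in acceptable time.
if (queue.length > threshold) {
  return res.status(503).json({ error: "System busy" });
}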

Rejecting early is cheaper than accepting work that will time out anyway, so backpressure keeps the system from collapsing under excessive load.
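Queue depth is one overload signal; the number of requests currently in flight is another. The sketch below caps in-flight requests and sheds the rest. `createLoadShedder` and `maxInFlight` are hypothetical names, and the middleware wiring in the comments assumes an Express-style application.

```javascript
// Admission control: cap the number of requests in flight and shed the rest.
// `maxInFlight` is an assumed tuning parameter.
function createLoadShedder(maxInFlight) {
  let inFlight = 0;

  return {
    // Returns a release function when admitted, or null when the caller
    // should reject the request.
    tryAcquire() {
      if (inFlight >= maxInFlight) return null;
      inFlight += 1;
      let released = false;
      return function release() {
        if (released) return; // guard against double release
        released = true;
        inFlight -= 1;
      };
    },
    get inFlight() {
      return inFlight;
    },
  };
}

// Express-style middleware (hypothetical wiring):
// const shedder = createLoadShedder(200);
// app.use((req, res, next) => {
//   const release = shedder.tryAcquire();
//   if (!release) {
//     res.set("Retry-After", "1");
//     return res.status(503).json({ error: "System busy" });
//   }
//   res.once("finish", release);
//   res.once("close", release);
//   next();
// });
```

The double-release guard matters in the wiring shown, because both the `finish` and `close` events can fire for the same response.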

Step 4: Graceful Degradation

Disable non-critical features during high load.


// `isHighLoad` is a flag derived from load signals such as queue depth or latency.
if (isHighLoad) {
  return minimalResponse();
}

This ensures core functionality remains available.
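A slightly fuller sketch of the same idea follows. It assumes two load signals (queue depth and p95 latency) with illustrative thresholds, and a response made of a critical part plus optional recommendations. All names, including `buildResponse` and the loader callbacks, are hypothetical; here `isHighLoad` is written as a function over those signals rather than a flag.

```javascript
// Decide whether the system is under high load from two signals.
// Signal names and thresholds are illustrative assumptions.
function isHighLoad({ queueDepth, p95LatencyMs }, limits = { queueDepth: 1000, p95LatencyMs: 500 }) {
  return queueDepth > limits.queueDepth || p95LatencyMs > limits.p95LatencyMs;
}

// Build a response; recommendations are treated as a non-critical feature.
async function buildResponse(metrics, loadCore, loadRecommendations) {
  const core = await loadCore();
  if (isHighLoad(metrics)) {
    // Skip the optional work entirely while the system is under pressure.
    return { ...core, recommendations: [], degraded: true };
  }
  try {
    return { ...core, recommendations: await loadRecommendations(), degraded: false };
  } catch {
    // An optional feature failing should not fail the whole response.
    return { ...core, recommendations: [], degraded: true };
  }
}
```

Returning a `degraded` marker is a design choice: it lets clients and dashboards tell a deliberately reduced response apart from a complete one.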

Trade-offs

Technique     | Benefit                   | Cost
Rate limiting | Protects system stability | May reject valid requests
Queues        | Smooths traffic spikes    | Adds latency
Backpressure  | Prevents overload         | User-facing errors

Real-World Impact

  • Stable performance during traffic spikes
  • Reduced system crashes
  • Predictable latency under load

Key Takeaways

  • Systems fail under load due to coordination issues, not just raw traffic
  • Backpressure is essential to prevent cascading failures across services
  • Queues decouple request handling from heavy processing and improve resilience
  • Graceful degradation keeps core functionality available during overload
  • Rate limiting protects both the system and downstream dependencies

Future Improvements

  • Introduce adaptive rate limiting based on real-time system metrics
  • Add circuit breakers for external service dependencies
  • Implement distributed tracing for better observability
  • Use autoscaling strategies tied to queue depth and latency
  • Introduce load testing pipelines to simulate real-world traffic patterns
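Of these, the circuit breaker is the easiest to sketch in a few lines. The version below is a simplified illustration, not a production implementation: it counts consecutive failures, opens after `failureThreshold` of them, and lets calls through again once `resetMs` has passed. `createCircuitBreaker` and its options are hypothetical names.

```javascript
// Minimal circuit breaker: after `failureThreshold` consecutive failures the
// circuit opens and calls fail immediately until `resetMs` has passed.
function createCircuitBreaker(fn, { failureThreshold, resetMs, now = Date.now }) {
  let failures = 0;
  let openedAt = null; // timestamp when the circuit opened, or null if closed

  return async function call(...args) {
    if (openedAt !== null && now() - openedAt < resetMs) {
      // Fail fast without touching the struggling dependency.
      throw new Error("Circuit open");
    }
    // Either the circuit is closed, or the reset period has elapsed and this
    // call acts as a trial (half-open).
    try {
      const result = await fn(...args);
      failures = 0;
      openedAt = null;
      return result;
    } catch (err) {
      failures += 1;
      if (failures >= failureThreshold) openedAt = now();
      throw err;
    }
  };
}
```

One simplification to be aware of: once the reset period elapses, every concurrent caller is let through as a trial. A stricter breaker would admit only one trial call at a time.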

Designing Systems That Hold Under Load | Tushar Kanti Dey