Load Balancing Strategies for APIs

A practical guide to load balancing algorithms, Layer 4 vs Layer 7, health checks, session affinity, and how status codes 502, 503, and 504 relate to load balancer behavior.

Why Load Balance?

A single server has finite CPU, memory, and network capacity. Load balancing distributes incoming requests across a pool of backend servers ("upstream" or "origin" servers) to:

  • Scale horizontally — add servers instead of upgrading hardware
  • Provide high availability — remove failed servers without downtime
  • Improve geographic performance — route to nearest data center
  • Enable zero-downtime deploys — drain one server at a time

Layer 4 vs Layer 7 Load Balancing

Load balancers operate at different layers of the network stack:

Attribute        Layer 4 (Transport)         Layer 7 (Application)
Sees             TCP/UDP packets             HTTP headers, URLs, body
Routing basis    IP + port                   Path, header, cookie, body
TLS              Passthrough or terminate    Always terminates
Performance      Extremely fast              Slightly slower (parse overhead)
Use case         TCP proxying, gaming        HTTP APIs, microservices
Examples         AWS NLB, HAProxy TCP        AWS ALB, Nginx, Envoy

For HTTP APIs, Layer 7 is almost always the right choice — it enables path-based routing, header inspection, and meaningful health checks.

Algorithm Comparison

Round Robin

Requests are distributed to servers in sequence: 1 → 2 → 3 → 1 → 2 → 3…

  • Pros: Simple, no state required, predictable distribution
  • Cons: Ignores server load — a slow server receives the same request rate as a fast one
  • Best for: Homogeneous servers with similar capacity and request cost
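The rotation above can be sketched in a few lines of Python (backend names are placeholders):

```python
from itertools import cycle

# Fixed rotation over the pool: 1 -> 2 -> 3 -> 1 -> ...
servers = ["backend1", "backend2", "backend3"]
rotation = cycle(servers)

def next_server():
    """Return the next backend in strict sequence, ignoring load."""
    return next(rotation)
```

Note there is no per-server state beyond the iterator position, which is why round robin is so cheap, and why it cannot react to a slow backend.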

Least Connections

New requests go to the server with the fewest active connections.

  • Pros: Naturally routes away from overloaded servers
  • Cons: Does not account for connection duration variance
  • Best for: Long-lived connections (WebSocket, gRPC streaming)
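A minimal sketch of least-connections selection, assuming the balancer tracks an active-connection count per backend (the counts below are illustrative):

```python
# Active connection counts, incremented on accept and decremented on close.
active = {"backend1": 12, "backend2": 3, "backend3": 7}

def pick_least_connections():
    """Route the new request to the backend with the fewest open connections."""
    return min(active, key=active.get)
```

Because the counter is decremented only when a connection closes, long-lived streams naturally keep their server "busy" in the balancer's view.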

Weighted Round Robin / Weighted Least Connections

Servers are assigned weights proportional to capacity. A server with weight 2 receives twice the traffic of one with weight 1.

  • Use case: Heterogeneous fleets (e.g., c5.2xlarge + c5.4xlarge mix)
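A sketch of the smooth weighted round-robin variant (the algorithm Nginx uses for weighted upstreams), which interleaves a heavy server's turns rather than sending them back to back:

```python
class SmoothWeightedRR:
    """Smooth weighted round robin: a weight-2 server gets two of
    every three picks, spread out as a, b, a rather than a, a, b."""

    def __init__(self, weights):
        self.weights = dict(weights)             # configured static weights
        self.current = {s: 0 for s in weights}   # running per-server counters

    def next(self):
        total = sum(self.weights.values())
        for server in self.current:
            self.current[server] += self.weights[server]
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= total            # penalize the winner
        return chosen
```

With weights {a: 2, b: 1} the pick sequence cycles a, b, a, so traffic matches the 2:1 ratio without bursts.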

IP Hash

The client's IP address is hashed to consistently select the same server.

  • Pros: Simple session affinity without cookies
  • Cons: Poor distribution when many clients share a NAT IP; server removal disrupts a fraction of users
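The idea can be sketched as follows (the hash function and backend names are illustrative — real balancers use their own hashing):

```python
import hashlib

servers = ["backend1", "backend2", "backend3"]

def pick_by_ip(client_ip: str) -> str:
    """Hash the client IP so the same client always lands on the same backend."""
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]
```

The modulo step is also the weakness: removing one server changes `len(servers)`, remapping most clients at once — the problem consistent hashing addresses.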

Consistent Hashing

A hash ring distributes keys across servers such that only about 1/N of keys remap when a server is added or removed. It is used extensively by distributed cache clients (e.g., Memcached's Ketama hashing); Redis Cluster achieves similar stability with a fixed set of 16384 hash slots rather than a ring.

  • Best for: Stateful caches where locality matters
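A minimal hash ring with virtual nodes illustrates the remapping property (the replica count and MD5 are illustrative choices, not a specific library's defaults):

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring: each node is hashed to many points ("virtual
    nodes") on the ring; a key maps to the first node point at or after it."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self.ring = []                       # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    def remove(self, node):
        # Only keys that mapped to this node's points get reassigned.
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def get(self, key):
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h,)) % len(self.ring)  # wrap around
        return self.ring[idx][1]
```

Removing a node deletes only that node's points, so keys that mapped elsewhere keep their assignment — the 1/N remapping guarantee.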

Health Check Integration

Load balancers continuously probe backends to detect failures, removing unhealthy backends from rotation automatically. Open-source Nginx performs passive health checks: a backend is marked down after repeated failed requests.

upstream api_servers {
    server backend1:8000 max_fails=3 fail_timeout=10s;  # down after 3 failures
    server backend2:8000 max_fails=3 fail_timeout=10s;
    keepalive 32;  # reuse upstream connections
}

For active health checks (Nginx Plus, HAProxy, Envoy):

# Envoy health check
health_checks:
  - timeout: 1s
    interval: 5s
    healthy_threshold: 1
    unhealthy_threshold: 3
    http_health_check:
      path: /health
      expected_statuses:      # Envoy takes ranges; end is exclusive,
        - start: 200          # so this accepts exactly 200
          end: 201

Design your /health endpoint to return 200 only when the instance is fully ready (migrations complete, warm-up done, dependencies reachable).
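As a sketch, a readiness-style handler can aggregate dependency checks and return 503 until all of them pass (`check_db` and `check_cache` are hypothetical placeholders for your service's real dependencies):

```python
def health(check_db, check_cache):
    """Return (status_code, body) for a readiness-style /health endpoint.

    check_db / check_cache: zero-argument callables returning True when
    the dependency is reachable and ready.
    """
    failures = []
    if not check_db():
        failures.append("db")
    if not check_cache():
        failures.append("cache")
    if failures:
        # Non-200 keeps the instance out of rotation until it recovers.
        return 503, {"status": "unhealthy", "failing": failures}
    return 200, {"status": "ok"}
```

Returning the failing dependency names in the body makes a flood of unhealthy checks much faster to diagnose.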

Session Affinity (Sticky Sessions)

When server-side state is not externalized (e.g., WebSocket connections, in-memory sessions), you need requests from the same client to always reach the same backend:

upstream api_servers {
    ip_hash;  # Route by client IP
    server backend1:8000;
    server backend2:8000;
}

Cookie-based affinity (AWS ALB's AWSALB cookie) is more precise than IP hash and handles NAT correctly. Prefer externalizing state to Redis or a database to avoid needing affinity altogether.

LB Status Codes: 502, 503, 504

These three codes are generated by the load balancer, not your application:

Code                       Meaning                                 Common cause
502 Bad Gateway            Backend returned an invalid response    App crashed, returned non-HTTP
503 Service Unavailable    No healthy backends available           All servers marked unhealthy
504 Gateway Timeout        Backend took too long to respond        Slow query, deadlock, overload

A flood of 503s usually means backends are failing health checks and have been pulled from rotation — start with the /health endpoint and upstream health. A rise in 504s often points to database contention or a blocking external API call. Tune the load balancer's timeout to sit just above your p99 response-time budget.
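In Nginx, those timeouts map to proxy directives like the following (values are illustrative and should be derived from your own latency budget):

# Illustrative timeout tuning for a proxied API
location /api/ {
    proxy_connect_timeout 2s;
    proxy_read_timeout    10s;               # should exceed your p99 latency
    proxy_next_upstream   error timeout;     # retry another backend on failure
    proxy_pass http://api_servers;
}

Exceeding proxy_read_timeout is what surfaces to clients as a 504.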
