How to Implement and Handle Rate Limiting (429)

Everything about rate limiting — server-side implementation strategies, client-side handling, and the 429 Too Many Requests response.

Why Rate Limiting Matters

Rate limiting protects your API from abuse, prevents resource exhaustion, and ensures fair access for all clients. Without it, a single misbehaving client can take down your entire service.

The 429 Too Many Requests Response

When a client exceeds the rate limit, the server responds with:

HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1672531260

The Retry-After header tells the client how long to wait, either as delta-seconds or as an HTTP-date. The X-RateLimit-* headers are a widely used convention rather than part of the HTTP standard.
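Since Retry-After can carry either form, a client needs to handle both. A minimal sketch (the helper name is illustrative, not a standard API):

```python
from email.utils import parsedate_to_datetime
from datetime import datetime, timezone

def retry_after_seconds(value: str) -> float:
    """Parse a Retry-After value: either delta-seconds or an HTTP-date."""
    try:
        return float(value)            # e.g. "60"
    except ValueError:
        dt = parsedate_to_datetime(value)   # e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        return max(0.0, (dt - datetime.now(timezone.utc)).total_seconds())
```

A date in the past yields 0.0, meaning the client may retry immediately.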

Server-Side Algorithms

Fixed Window

Count requests per fixed time window (e.g., 100 requests per minute). Simple but allows burst at window boundaries — a client can send 200 requests in 2 seconds by hitting the boundary between two windows.
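An in-memory sketch of the idea (single process; production systems would back this with a shared store):

```python
class FixedWindow:
    """Naive fixed-window counter: `limit` requests per `window` seconds."""
    def __init__(self, limit, window):
        self.limit, self.window = limit, window
        self.bucket = 0        # index of the current window
        self.count = 0

    def allow(self, now):
        bucket = int(now // self.window)
        if bucket != self.bucket:
            self.bucket, self.count = bucket, 0   # new window: reset
        if self.count >= self.limit:
            return False
        self.count += 1
        return True
```

Note the boundary problem: a client can spend the full limit at the end of one window and again at the start of the next, so up to 2x the limit passes in a short burst.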

Sliding Window Log

Store timestamps of each request. Count requests in a rolling window. More accurate but uses more memory (one entry per request).
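A sketch of the log approach, keeping one timestamp per accepted request:

```python
from collections import deque

class SlidingWindowLog:
    """Count exact requests inside a rolling window of `window` seconds."""
    def __init__(self, limit, window):
        self.limit, self.window = limit, window
        self.log = deque()     # timestamps of accepted requests

    def allow(self, now):
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()             # drop entries that rolled out
        if len(self.log) >= self.limit:
            return False
        self.log.append(now)
        return True
```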

Sliding Window Counter

Combines fixed window counts with a weighted calculation that approximates a rolling window. A good balance of accuracy and memory usage, and a common choice in production systems.
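A sketch of the common formulation, where the previous window's count is weighted by how much of it still overlaps the rolling window:

```python
class SlidingWindowCounter:
    """Approximate a rolling window from two fixed-window counts."""
    def __init__(self, limit, window):
        self.limit, self.window = limit, window
        self.bucket, self.curr, self.prev = 0, 0, 0

    def allow(self, now):
        bucket = int(now // self.window)
        if bucket != self.bucket:
            # carry at most one window back; anything older drops to zero
            self.prev = self.curr if bucket == self.bucket + 1 else 0
            self.bucket, self.curr = bucket, 0
        # fraction of the previous window still inside the rolling window
        weight = 1.0 - (now % self.window) / self.window
        if self.prev * weight + self.curr >= self.limit:
            return False
        self.curr += 1
        return True
```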

Token Bucket

Tokens are added at a fixed rate. Each request consumes one token. Allows controlled bursts (up to the bucket capacity) while enforcing a long-term average rate.
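A sketch of the standard formulation, refilling lazily on each check rather than on a timer:

```python
class TokenBucket:
    """Refill `rate` tokens per second; allow bursts up to `capacity`."""
    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now):
        # refill for the elapsed time, capped at the bucket capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```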

Leaky Bucket

Requests enter a queue that drains at a constant rate. Smooths out bursts entirely. Used when you need a perfectly steady request rate.
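A sketch of the "leaky bucket as meter" variant, which tracks queue depth rather than holding actual requests:

```python
class LeakyBucket:
    """Queue depth leaks at `rate` per second; reject when it would overflow."""
    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.level = 0.0       # current queue depth
        self.last = 0.0

    def allow(self, now):
        # drain whatever leaked out since the last check
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + 1 > self.capacity:
            return False
        self.level += 1
        return True
```

The queueing variant additionally holds the accepted requests and releases them at the drain rate, which is what produces the perfectly steady output.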

Rate Limit Scopes

  • Per API key — Most common for public APIs
  • Per IP address — Fallback when no auth is present
  • Per user — For authenticated endpoints
  • Per endpoint — Different limits for different operations
  • Global — Overall system protection
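In practice each scope becomes part of the counter key in the backing store. One hypothetical naming scheme (the `rl:` prefix and layout are illustrative, not a standard):

```python
def rate_limit_key(scope, identifier, endpoint=None):
    """Build a counter key like 'rl:api_key:abc123:POST /orders'."""
    parts = ["rl", scope, identifier]
    if endpoint:
        parts.append(endpoint)     # per-endpoint limits get their own counter
    return ":".join(parts)
```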

Client-Side Best Practices

  • Always check for 429 and respect Retry-After
  • Implement exponential backoff — wait 1s, 2s, 4s, 8s...
  • Add jitter — randomize retry timing to avoid thundering herd
  • Track rate limit headers — stop before hitting the limit
  • Queue requests — use a client-side rate limiter to stay within bounds
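The backoff-with-jitter and Retry-After rules above can be combined in one small helper (a sketch; the "full jitter" strategy shown is one common choice among several):

```python
import random

def backoff_delay(attempt, base=1.0, cap=60.0, retry_after=None):
    """Seconds to wait before retry number `attempt` (0-based).

    Honors the server's Retry-After when present; otherwise picks a
    random delay in [0, min(cap, base * 2**attempt)] ("full jitter").
    """
    if retry_after is not None:
        return float(retry_after)      # the server knows best
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

Randomizing over the whole interval, rather than adding a small offset, is what prevents a thundering herd of synchronized retries.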

Implementation with Redis

# Fixed window counter (pseudocode)
def is_rate_limited(key, limit, window_seconds):
    count = redis.incr(key)          # atomic: avoids a get-then-set race
    if count == 1:
        # set the TTL only when the window starts; refreshing it on
        # every request would keep the window open indefinitely
        redis.expire(key, window_seconds)
    return count > limit
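The pseudocode assumes a Redis client is in scope. The same logic can be exercised end to end with a minimal in-memory stand-in for the two commands it uses (a sketch for illustration, not a Redis client):

```python
class FakeRedis:
    """Tiny in-memory stand-in for INCR and EXPIRE, with explicit clocks."""
    def __init__(self):
        self.data = {}     # key -> (count, expires_at)

    def incr(self, key, now):
        count, expires = self.data.get(key, (0, float("inf")))
        if now >= expires:                 # key expired: start a new window
            count, expires = 0, float("inf")
        self.data[key] = (count + 1, expires)
        return count + 1

    def expire(self, key, seconds, now):
        count, _ = self.data[key]
        self.data[key] = (count, now + seconds)

def is_rate_limited(r, key, limit, window_seconds, now):
    count = r.incr(key, now)
    if count == 1:
        r.expire(key, window_seconds, now)
    return count > limit
```

With a real Redis deployment, the INCR-then-EXPIRE pair is usually wrapped in a pipeline or a server-side script so both commands execute atomically.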
