Rate Limiting as a Security Control: Brute Force, DDoS, and Abuse Prevention

Using HTTP 429 and rate limiting to defend against credential stuffing, API abuse, and application-layer DDoS — beyond simple throttling.

Rate Limiting vs Performance Throttling

There are two reasons to rate limit:

  • Performance: protect your infrastructure from overload — limit each client to N requests per second so no single user can saturate your servers.
  • Security: detect and block abusive behavior — credential stuffing, account enumeration, scraping, and application-layer DDoS.

Most rate limiting guides focus on the first. This guide focuses on the second, because security rate limiting requires different strategies and tighter controls.

Attack Patterns That Rate Limiting Defends Against

Credential Stuffing

Attackers take username/password pairs leaked from other breaches and try them against your login endpoint. With modern botnets, this can mean thousands of attempts per minute across thousands of IPs.

A login endpoint without rate limiting is wide open to credential stuffing. Even if your own password hashing is correct, any user who reused a password that leaked elsewhere can be compromised.

Defense: Rate limit login attempts per account (not per IP) — an attacker can rotate IPs, but cannot change which account they're targeting:

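# Per-account rate limit (Redis-backed)
import redis

r = redis.Redis()

WINDOW_SECONDS = 900  # 15-minute window
MAX_ATTEMPTS = 5      # Allow 5 attempts per window

def check_login_rate_limit(username: str) -> bool:
    """Return True if this login attempt is within the per-account limit."""
    # Normalize so "Alice" and "alice " share one counter
    # (assumes usernames are case-insensitive).
    key = f'login_attempts:{username.strip().lower()}'
    attempts = r.incr(key)
    if attempts == 1:
        # First attempt in this window: start the countdown.
        r.expire(key, WINDOW_SECONDS)
    return attempts <= MAX_ATTEMPTS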

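Once the limit is exceeded, reject further attempts for that account until the window expires. This counter tracks every attempt, not only failures; delete the key after a successful login if only failures should count. Log the event and alert on accounts with sustained attack patterns.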

Account Enumeration

Your password reset and login endpoints may leak whether an account exists. "Email not found" vs "Wrong password" tells an attacker which emails are registered. Rate limiting slows enumeration, but you should also unify error messages: always return "If an account with that email exists, you'll receive a reset link."
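A minimal sketch of the unified-response idea follows. The function name `request_password_reset`, the in-memory `REGISTERED_EMAILS` set, and the `send_reset_email` stub are hypothetical stand-ins for a real user store and mailer, not part of any specific framework.

```python
# Hypothetical stand-ins for a real user store and mail queue.
REGISTERED_EMAILS = {"alice@example.com"}
SENT_RESETS: list[str] = []

GENERIC_MESSAGE = "If an account with that email exists, you'll receive a reset link."

def send_reset_email(email: str) -> None:
    """Stub: a real implementation would enqueue an email here."""
    SENT_RESETS.append(email)

def request_password_reset(email: str) -> tuple[int, dict]:
    """Return the same status and body whether or not the account exists."""
    normalized = email.strip().lower()
    if normalized in REGISTERED_EMAILS:
        send_reset_email(normalized)
    # Identical response on both paths, so the caller learns nothing
    # about which emails are registered.
    return 200, {"message": GENERIC_MESSAGE}
```

Response timing can leak the same information as the message text, so in practice the email is usually sent from a background job so that both paths take about the same time.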

API Abuse and Scraping

Bots scrape your public API to build competing products or extract pricing data. Unlike DDoS, scrapers try to stay under the radar — they look like legitimate traffic but in much higher volume.

Behavioral signals that distinguish scrapers from real users:

  • No browser fingerprint (no JS execution, no cookie storage)
  • Sequential access patterns (page 1, 2, 3, 4 ... in perfect order)
  • No referrer header
  • Unusually consistent timing between requests
  • High volume from datacenter IP ranges
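The signals above can be combined into a rough score. The sketch below is illustrative only: the `ClientStats` fields, the function name `scraper_score`, and the thresholds (four or more sequential pages, timing jitter under 0.05 seconds) are assumptions, and a real system would tune them against its own traffic.

```python
from dataclasses import dataclass
from statistics import pstdev

@dataclass
class ClientStats:
    executed_js: bool
    has_referrer: bool
    from_datacenter_ip: bool
    pages_requested: list[int]    # page numbers in request order
    request_times: list[float]    # request timestamps in seconds

def scraper_score(stats: ClientStats) -> int:
    """Count how many of the five listed signals a client exhibits (0 to 5)."""
    score = 0
    if not stats.executed_js:
        score += 1
    pages = stats.pages_requested
    # Perfectly sequential paging: 1, 2, 3, 4, ...
    if len(pages) >= 4 and all(b - a == 1 for a, b in zip(pages, pages[1:])):
        score += 1
    if not stats.has_referrer:
        score += 1
    # Unusually consistent timing: almost no variation between request gaps.
    times = stats.request_times
    gaps = [b - a for a, b in zip(times, times[1:])]
    if len(gaps) >= 3 and pstdev(gaps) < 0.05:
        score += 1
    if stats.from_datacenter_ip:
        score += 1
    return score
```

A score like this works better as an input to stricter rate limits or a challenge than as a block decision on its own, since any single signal also occurs in legitimate traffic.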

Rate Limiting Algorithms

Fixed Window Counter

The simplest approach. Count requests per time window (e.g., per minute).

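import time

def fixed_window(client_id: str, limit: int, window_seconds: int) -> bool:
    # The window index is part of the key, so each window gets a fresh counter.
    window_index = int(time.time() // window_seconds)
    key = f'ratelimit:{client_id}:{window_index}'
    count = r.incr(key)
    if count == 1:
        r.expire(key, window_seconds)  # Clean up the counter once the window has passed
    return count <= limit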

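Weakness: bursts that straddle a window boundary. With a limit of 60 per minute, a client can send 60 requests in the last second of one window and 60 more in the first second of the next, so 120 requests get through in about two seconds, 2x the intended rate.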
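To see the boundary effect concretely, the sketch below replays two bursts against an in-memory copy of the same windowing logic. The class `InMemoryFixedWindow` and the limit of 60 requests per 60 seconds are assumptions chosen for illustration. Timestamps are passed in explicitly so the example does not depend on the wall clock or on Redis.

```python
from collections import defaultdict

class InMemoryFixedWindow:
    """Same windowing logic as fixed_window, with a dict instead of Redis."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts: dict[tuple[str, int], int] = defaultdict(int)

    def allow(self, client_id: str, now: float) -> bool:
        window_index = int(now // self.window_seconds)
        self.counts[(client_id, window_index)] += 1
        return self.counts[(client_id, window_index)] <= self.limit

limiter = InMemoryFixedWindow(limit=60, window_seconds=60)

# 60 requests spread over the last second of window 0 (t = 59.0 up to 60.0).
late_burst = sum(limiter.allow("client", 59.0 + i / 60) for i in range(60))
# 60 requests spread over the first second of window 1 (t = 60.0 up to 61.0).
early_burst = sum(limiter.allow("client", 60.0 + i / 60) for i in range(60))

# Every request is accepted, although all 120 arrive within two seconds.
accepted_in_two_seconds = late_burst + early_burst
```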

Sliding Window Log

Record a timestamp for each request. Count requests in the past N seconds. More accurate but memory-intensive for high-traffic endpoints.
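A minimal single-process sketch of the idea, assuming an in-memory `deque` per client. The class name `SlidingWindowLog` is hypothetical, and the caller supplies the current time so the behavior is easy to test.

```python
from collections import defaultdict, deque

class SlidingWindowLog:
    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        # One timestamp log per client; memory grows with limit x clients.
        self.logs: dict[str, deque] = defaultdict(deque)

    def allow(self, client_id: str, now: float) -> bool:
        log = self.logs[client_id]
        # Drop timestamps that have fallen out of the window.
        while log and log[0] <= now - self.window_seconds:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True
```

This variant records only accepted requests, so a limited client regains capacity as soon as its oldest accepted request ages out. Recording rejected requests as well is a stricter alternative that keeps a client limited for as long as it keeps retrying.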

Token Bucket

A bucket holds up to N tokens. Each request consumes one token. Tokens refill at a steady rate. Allows bursts up to the bucket capacity while enforcing an average rate.

Token bucket is well-suited for APIs that need to handle short bursts (a user submitting a batch of requests) while still limiting sustained throughput.
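The description above can be sketched as a small in-memory class. The name `TokenBucket` and its parameters are illustrative. The `clock` argument is an assumption added so the refill logic can be exercised with a fake clock, and it defaults to `time.monotonic`.

```python
import time
from typing import Callable

class TokenBucket:
    def __init__(
        self,
        capacity: float,
        refill_rate: float,  # tokens added per second
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.tokens = capacity          # start full, so an initial burst is allowed
        self.last_refill = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill lazily based on elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `capacity=5` and `refill_rate=1.0`, a client can send five requests at once and then roughly one per second after that.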

Distributed Rate Limiting with Redis

For horizontally-scaled backends, rate limit state must live in a shared store:

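# Sliding window log using Redis sorted sets
import time
import uuid

def sliding_window_rate_limit(
    client_id: str,
    limit: int,
    window_seconds: int,
) -> tuple[bool, int]:
    now = time.time()
    window_start = now - window_seconds
    key = f'ratelimit:{client_id}'
    # Unique member per request; str(now) alone would collide when two
    # requests share a timestamp, and the second would not be counted.
    member = f'{now}:{uuid.uuid4().hex}'

    pipe = r.pipeline()  # redis-py pipelines run as a MULTI/EXEC transaction by default
    pipe.zremrangebyscore(key, 0, window_start)   # Remove old entries
    pipe.zadd(key, {member: now})                 # Add current request
    pipe.zcard(key)                               # Count in window
    pipe.expire(key, window_seconds)
    _, _, count, _ = pipe.execute()

    # Rejected requests are recorded too, so a client that keeps
    # retrying while limited stays limited.
    allowed = count <= limit
    return allowed, count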

Response Design for Rate-Limited Requests

A well-designed rate limit response helps legitimate clients back off gracefully:

HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1740000000
Content-Type: application/json

{"error": "rate_limit_exceeded", "retry_after": 60}

Header                   Meaning
`Retry-After`            Seconds (or HTTP date) until the client may retry
`X-RateLimit-Limit`      Max requests in the window
`X-RateLimit-Remaining`  Requests remaining in current window
`X-RateLimit-Reset`      Unix timestamp when the window resets

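The RateLimit-* header names (without the X- prefix) are being standardized in the IETF draft "RateLimit header fields for HTTP" (draft-ietf-httpapi-ratelimit-headers); they are not yet an RFC. The Retry-After header is defined in RFC 9110 and is widely recognized, though clients are not required to honor it.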
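As a framework-neutral sketch, the response shown above could be assembled like this. The helper name `build_429_response` and its parameters are hypothetical. It returns a plain `(status, headers, body)` tuple, not a response object from any particular web framework.

```python
import json
import math
import time
from typing import Optional

def build_429_response(
    limit: int,
    reset_epoch: float,
    now: Optional[float] = None,
) -> tuple[int, dict[str, str], str]:
    """Build status, headers, and JSON body for a rate-limited request."""
    now = time.time() if now is None else now
    # Round up so the client never retries slightly too early.
    retry_after = max(0, math.ceil(reset_epoch - now))
    headers = {
        "Retry-After": str(retry_after),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": str(int(reset_epoch)),
        "Content-Type": "application/json",
    }
    body = json.dumps({"error": "rate_limit_exceeded", "retry_after": retry_after})
    return 429, headers, body
```

Deriving `Retry-After` and the body's `retry_after` from the same value keeps the two from disagreeing, which matters for clients that read only one of them.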

Evasion and Countermeasures

IP Rotation

Sophisticated attackers rotate source IPs through proxy pools (residential proxies look like real users). Defenses:

  • Rate limit by account and device fingerprint, not just IP
  • Flag multiple accounts from the same device fingerprint
  • Use reputation databases (Cloudflare, AWS WAF) that track known bot IPs
  • Challenge suspicious IPs with CAPTCHA rather than outright blocking
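One way to apply the first point is to count each request against several identifiers at once and reject it if any of them is over its limit. The sketch below is in-memory and uses fixed windows for brevity. The class name `MultiKeyLimiter`, the dimension names, and the limits in the usage note are all assumptions.

```python
from collections import defaultdict

class MultiKeyLimiter:
    """Fixed-window counters kept separately per dimension (account, device, IP...)."""

    def __init__(self, limits: dict[str, int], window_seconds: int) -> None:
        self.limits = limits
        self.window_seconds = window_seconds
        self.counts: dict[tuple[str, str, int], int] = defaultdict(int)

    def allow(self, identifiers: dict[str, str], now: float) -> bool:
        window_index = int(now // self.window_seconds)
        allowed = True
        # Count against every dimension, even after one has failed, so that
        # each counter reflects the full request volume.
        for dimension, value in identifiers.items():
            key = (dimension, value, window_index)
            self.counts[key] += 1
            if self.counts[key] > self.limits[dimension]:
                allowed = False
        return allowed
```

For example, with `MultiKeyLimiter({"account": 5, "ip": 50}, 900)`, an attacker who switches IP on every request is still stopped at the sixth attempt against the same account within the window.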

CAPTCHA Challenges

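When a client hits the rate limit for a security-sensitive endpoint, respond with a CAPTCHA challenge instead of a hard 429. Legitimate users can solve it and continue, while most automated clients cannot. CAPTCHA-solving services exist, so treat the challenge as added cost for the attacker rather than a hard barrier. This gives a better user experience than a blanket block while still slowing automated abuse.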

WAF Integration

Web Application Firewalls (Cloudflare WAF, AWS WAF, ModSecurity) can rate limit in front of your application, before requests reach your application code. This is essential for application-layer DDoS: if the application itself has to receive and reject every request in a flood, it may be too busy to respond to health checks.

Configure WAF rate limiting rules for your highest-risk endpoints:

  • /login, /auth/token — credential stuffing targets
  • /api/search, /api/products — scraping targets
  • /password-reset — account enumeration target
  • /checkout, /order — fraud targets
