Rate Limiting at the Edge: WAF, CDN, and API Gateway Strategies

How to implement rate limiting before requests reach your application — Cloudflare Rate Limiting, AWS WAF, API gateway throttling, and IP reputation.

Why Rate Limit at the Edge?

Application-level rate limiting (inside your Django app, for example) runs after the request has consumed CPU, memory, a database connection, and network bandwidth. By the time you return a 429, the damage is already done.

Edge rate limiting blocks excess requests before they touch your infrastructure:

Attacker ──→ [Cloudflare Edge] ──429──→ (blocked, never reaches origin)
User     ──→ [Cloudflare Edge] ──→ Origin ──200──→ User

Three reasons to rate limit at the edge:

  • Origin protection — scraper floods and credential stuffing never hit your servers
  • Cost control — LLM API calls, expensive DB queries, and bandwidth are billed per request, so every request blocked at the edge costs you nothing downstream
  • DDoS mitigation — absorb volumetric attacks at CDN capacity, not origin capacity

CDN Rate Limiting

Cloudflare Rate Limiting

Cloudflare rate limiting rules match on URL, IP, or custom fields and return 429 (or a custom page) when the threshold is exceeded:

# Cloudflare Rate Limiting Rule (Terraform)
resource "cloudflare_rate_limit" "api_limit" {
  zone_id   = var.zone_id
  threshold = 100          # requests
  period    = 60           # per 60 seconds
  match {
    request {
      url_pattern = "*example.com/api/*"
      schemes     = ["HTTPS"]
      methods     = ["GET", "POST"]
    }
  }
  action {
    mode    = "ban"
    timeout = 300           # ban for 5 minutes
    response {
      content_type = "application/json"
      body         = "{\"error\": \"rate_limit_exceeded\", \"retry_after\": 300}"
    }
  }
}
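To make the threshold, period, and ban timeout concrete, here is a minimal in-memory model of "count per window, then ban". It is only a sketch of the semantics configured above, not Cloudflare's actual counting algorithm; the class name BanningLimiter and its method are hypothetical.

```python
from collections import defaultdict


class BanningLimiter:
    """Fixed-window counter that bans a key once it exceeds the threshold."""

    def __init__(self, threshold: int, period: float, ban_timeout: float):
        self.threshold = threshold      # allowed requests per window
        self.period = period            # window length in seconds
        self.ban_timeout = ban_timeout  # ban length in seconds
        self._windows = defaultdict(lambda: [0.0, 0])  # key -> [window_start, count]
        self._banned_until = {}                        # key -> unban timestamp

    def allow(self, key: str, now: float) -> bool:
        if now < self._banned_until.get(key, 0.0):
            return False                    # still serving an earlier ban
        window = self._windows[key]
        if now - window[0] >= self.period:  # window elapsed: start a new one
            window[0], window[1] = now, 0
        window[1] += 1
        if window[1] > self.threshold:
            self._banned_until[key] = now + self.ban_timeout
            return False
        return True


# Mirrors the rule above: 100 requests per 60 s, then a 300 s ban.
limiter = BanningLimiter(threshold=100, period=60, ban_timeout=300)
```

Note the consequence of mode "ban": once the threshold is crossed, the client stays blocked for the full timeout even if it stops sending traffic.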

Cloudflare also supports Advanced Rate Limiting with fingerprinting by cookie, header, ASN, or custom fields — useful for authenticated API endpoints:

# Rate limit by user ID header (authenticated users)
If: Field = Header / Name = X-User-ID / Operator = is present
Rate: 1000 requests per minute per X-User-ID value
Action: Block (429)
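The idea of counting per header value instead of per IP can be sketched as a key-selection function. This is an illustration only: the function name rate_limit_key and the fallback to the client IP are assumptions, not part of Cloudflare's rule language.

```python
from typing import Mapping


def rate_limit_key(headers: Mapping[str, str], client_ip: str) -> str:
    """Pick the counter key: the user ID when present, otherwise the client IP."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    user_id = normalized.get("x-user-id", "").strip()
    return f"user:{user_id}" if user_id else f"ip:{client_ip}"
```

Keying on a header is only safe when a trusted component sets it (for example, after authentication). If clients can choose the value freely, rotating it gives them a fresh counter on every request.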

AWS CloudFront + WAF

// AWS WAF (WAFv2) Rate-Based Rule
{
  "Name": "APIRateLimit",
  "Priority": 1,
  "Statement": {
    "RateBasedStatement": {
      "Limit": 2000,
      "AggregateKeyType": "IP"
    }
  },
  "Action": { "Block": {} },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "APIRateLimit"
  }
}

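By default, an AWS WAF rate-based rule counts requests over a trailing 5-minute window, so the limit of 2000 above means 2,000 requests per 5 minutes per IP, not per minute. Current WAFv2 rules can also set EvaluationWindowSec to 60, 120, 300, or 600 seconds, but nothing shorter than one minute. For finer-grained control, use Cloudflare or implement token bucket logic in a Lambda@Edge function.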
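For reference, token bucket logic fits in a few lines. The sketch below is a generic in-memory version with a hypothetical TokenBucket class. It assumes a single process, whereas edge functions run as many independent instances, so a real deployment would need shared state in an external store.

```python
import time
from typing import Optional


class TokenBucket:
    """Refills at `rate` tokens per second, up to `capacity` tokens."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None, cost: float = 1.0) -> bool:
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# 10 requests per second sustained, bursts of up to 20.
bucket = TokenBucket(rate=10.0, capacity=20.0)
```

Unlike a fixed window, the bucket has no boundary at which a client can send two full quotas back to back: burst size is bounded by capacity and sustained throughput by rate.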

API Gateway Throttling

AWS API Gateway Usage Plans

import boto3

client = boto3.client('apigateway')

# Create usage plan with throttle + quota
plan = client.create_usage_plan(
    name='StandardTier',
    throttle={
        'burstLimit': 200,   # Token bucket capacity (maximum burst size)
        'rateLimit': 100.0   # Steady-state requests per second (refill rate)
    },
    quota={
        'limit': 10000,
        'period': 'DAY'
    }
)

When a request is throttled, AWS API Gateway returns 429 Too Many Requests along with an x-amzn-RequestId header. The response body is:

{"message": "Too Many Requests"}
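On the client side, a throttled response like this is usually handled with retries and exponential backoff. The following is a minimal sketch under stated assumptions: call_with_backoff and the (status, headers, body) tuple shape are hypothetical, and send stands in for whatever HTTP call your client makes.

```python
import random
import time
from typing import Callable, Tuple

Response = Tuple[int, dict, str]  # (status, headers, body); illustrative shape


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() up to max_attempts times, backing off while it returns 429."""
    response = send()
    for attempt in range(1, max_attempts):
        if response[0] != 429:
            break
        # Full jitter: wait a random time up to an exponentially growing ceiling,
        # so many throttled clients do not all retry at the same instant.
        ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, ceiling))
        response = send()
    return response
```

The sleep parameter is injectable so the retry logic can be tested without real delays.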

Kong Rate Limiting Plugin

# Kong declarative config — rate limit per consumer
plugins:
  - name: rate-limiting
    config:
      minute: 100
      hour: 5000
      policy: redis          # Distributed counter via Redis
      redis_host: redis
      redis_port: 6379
      limit_by: consumer    # Per API key
      fault_tolerant: true  # Allow traffic if Redis is down
      hide_client_headers: false
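The config above applies two limits at once, so a request must fit within both the per-minute and the per-hour budget. A rough in-memory model of that behavior follows. It is an assumption-laden sketch, not Kong's implementation: MultiWindowLimiter is a hypothetical name, and a dict replaces the Redis counters.

```python
from collections import defaultdict

WINDOWS = {"minute": 60, "hour": 3600}  # window name -> length in seconds


class MultiWindowLimiter:
    """Fixed-window counters; a request must fit within every configured window."""

    def __init__(self, limits: dict):
        self.limits = limits            # e.g. {"minute": 100, "hour": 5000}
        self.counts = defaultdict(int)  # (key, window, bucket index) -> count
        # Old buckets are never evicted here; Redis would handle that with key TTLs.

    def allow(self, key: str, now: float) -> bool:
        buckets = [(key, name, int(now // WINDOWS[name])) for name in self.limits]
        if any(self.counts[b] >= self.limits[b[1]] for b in buckets):
            return False  # at least one window is already exhausted
        for b in buckets:
            self.counts[b] += 1
        return True


limiter = MultiWindowLimiter({"minute": 100, "hour": 5000})
```

The hourly limit is what stops a consumer from running at the full per-minute rate all day: 100 per minute sustained would be 6,000 per hour, above the 5,000 cap.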

Envoy Rate Limit Service

# Envoy filter configuration
http_filters:
  - name: envoy.filters.http.ratelimit
    typed_config:
      '@type': type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
      domain: api
      rate_limit_service:
        grpc_service:
          envoy_grpc:
            cluster_name: rate_limit_cluster
      failure_mode_deny: false  # Allow on rate limit service failure
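Kong's fault_tolerant: true and Envoy's failure_mode_deny: false express the same choice: fail open when the limiter backend is unreachable. The sketch below shows that decision in isolation; check_with_failure_mode and the check callable are hypothetical names for illustration.

```python
from typing import Callable


def check_with_failure_mode(
    check: Callable[[str], bool],
    key: str,
    fail_open: bool = True,
) -> bool:
    """Run the limiter check; if the backend is unreachable, apply the failure mode."""
    try:
        return check(key)
    except (ConnectionError, TimeoutError):
        # fail_open=True  -> allow traffic and accept being briefly unprotected
        # fail_open=False -> reject traffic and accept a limiter-induced outage
        return fail_open
```

Failing open favors availability and suits most public APIs. Failing closed can be the better choice for endpoints such as login, where unthrottled traffic is the larger risk.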

IP Reputation and Bot Detection

Rate limiting by IP is effective against naive bots but not against distributed attacks using residential proxies. Combine with reputation signals:

Cloudflare Managed Challenge

# Cloudflare Firewall Rule: challenge suspicious ASNs
If: (ip.geoip.asnum in {396982 15169}) and not cf.client.bot
Action: Managed Challenge

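Managed Challenge lets Cloudflare choose the challenge for each request based on signals from that request, ranging from a non-interactive JavaScript check to an interactive puzzle. Most legitimate browsers pass without any interaction while simple bots fail, so it interrupts real users far less often than a traditional CAPTCHA.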

Bot Score Thresholding

# Block high-confidence bots on sensitive endpoints
If: cf.bot_management.score < 10 and http.request.uri.path contains "/api/login"
Action: Block

# Challenge medium-confidence on public API
If: cf.bot_management.score < 30 and http.request.uri.path contains "/api/"
Action: Managed Challenge
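Lower bot scores mean a request is more likely automated, and rule order matters because the first matching rule wins. The two rules above can be modeled as a small decision function. This is a sketch for reasoning about thresholds, with a hypothetical bot_action helper, not how Cloudflare evaluates rules internally.

```python
def bot_action(score: int, path: str) -> str:
    """Return the action the two rules above would take for a request."""
    # Rule 1: near-certain bots on the login endpoint are blocked outright.
    if score < 10 and "/api/login" in path:
        return "block"
    # Rule 2: likely bots anywhere under /api/ receive a challenge.
    if score < 30 and "/api/" in path:
        return "managed_challenge"
    return "allow"
```

A request to /api/login with a score of 20 is challenged rather than blocked, because it misses the first rule's threshold but still matches the second.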

Response Headers for Rate Limiting

When returning a 429, include headers that allow clients to implement respectful retry logic:

HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1709251200
Content-Type: application/json

{
  "error": "rate_limit_exceeded",
  "message": "100 requests per minute allowed. Retry after 60 seconds.",
  "retry_after": 60
}

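The IETF Internet-Draft on rate limit header fields (draft-ietf-httpapi-ratelimit-headers) defines unprefixed names such as RateLimit-Limit, RateLimit-Remaining, and RateLimit-Reset, and is gaining adoption. It is not yet an RFC and the field layout has changed between revisions, so note which revision you implement. For new APIs, prefer these over the X- prefixed variants. Unlike the Unix timestamp commonly sent in X-RateLimit-Reset, RateLimit-Reset carries the number of seconds until the window resets: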

RateLimit-Limit: 100
RateLimit-Remaining: 0
RateLimit-Reset: 60
RateLimit-Policy: 100;w=60
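A helper that emits both header families keeps the two reset formats consistent. The sketch below assumes a fixed window whose end time is known; the function name rate_limit_headers and its parameters are hypothetical.

```python
import math


def rate_limit_headers(
    limit: int, remaining: int, reset_epoch: float, now: float, window: int
) -> dict:
    """Build legacy X-RateLimit-* and draft RateLimit-* headers for one response."""
    seconds_left = max(0, math.ceil(reset_epoch - now))
    remaining = max(0, remaining)
    headers = {
        # Legacy convention: reset expressed as a Unix timestamp.
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(int(reset_epoch)),
        # Draft convention: reset expressed as seconds from now.
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": str(remaining),
        "RateLimit-Reset": str(seconds_left),
        "RateLimit-Policy": f"{limit};w={window}",
    }
    if remaining == 0:
        # Only meaningful on a 429: tells the client how long to wait.
        headers["Retry-After"] = str(seconds_left)
    return headers
```

Sending the headers on successful responses as well lets well-behaved clients slow down before they ever hit a 429.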

Application-Level Fallback

Even with edge rate limiting, implement application-level rate limiting as a defense-in-depth layer using Redis:

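import time
import uuid

# Assumes the django-redis cache backend, which exposes the raw Redis client.
from django_redis import get_redis_connection

def rate_limit(identifier: str, limit: int, window: int) -> bool:
    """Sliding window rate limiter using a Redis sorted set.
    Returns True if the request should be allowed.
    Rejected requests are still recorded, so a client that keeps
    hammering the endpoint stays limited.
    """
    key = f'ratelimit:{identifier}'
    now = time.time()
    pipe = get_redis_connection('default').pipeline()
    pipe.zremrangebyscore(key, 0, now - window)  # drop entries older than the window
    # Unique member per request, so two requests with the same timestamp both count
    pipe.zadd(key, {f'{now}:{uuid.uuid4().hex}': now})
    pipe.zcard(key)                              # requests currently in the window
    pipe.expire(key, window)                     # let idle keys expire on their own
    _, _, count, _ = pipe.execute()
    return count <= limit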
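The same sliding-window-log algorithm can be written without Redis, which is handy for unit tests or single-process services. The version below is a sketch that mirrors the sorted-set steps with a deque; SlidingWindowLog is a hypothetical name, and state lives in process memory only.

```python
from collections import defaultdict, deque


class SlidingWindowLog:
    """In-memory sliding window: keeps one timestamp per request attempt."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self._log = defaultdict(deque)  # identifier -> timestamps, oldest first

    def allow(self, identifier: str, now: float) -> bool:
        log = self._log[identifier]
        # Same role as ZREMRANGEBYSCORE: discard entries that left the window.
        while log and log[0] <= now - self.window:
            log.popleft()
        log.append(now)                # same role as ZADD
        return len(log) <= self.limit  # same role as ZCARD plus the comparison


limiter = SlidingWindowLog(limit=100, window=60)
```

Because every attempt is recorded, including rejected ones, memory per identifier grows with request volume. That is the usual trade-off of a log-based limiter compared with a counter or token bucket.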
