Why Rate Limit at the Edge?
Application-level rate limiting (inside your Django app, for example) runs after the request has consumed CPU, memory, a database connection, and network bandwidth. By the time you return a 429, the damage is already done.
Edge rate limiting blocks excess requests before they touch your infrastructure:
Attacker ──→ [Cloudflare Edge] ──429──→ (blocked, never reaches origin)
User ──→ [Cloudflare Edge] ──→ Origin ──200──→ User
Three reasons to rate limit at the edge:
- Origin protection — scraper floods and credential stuffing never hit your servers
- Cost control — every request you serve can trigger LLM API calls, expensive DB queries, and per-request bandwidth charges
- DDoS mitigation — absorb volumetric attacks at CDN capacity, not origin capacity
CDN Rate Limiting
Cloudflare Rate Limiting
Cloudflare rate limiting rules match on URL, IP, or custom fields and return 429 (or a custom page) when the threshold is exceeded:
# Cloudflare Rate Limiting Rule (Terraform)
resource "cloudflare_rate_limit" "api_limit" {
  zone_id   = var.zone_id
  threshold = 100 # requests
  period    = 60  # per 60 seconds

  match {
    request {
      url_pattern = "*example.com/api/*"
      schemes     = ["HTTPS"]
      methods     = ["GET", "POST"]
    }
  }

  action {
    mode    = "ban"
    timeout = 300 # ban for 5 minutes
    response {
      content_type = "application/json"
      body         = "{\"error\": \"rate_limit_exceeded\", \"retry_after\": 300}"
    }
  }
}
Cloudflare also supports Advanced Rate Limiting with fingerprinting by cookie, header, ASN, or custom fields — useful for authenticated API endpoints:
# Rate limit by user ID header (authenticated users)
If: Field = Header / Name = X-User-ID / Operator = is present
Rate: 1000 requests per minute per X-User-ID value
Action: Block (429)
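The per-key idea behind these rules can be sketched in a few lines of Python. This in-memory fixed-window counter (class name and parameters are illustrative, not a Cloudflare API) shows what the edge is doing per X-User-ID value; real providers run the same logic against distributed counters:

```python
import time
from collections import defaultdict
from typing import Dict, Optional, Tuple

class FixedWindowLimiter:
    """Minimal in-memory sketch of per-key fixed-window rate limiting."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        # (key, window index) -> request count in that window
        self.counters: Dict[Tuple[str, int], int] = defaultdict(int)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        bucket = (key, int(now // self.window))  # current window for this key
        self.counters[bucket] += 1
        return self.counters[bucket] <= self.limit

limiter = FixedWindowLimiter(limit=1000, window_seconds=60)
allowed = limiter.allow("user-42")
```

Fixed windows are simple but allow up to 2× the limit across a window boundary; the sliding-window variant at the end of this article avoids that.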
AWS CloudFront + WAF
// AWS WAF rate-based rule (WAFv2 JSON)
{
  "Name": "APIRateLimit",
  "Priority": 1,
  "Statement": {
    "RateBasedStatement": {
      "Limit": 2000,
      "AggregateKeyType": "IP"
    }
  },
  "Action": { "Block": {} },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "APIRateLimit"
  }
}
AWS WAF rate-based rules evaluate requests over a trailing window that defaults to 5 minutes. For finer-grained control, use Cloudflare or implement token bucket logic in a Lambda@Edge function.
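The token bucket logic mentioned above is small enough to sketch directly. This is an illustrative class, not a Lambda@Edge API; a real deployment would keep the bucket state in a shared store rather than in-process:

```python
import time

class TokenBucket:
    """Token bucket: capacity bounds bursts, refill_rate sets the steady state."""

    def __init__(self, refill_rate: float, capacity: float, now: float = None):
        self.refill_rate = refill_rate   # tokens added per second
        self.capacity = capacity         # maximum burst size
        self.tokens = capacity           # start full
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.refill_rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

This is the same shape as API Gateway's throttle settings below: `capacity` plays the role of `burstLimit`, `refill_rate` the role of `rateLimit`.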
API Gateway Throttling
AWS API Gateway Usage Plans
import boto3

client = boto3.client('apigateway')

# Create usage plan with throttle + quota
plan = client.create_usage_plan(
    name='StandardTier',
    throttle={
        'burstLimit': 200,   # Token bucket capacity (maximum burst)
        'rateLimit': 100.0   # Requests per second (steady-state refill rate)
    },
    quota={
        'limit': 10000,
        'period': 'DAY'
    }
)
AWS API Gateway returns 429 Too Many Requests with an x-amzn-RequestId header. The response body is:
{"message": "Too Many Requests"}
Kong Rate Limiting Plugin
# Kong declarative config — rate limit per consumer
plugins:
  - name: rate-limiting
    config:
      minute: 100
      hour: 5000
      policy: redis            # Distributed counter via Redis
      redis_host: redis
      redis_port: 6379
      limit_by: consumer       # Per API key
      fault_tolerant: true     # Allow traffic if Redis is down
      hide_client_headers: false
Envoy Rate Limit Service
# Envoy filter configuration
http_filters:
  - name: envoy.filters.http.ratelimit
    typed_config:
      '@type': type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
      domain: api
      rate_limit_service:
        transport_api_version: V3
        grpc_service:
          envoy_grpc:
            cluster_name: rate_limit_cluster
      failure_mode_deny: false  # Allow traffic if the rate limit service fails
IP Reputation and Bot Detection
Rate limiting by IP is effective against naive bots but not against distributed attacks using residential proxies. Combine with reputation signals:
Cloudflare Managed Challenge
# Cloudflare Firewall Rule: challenge suspicious ASNs
If: (ip.geoip.asnum in {396982 15169}) and not cf.client.bot
Action: Managed Challenge
Managed Challenge presents a JavaScript challenge or CAPTCHA based on threat score. Legitimate browsers pass silently; bots fail. Unlike traditional CAPTCHAs, it does not interrupt legitimate users.
Bot Score Thresholding
# Block high-confidence bots on sensitive endpoints
If: cf.bot_management.score < 10 and http.request.uri.path contains "/api/login"
Action: Block
# Challenge medium-confidence on public API
If: cf.bot_management.score < 30 and http.request.uri.path contains "/api/"
Action: Managed Challenge
Response Headers for Rate Limiting
When returning a 429, include headers that allow clients to implement respectful retry logic:
HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1709251200
Content-Type: application/json
{
"error": "rate_limit_exceeded",
"message": "100 requests per minute allowed. Retry after 60 seconds.",
"retry_after": 60
}
The IETF draft for standardized rate limit headers (draft-ietf-httpapi-ratelimit-headers: RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset) is gaining adoption — prefer these over the X- prefixed variants in new APIs:
RateLimit-Limit: 100
RateLimit-Remaining: 0
RateLimit-Reset: 60
RateLimit-Policy: 100;w=60
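On the client side, respectful retry logic means reading whichever header family the server sends. A sketch (helper name is hypothetical; it assumes Retry-After carries seconds rather than an HTTP-date, and that the draft RateLimit-Reset is a delta in seconds while the legacy X- variant is a Unix timestamp, matching the examples above):

```python
import time

def retry_delay(headers: dict, default: float = 1.0) -> float:
    """Seconds to wait before retrying after a 429, based on response headers."""
    h = {k.lower(): v for k, v in headers.items()}
    if "retry-after" in h:
        return float(h["retry-after"])
    if "ratelimit-reset" in h:        # draft header: delta in seconds
        return float(h["ratelimit-reset"])
    if "x-ratelimit-reset" in h:      # legacy header: Unix timestamp
        return max(0.0, float(h["x-ratelimit-reset"]) - time.time())
    return default
```

The caller would sleep for the returned duration (ideally with jitter) before retrying.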
Application-Level Fallback
Even with edge rate limiting, implement application-level rate limiting as a defense-in-depth layer using Redis:
import time

from django_redis import get_redis_connection

def rate_limit(identifier: str, limit: int, window: int) -> bool:
    """Sliding window rate limiter using a Redis sorted set.

    Returns True if the request should be allowed.
    """
    redis = get_redis_connection("default")  # raw Redis client behind Django's cache
    key = f"ratelimit:{identifier}"
    now = time.time()
    pipe = redis.pipeline()
    pipe.zremrangebyscore(key, 0, now - window)  # evict entries older than the window
    pipe.zadd(key, {str(now): now})              # record this request
    pipe.zcard(key)                              # count requests inside the window
    pipe.expire(key, window)                     # let idle keys expire
    _, _, count, _ = pipe.execute()
    return count <= limit
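The same sliding-window arithmetic can be exercised without a Redis server. This in-memory equivalent (class name hypothetical) is useful for unit-testing the policy before wiring it to Redis:

```python
import time

class InMemorySlidingWindow:
    """In-memory stand-in for the Redis sorted-set limiter above."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.hits = {}  # identifier -> list of request timestamps

    def allow(self, identifier: str, now: float = None) -> bool:
        now = time.time() if now is None else now
        # Keep only timestamps inside the window, then record this request
        hits = [t for t in self.hits.get(identifier, []) if t > now - self.window]
        hits.append(now)
        self.hits[identifier] = hits
        return len(hits) <= self.limit
```

Like the Redis version, denied requests are still recorded, so a client hammering the endpoint keeps its window full until it backs off.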