Why Configure Retries and Timeouts at the Gateway?
Transient failures are inevitable in distributed systems. A backend service may briefly return 503 during a rolling deployment, or a network blip might cause a single request to time out. Configuring retries at the gateway makes these transient failures invisible to clients — the gateway retries automatically and, when a retry succeeds, the client sees a successful response.
Timeouts are equally important: without them, a slow upstream service holds the client connection open indefinitely and exhausts the gateway's thread pool. Explicit timeouts bound the worst-case latency a client will experience.
Timeout Configuration
Types of Timeouts
| Timeout | What It Limits |
|---|---|
| Connect timeout | Time to establish TCP connection to upstream |
| Request timeout | Total time from request start to response completion |
| Idle timeout | Time a keep-alive connection can be idle before closing |
| Read timeout | Time between data packets from upstream |
```yaml
# Kong upstream timeout configuration (values in milliseconds)
services:
  - name: orders-service
    url: http://orders.internal:8080
    connect_timeout: 2000   # 2 seconds to establish connection
    write_timeout: 5000     # 5 seconds to send request
    read_timeout: 15000     # 15 seconds to receive response
```
```yaml
# Envoy route-level timeout (overrides the cluster default)
routes:
  - match:
      prefix: "/api/reports"   # slow report-generation endpoint
    route:
      cluster: reports_service
      timeout: 60s             # longer timeout for this specific route
  - match:
      prefix: "/api/"          # all other routes
    route:
      cluster: api_service
      timeout: 5s              # default: 5 second timeout
```
Setting Timeout Values
Use the p99 latency of the upstream service plus a safety margin:
timeout = p99_latency × 1.5
Example: if orders-service p99 = 200ms, set timeout = 300ms
For report generation p99 = 8s, set timeout = 12s
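The rule of thumb above can be expressed as a small helper; a minimal sketch (the function name and the 1.5 default multiplier are illustrative, not a standard):

```python
def timeout_from_p99(p99_ms: float, margin: float = 1.5) -> int:
    """Suggest a route timeout in ms: p99 latency times a safety margin."""
    return int(p99_ms * margin)

# orders-service: p99 = 200ms -> 300ms timeout
print(timeout_from_p99(200))    # 300
# report generation: p99 = 8000ms -> 12000ms timeout
print(timeout_from_p99(8000))   # 12000
```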
Different endpoints on the same service often have wildly different latency profiles. Per-route timeout overrides allow you to tune each endpoint appropriately rather than using a single conservative value that protects the slowest endpoint.
Retry Policy
Retryable Status Codes
Only retry on responses that indicate a *transient* server-side failure:
| Code | Retry? | Reason |
|---|---|---|
| 408 Request Timeout | Yes | Server gave up waiting for the request; resending is safe |
| 429 Too Many Requests | Yes, with backoff | Rate limited, retry after Retry-After |
| 500 Internal Server Error | Maybe | Could be permanent bug — use low retry count |
| 502 Bad Gateway | Yes | Upstream unreachable, likely transient |
| 503 Service Unavailable | Yes | Server overloaded or deploying, transient |
| 504 Gateway Timeout | Yes | Upstream timed out, retry may succeed |
Do not retry 4xx client errors (400, 401, 403, 404): these indicate a problem with the request itself, not the server, so resending the identical request produces the identical failure.
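The table above can be encoded as a small classifier. A sketch (the function name and the one-retry budget for 500s are my own choices, not a standard policy):

```python
# Status codes the gateway may retry, per the table above.
ALWAYS_RETRY = {408, 502, 503, 504}
RETRY_WITH_BACKOFF = {429}   # honor Retry-After before resending
RETRY_SPARINGLY = {500}      # could be a permanent bug: keep retry count low

def is_retryable(status: int, attempt: int, max_500_retries: int = 1) -> bool:
    """Decide whether a response status warrants another attempt."""
    if status in ALWAYS_RETRY or status in RETRY_WITH_BACKOFF:
        return True
    if status in RETRY_SPARINGLY:
        return attempt < max_500_retries
    return False             # 4xx client errors: never retry

print(is_retryable(503, attempt=0))  # True
print(is_retryable(404, attempt=0))  # False
print(is_retryable(500, attempt=2))  # False (500 retry budget exhausted)
```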
Max Retries and Backoff
```yaml
# Envoy retry policy
routes:
  - match:
      prefix: "/api/"
    route:
      cluster: api_service
      retry_policy:
        # retry_on takes named conditions; "gateway-error" covers 502/503/504
        retry_on: "gateway-error,connect-failure,retriable-4xx"
        num_retries: 3
        per_try_timeout: 5s        # each attempt gets its own timeout
        retry_back_off:
          base_interval: 100ms     # wait ~100ms before the first retry
          max_interval: 1s         # cap at 1 second between retries
```
Exponential backoff with jitter prevents retry storms:
Attempt 1: immediate
Attempt 2: wait base × 2^0 + random(0, base) = ~100–200ms
Attempt 3: wait base × 2^1 + random(0, base) = ~200–300ms
Attempt 4: wait min(base × 2^2, max) + random(0, base) = ~400–500ms
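A sketch of that schedule in Python (the arithmetic mirrors the lines above; the function name and defaults are illustrative):

```python
import random

def backoff_delay(attempt: int, base: float = 0.1, cap: float = 1.0) -> float:
    """Delay in seconds before `attempt` (1-based): exponential, capped, jittered."""
    if attempt <= 1:
        return 0.0                           # first attempt is immediate
    delay = min(base * 2 ** (attempt - 2), cap)
    return delay + random.uniform(0, base)   # jitter spreads out retry storms

for n in range(1, 5):
    print(f"attempt {n}: wait ~{backoff_delay(n):.3f}s")
```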
Retry Budgets
A retry budget caps total retry volume as a percentage of non-retry traffic. This prevents a scenario where a failing upstream causes the gateway to generate more traffic than the original request volume:
```yaml
# Route-level retry policy: avoid retrying the same failing host
retry_policy:
  num_retries: 3
  retry_host_predicate:
    - name: envoy.retry_host_predicates.previous_hosts
  host_selection_retry_max_attempts: 3

# The budget itself is configured on the cluster's circuit breakers:
# at most 20% of active requests may be retries
circuit_breakers:
  thresholds:
    - retry_budget:
        budget_percent:
          value: 20.0
        min_retry_concurrency: 3   # floor so low-traffic clusters can still retry
```
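The budget idea can also be sketched independently of any particular proxy; a minimal model (class name, counters, and the 20% threshold are illustrative):

```python
class RetryBudget:
    """Allow retries only while they stay under a fraction of live traffic."""
    def __init__(self, budget_percent: float = 20.0, min_concurrency: int = 3):
        self.budget = budget_percent / 100.0
        self.min_concurrency = min_concurrency
        self.active_requests = 0   # non-retry requests in flight
        self.active_retries = 0    # retry requests in flight

    def can_retry(self) -> bool:
        # Always permit a small floor so low-traffic services can retry at all.
        if self.active_retries < self.min_concurrency:
            return True
        return self.active_retries < self.active_requests * self.budget

budget = RetryBudget()
budget.active_requests = 100
budget.active_retries = 19
print(budget.can_retry())   # True: 19 < 20% of 100
budget.active_retries = 25
print(budget.can_retry())   # False: budget exhausted
```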
Deadline Propagation
When a gateway retries a request, each retry consumes time from the client's overall deadline. Propagating the remaining deadline to upstream services prevents them from doing work that will be discarded because the client already timed out.
gRPC Deadlines
gRPC has first-class deadline support. The client sets a deadline; the gateway passes it to the upstream service; the service can check whether it has time to complete and abandon work if not:
```python
# gRPC Python client — set a deadline on the call
import grpc
from user_pb2 import GetUserRequest          # generated protobuf message
from user_pb2_grpc import UserServiceStub    # generated service stub

channel = grpc.insecure_channel('api.example.com')
stub = UserServiceStub(channel)
response = stub.GetUser(
    GetUserRequest(user_id='123'),
    timeout=5.0,   # 5 second deadline, propagated to the server
)
```
```go
// gRPC Go service — check remaining deadline before expensive work
import (
	"context"
	"time"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

func (s *UserServer) GetUser(ctx context.Context, req *pb.GetUserRequest) (*pb.User, error) {
	if deadline, ok := ctx.Deadline(); ok {
		remaining := time.Until(deadline)
		if remaining < 100*time.Millisecond {
			return nil, status.Error(codes.DeadlineExceeded, "insufficient time remaining")
		}
	}
	// ... do actual work
	return &pb.User{}, nil // placeholder result
}
```
HTTP Deadline via Timeout Budget Header
For HTTP services, pass the remaining time budget as a request header:
```http
# Gateway sets remaining timeout budget on the outbound request
X-Timeout-Budget: 4200   # milliseconds remaining in the client's deadline
```
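A sketch of how a gateway might compute and forward that budget (the header name follows the example above; there is no standard header for this, and the function names are illustrative):

```python
import time

def remaining_budget_ms(deadline_monotonic: float) -> int:
    """Milliseconds left until the client's deadline, floored at zero."""
    return max(0, int((deadline_monotonic - time.monotonic()) * 1000))

def outbound_headers(deadline_monotonic: float) -> dict:
    budget = remaining_budget_ms(deadline_monotonic)
    if budget == 0:
        # Client already timed out; forwarding would only waste upstream work.
        raise TimeoutError("client deadline already passed")
    return {"X-Timeout-Budget": str(budget)}

# Client allowed 5s total; 0.8s already spent at the gateway.
deadline = time.monotonic() + 4.2
print(outbound_headers(deadline))   # e.g. {'X-Timeout-Budget': '4199'}
```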
Idempotency Awareness
Retrying a non-idempotent request (POST, PATCH) can cause duplicate operations (charging a customer twice, creating two orders). Only retry safe and idempotent methods automatically:
| Method | Safe? | Idempotent? | Auto-retry? |
|---|---|---|---|
| GET | Yes | Yes | Yes |
| HEAD | Yes | Yes | Yes |
| OPTIONS | Yes | Yes | Yes |
| PUT | No | Yes | Yes |
| DELETE | No | Yes | Yes |
| POST | No | No | Only if Idempotency-Key present |
| PATCH | No | No | Only if Idempotency-Key present |
For POST requests with an Idempotency-Key header (Stripe's pattern), the upstream service guarantees idempotency using the key — the gateway may retry:
```http
POST /api/v1/payments
Idempotency-Key: 4a3b2c1d-payment-20240227
Content-Type: application/json

{"amount": 9900, "currency": "USD"}
```
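The table's policy can be expressed as a gateway-side check; a sketch (the function name is illustrative; the Idempotency-Key convention follows Stripe's pattern as noted above):

```python
IDEMPOTENT_METHODS = {"GET", "HEAD", "OPTIONS", "PUT", "DELETE"}

def may_auto_retry(method: str, headers: dict) -> bool:
    """Is this request safe to retry without risking duplicate operations?"""
    if method.upper() in IDEMPOTENT_METHODS:
        return True
    # POST/PATCH are retryable only when the upstream deduplicates by key.
    return "Idempotency-Key" in headers

print(may_auto_retry("GET", {}))                                # True
print(may_auto_retry("POST", {}))                               # False
print(may_auto_retry("POST", {"Idempotency-Key": "4a3b2c1d"}))  # True
```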
Summary
Set per-route timeouts based on p99 latency with a safety margin — use shorter timeouts for interactive endpoints and longer ones for async operations. Retry only 502, 503, 504, and connect failures, with exponential backoff and jitter. Use retry budgets to prevent the gateway from amplifying load on a struggling upstream. Propagate deadlines to upstream services so they can abandon work early when the client deadline has passed.