Why Configure Retries and Timeouts at the Gateway?
Transient failures are inevitable in distributed systems. A backend service may briefly return 503 during a rolling deployment, or a network blip might cause a single request to time out. Configuring retries at the gateway makes these transient failures invisible to clients — the gateway retries automatically and, when a retry succeeds, the client sees a successful response.
Timeouts are equally important: without them, a slow upstream service holds the client connection open indefinitely and exhausts the gateway's thread pool. Explicit timeouts bound the worst-case latency a client will experience.
Timeout Configuration
Types of Timeouts
| Timeout | What It Limits |
|---|---|
| Connect timeout | Time to establish TCP connection to upstream |
| Request timeout | Total time from request start to response completion |
| Idle timeout | Time a keep-alive connection can be idle before closing |
| Read timeout | Time between data packets from upstream |
```yaml
# Kong upstream timeout configuration (values in milliseconds)
services:
  - name: orders-service
    url: http://orders.internal:8080
    connect_timeout: 2000   # 2 seconds to establish connection
    write_timeout: 5000     # 5 seconds to send request
    read_timeout: 15000     # 15 seconds to receive response
```
```yaml
# Envoy route-level timeout (overrides the cluster default)
routes:
  - match:
      prefix: "/api/reports"   # slow report-generation endpoint
    route:
      cluster: reports_service
      timeout: 60s             # longer timeout for this specific route
  - match:
      prefix: "/api/"          # all other routes
    route:
      cluster: api_service
      timeout: 5s              # default: 5 second timeout
```
Setting Timeout Values
Use the p99 latency of the upstream service plus a safety margin:
timeout = p99_latency × 1.5
Example: if orders-service p99 = 200ms, set timeout = 300ms
For report generation p99 = 8s, set timeout = 12s
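The rule of thumb above can be expressed as a small helper; a minimal sketch (the function name and the 1.5 default multiplier are illustrative, not a standard):

```python
def timeout_from_p99(p99_ms: float, margin: float = 1.5) -> int:
    """Suggest a route timeout in ms: p99 latency times a safety margin."""
    return int(p99_ms * margin)

# orders-service: p99 = 200ms -> 300ms timeout
print(timeout_from_p99(200))    # 300
# report generation: p99 = 8000ms -> 12000ms timeout
print(timeout_from_p99(8000))   # 12000
```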
Different endpoints on the same service often have wildly different latency profiles. Per-route timeout overrides allow you to tune each endpoint appropriately rather than using a single conservative value that protects the slowest endpoint.
Retry Policy
Retryable Status Codes
Only retry on responses that indicate a *transient* server-side failure:
| Code | Retry? | Reason |
|---|---|---|
| 408 Request Timeout | Yes | Server gave up waiting for the request; resending is safe |
| 429 Too Many Requests | Yes, with backoff | Rate limited, retry after Retry-After |
| 500 Internal Server Error | Maybe | Could be permanent bug — use low retry count |
| 502 Bad Gateway | Yes | Upstream unreachable, likely transient |
| 503 Service Unavailable | Yes | Server overloaded or deploying, transient |
| 504 Gateway Timeout | Yes | Upstream timed out, retry may succeed |
Do not retry 4xx client errors (400, 401, 403, 404): these indicate a problem with the request itself, not the server, so resending the identical request produces the identical failure.
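The table above can be encoded as a small classifier. A sketch (the function name and the one-retry budget for 500s are my own choices, not a standard policy):

```python
# Status codes the gateway may retry, per the table above.
ALWAYS_RETRY = {408, 502, 503, 504}
RETRY_WITH_BACKOFF = {429}   # honor Retry-After before resending
RETRY_SPARINGLY = {500}      # could be a permanent bug: keep retry count low

def is_retryable(status: int, attempt: int, max_500_retries: int = 1) -> bool:
    """Decide whether a response status warrants another attempt."""
    if status in ALWAYS_RETRY or status in RETRY_WITH_BACKOFF:
        return True
    if status in RETRY_SPARINGLY:
        return attempt < max_500_retries
    return False             # 4xx client errors: never retry

print(is_retryable(503, attempt=0))  # True
print(is_retryable(404, attempt=0))  # False
print(is_retryable(500, attempt=2))  # False (500 retry budget exhausted)
```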
Max Retries and Backoff
```yaml
# Envoy retry policy
routes:
  - match:
      prefix: "/api/"
    route:
      cluster: api_service
      retry_policy:
        # retry_on takes named conditions; "gateway-error" covers 502/503/504
        retry_on: "gateway-error,connect-failure,retriable-4xx"
        num_retries: 3
        per_try_timeout: 5s        # each attempt gets its own timeout
        retry_back_off:
          base_interval: 100ms     # wait ~100ms before the first retry
          max_interval: 1s         # cap at 1 second between retries
```
Exponential backoff with jitter prevents retry storms:
Attempt 1: immediate
Attempt 2: wait base × 2^0 + random(0, base) = ~100–200ms
Attempt 3: wait base × 2^1 + random(0, base) = ~200–300ms
Attempt 4: wait min(base × 2^2, max) + random(0, base) = ~400–500ms
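A sketch of that schedule in Python (the arithmetic mirrors the lines above; the function name and defaults are illustrative):

```python
import random

def backoff_delay(attempt: int, base: float = 0.1, cap: float = 1.0) -> float:
    """Delay in seconds before `attempt` (1-based): exponential, capped, jittered."""
    if attempt <= 1:
        return 0.0                           # first attempt is immediate
    delay = min(base * 2 ** (attempt - 2), cap)
    return delay + random.uniform(0, base)   # jitter spreads out retry storms

for n in range(1, 5):
    print(f"attempt {n}: wait ~{backoff_delay(n):.3f}s")
```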
Retry Budgets
A retry budget caps total retry volume as a percentage of non-retry traffic. This prevents a scenario where a failing upstream causes the gateway to generate more traffic than the original request volume:
```yaml
# Route-level retry policy: avoid retrying the same failing host
retry_policy:
  num_retries: 3
  retry_host_predicate:
    - name: envoy.retry_host_predicates.previous_hosts
  host_selection_retry_max_attempts: 3

# The budget itself is configured on the cluster's circuit breakers:
# at most 20% of active requests may be retries
circuit_breakers:
  thresholds:
    - retry_budget:
        budget_percent:
          value: 20.0
        min_retry_concurrency: 3   # floor so low-traffic clusters can still retry
```
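The budget idea can also be sketched independently of any particular proxy; a minimal model (class name, counters, and the 20% threshold are illustrative):

```python
class RetryBudget:
    """Allow retries only while they stay under a fraction of live traffic."""
    def __init__(self, budget_percent: float = 20.0, min_concurrency: int = 3):
        self.budget = budget_percent / 100.0
        self.min_concurrency = min_concurrency
        self.active_requests = 0   # non-retry requests in flight
        self.active_retries = 0    # retry requests in flight

    def can_retry(self) -> bool:
        # Always permit a small floor so low-traffic services can retry at all.
        if self.active_retries < self.min_concurrency:
            return True
        return self.active_retries < self.active_requests * self.budget

budget = RetryBudget()
budget.active_requests = 100
budget.active_retries = 19
print(budget.can_retry())   # True: 19 < 20% of 100
budget.active_retries = 25
print(budget.can_retry())   # False: budget exhausted
```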
Deadline Propagation
When a gateway retries a request, each retry consumes time from the client's overall deadline. Propagating the remaining deadline to upstream services prevents them from doing work that will be discarded because the client already timed out.
gRPC Deadlines
gRPC has first-class deadline support. The client sets a deadline; the gateway passes it to the upstream service; the service can check whether it has time to complete and abandon work if not:
```python
# gRPC Python client — set a deadline on the call
import grpc
from user_pb2 import GetUserRequest          # generated protobuf message
from user_pb2_grpc import UserServiceStub    # generated service stub

channel = grpc.insecure_channel('api.example.com')
stub = UserServiceStub(channel)
response = stub.GetUser(
    GetUserRequest(user_id='123'),
    timeout=5.0,   # 5 second deadline, propagated to the server
)
```
```go
// gRPC Go service — check remaining deadline before expensive work
import (
	"context"
	"time"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

func (s *UserServer) GetUser(ctx context.Context, req *pb.GetUserRequest) (*pb.User, error) {
	if deadline, ok := ctx.Deadline(); ok {
		remaining := time.Until(deadline)
		if remaining < 100*time.Millisecond {
			return nil, status.Error(codes.DeadlineExceeded, "insufficient time remaining")
		}
	}
	// ... do actual work
	return &pb.User{}, nil // placeholder result
}
```
HTTP Deadline via Timeout Budget Header
For HTTP services, pass the remaining time budget as a request header:
```http
# Gateway sets remaining timeout budget on the outbound request
X-Timeout-Budget: 4200   # milliseconds remaining in the client's deadline
```
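A sketch of how a gateway might compute and forward that budget (the header name follows the example above; there is no standard header for this, and the function names are illustrative):

```python
import time

def remaining_budget_ms(deadline_monotonic: float) -> int:
    """Milliseconds left until the client's deadline, floored at zero."""
    return max(0, int((deadline_monotonic - time.monotonic()) * 1000))

def outbound_headers(deadline_monotonic: float) -> dict:
    budget = remaining_budget_ms(deadline_monotonic)
    if budget == 0:
        # Client already timed out; forwarding would only waste upstream work.
        raise TimeoutError("client deadline already passed")
    return {"X-Timeout-Budget": str(budget)}

# Client allowed 5s total; 0.8s already spent at the gateway.
deadline = time.monotonic() + 4.2
print(outbound_headers(deadline))   # e.g. {'X-Timeout-Budget': '4199'}
```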
Idempotency Awareness
Retrying a non-idempotent request (POST, PATCH) can cause duplicate operations (charging a customer twice, creating two orders). Only retry safe and idempotent methods automatically:
| Method | Safe? | Idempotent? | Auto-retry? |
|---|---|---|---|
| GET | Yes | Yes | Yes |
| HEAD | Yes | Yes | Yes |
| OPTIONS | Yes | Yes | Yes |
| PUT | No | Yes | Yes |
| DELETE | No | Yes | Yes |
| POST | No | No | Only if Idempotency-Key present |
| PATCH | No | No | Only if Idempotency-Key present |
For POST requests with an Idempotency-Key header (Stripe's pattern), the upstream service guarantees idempotency using the key — the gateway may retry:
```http
POST /api/v1/payments
Idempotency-Key: 4a3b2c1d-payment-20240227
Content-Type: application/json

{"amount": 9900, "currency": "USD"}
```
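The table's policy can be expressed as a gateway-side check; a sketch (the function name is illustrative; the Idempotency-Key convention follows Stripe's pattern as noted above):

```python
IDEMPOTENT_METHODS = {"GET", "HEAD", "OPTIONS", "PUT", "DELETE"}

def may_auto_retry(method: str, headers: dict) -> bool:
    """Is this request safe to retry without risking duplicate operations?"""
    if method.upper() in IDEMPOTENT_METHODS:
        return True
    # POST/PATCH are retryable only when the upstream deduplicates by key.
    return "Idempotency-Key" in headers

print(may_auto_retry("GET", {}))                                # True
print(may_auto_retry("POST", {}))                               # False
print(may_auto_retry("POST", {"Idempotency-Key": "4a3b2c1d"}))  # True
```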
Summary
Set per-route timeouts based on p99 latency with a safety margin — use shorter timeouts for interactive endpoints and longer ones for async operations. Retry only 502, 503, 504, and connect failures, with exponential backoff and jitter. Use retry budgets to prevent the gateway from amplifying load on a struggling upstream. Propagate deadlines to upstream services so they can abandon work early when the client deadline has passed.