TCP Connection Overhead
Every TCP connection requires a three-way handshake before the first byte of HTTP data can be sent:
Client → Server: SYN
Server → Client: SYN-ACK
Client → Server: ACK
Client → Server: GET /api/... (first HTTP request)
This handshake adds one full round-trip time (RTT), typically 20–100 ms for cross-region requests. With TLS, add another 1–2 RTTs (one for TLS 1.3, two for TLS 1.2). For an API making dozens of requests, connection overhead can easily exceed the time spent on actual data transfer.
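To make the cost concrete, here is a rough back-of-envelope calculation. The 50 ms RTT, TLS 1.2 handshake count, and 30-request workload are illustrative assumptions, not measurements:

```python
# Rough model of connection-setup overhead, assuming a 50 ms RTT,
# TLS 1.2 (2 extra RTTs), and 30 API requests. Illustrative only.
RTT_MS = 50
SETUP_RTTS = 1 + 2                           # TCP handshake + TLS 1.2 handshake
REQUESTS = 30

setup_per_conn_ms = SETUP_RTTS * RTT_MS      # 150 ms per new connection

# New connection per request (HTTP/1.0 style):
no_reuse_ms = REQUESTS * setup_per_conn_ms   # 4500 ms of pure handshake time

# One persistent connection (keep-alive):
keepalive_ms = setup_per_conn_ms             # 150 ms, paid once

print(f"without reuse: {no_reuse_ms} ms of handshake overhead")
print(f"with keep-alive: {keepalive_ms} ms, amortized over {REQUESTS} requests")
```

Even with these modest numbers, per-request handshakes cost 30× more setup time than a single reused connection.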
HTTP/1.0 opened a new TCP connection for every request and immediately closed it. HTTP/1.1 fixed this with keep-alive.
HTTP/1.1 Keep-Alive
Keep-alive (also called persistent connections) allows multiple HTTP requests to reuse a single TCP connection, amortizing the handshake cost across many requests.
In HTTP/1.1, keep-alive is on by default. Connections close only when explicitly signaled:
Connection: close
Or when the server's timeout expires. Configure server-side timeouts to balance resource usage:
# Nginx keep-alive settings
keepalive_timeout 65s; # Max idle time before closing
keepalive_requests 1000; # Max requests per connection
The Head-of-Line Blocking Problem
HTTP/1.1 keep-alive has a critical limitation: requests on a single connection are serialized. Request 2 cannot be sent until Response 1 is fully received. This is head-of-line (HoL) blocking.
Browsers work around this by opening 6 parallel TCP connections per origin — but this multiplies connection overhead.
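The effect of serialization can be sketched with a toy scheduling model. The response times below are made up, and greedy earliest-free assignment only approximates how a browser fills its 6 connections:

```python
import heapq

def total_time(response_times_ms, connections):
    """Finish time when requests are dispatched greedily onto
    `connections` serialized HTTP/1.1 keep-alive connections."""
    conns = [0.0] * connections           # when each connection frees up
    heapq.heapify(conns)
    for t in response_times_ms:
        start = heapq.heappop(conns)      # earliest-free connection
        heapq.heappush(conns, start + t)  # busy until this response is done
    return max(conns)

responses = [80, 120, 60, 200, 90, 70, 150, 110]  # hypothetical latencies (ms)

print(total_time(responses, 1))  # one connection: fully serialized (sum)
print(total_time(responses, 6))  # browser-style 6 parallel connections
```

With one connection, every request queues behind the previous response (880 ms here); six connections cut that to 210 ms, at the cost of six handshakes.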
HTTP/2 Multiplexing
HTTP/2 solves HTTP-level HoL blocking with stream multiplexing: multiple requests and responses are interleaved on a single TCP connection as independent streams, so Request 2 does not wait for Response 1. (A lost packet can still stall all streams at the TCP layer; HTTP/3 over QUIC addresses that remaining case.)
Single TCP connection:
Stream 1: GET /api/user → 200 OK {user data}
Stream 3: GET /api/orders → 200 OK {order list}
Stream 5: GET /api/prefs → 200 OK {preferences}
(all in-flight simultaneously)
With HTTP/2, the browser needs only 1 connection per origin. This also reduces TLS handshake overhead significantly.
Stream priority lets clients signal which responses are most urgent (e.g., blocking render vs. prefetch), though browser implementations vary.
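The interleaving above can be sketched as round-robin frame scheduling on one connection. Stream IDs and payload strings are illustrative; real HTTP/2 frames carry binary headers, flow-control windows, and priority signals:

```python
from collections import deque

# Each stream's response, split into frames (illustrative payloads).
streams = {
    1: deque(["user:part1", "user:part2"]),
    3: deque(["orders:part1", "orders:part2", "orders:part3"]),
    5: deque(["prefs:part1"]),
}

def interleave(streams):
    """Emit one frame per stream per pass, round-robin: no stream
    waits for another to finish, unlike HTTP/1.1 keep-alive."""
    wire = []
    active = deque(streams.items())
    while active:
        stream_id, frames = active.popleft()
        wire.append((stream_id, frames.popleft()))
        if frames:                        # stream still has frames left
            active.append((stream_id, frames))
    return wire

for stream_id, frame in interleave(streams):
    print(stream_id, frame)
```

All three responses make progress concurrently on the wire; a slow stream delays only its own remaining frames.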
Connection Pooling in Clients
HTTP clients should maintain a connection pool — a set of open, reusable connections — rather than opening a new connection for every request. Most HTTP libraries do this automatically but require tuning:
# Python httpx: connection pool configuration
import httpx

transport = httpx.HTTPTransport(
    limits=httpx.Limits(
        max_connections=100,           # Total pool size
        max_keepalive_connections=20,  # Idle connections to keep
        keepalive_expiry=30,           # Seconds before an idle conn closes
    )
)
client = httpx.Client(transport=transport)
Key settings to tune:
| Setting | Guidance |
|---|---|
| `max_connections` | Set to concurrent request parallelism |
| `max_keepalive_connections` | ~20% of max_connections |
| `keepalive_expiry` | Slightly less than server's `keepalive_timeout` |
If the client's `keepalive_expiry` exceeds the server's timeout, the server may close an idle connection just as the client reuses it, causing intermittent connection-reset errors.
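One way to avoid that race is to derive the client expiry from the server's timeout with a safety margin. The helper below is a sketch (the function name and 10-second default margin are my choices, not a library API); the 65 s value matches the Nginx example above:

```python
def safe_keepalive_expiry(server_timeout_s, margin_s=10):
    """Client-side keepalive_expiry that stays safely below the
    server's keepalive_timeout. margin_s is an illustrative default."""
    if margin_s >= server_timeout_s:
        raise ValueError("margin must be smaller than the server timeout")
    return server_timeout_s - margin_s

print(safe_keepalive_expiry(65))  # for nginx keepalive_timeout 65s -> 55
```

The derived value would then go into `keepalive_expiry` in the pool configuration above, so the client always retires an idle connection before the server does.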
Connection Draining for Graceful Shutdown
When taking a server out of rotation (deploy, scale-in), abruptly closing keep-alive connections drops in-flight requests. Connection draining allows existing requests to complete before the connection closes:
- Load balancer stops sending new connections to the instance
- Server stops accepting new requests (sends `Connection: close` on responses)
- In-flight requests complete normally
- Server shuts down after a drain timeout (typically 30–60 seconds)
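The drain sequence above can be sketched as a small state machine. The class and method names here are hypothetical, not any framework's API:

```python
import threading
import time

class DrainingServer:
    """Minimal drain sketch: stop accepting, let in-flight requests
    finish, give up after drain_timeout_s seconds."""

    def __init__(self, drain_timeout_s=30):
        self.drain_timeout_s = drain_timeout_s
        self.accepting = True
        self.in_flight = 0
        self._lock = threading.Lock()

    def start_request(self):
        with self._lock:
            if not self.accepting:
                return False             # would answer with Connection: close
            self.in_flight += 1
            return True

    def finish_request(self):
        with self._lock:
            self.in_flight -= 1

    def drain(self):
        self.accepting = False           # step 2: refuse new work
        deadline = time.monotonic() + self.drain_timeout_s
        while time.monotonic() < deadline:
            with self._lock:
                if self.in_flight == 0:  # step 3: in-flight work is done
                    return True
            time.sleep(0.01)
        return False                     # step 4: timeout, force shutdown

server = DrainingServer(drain_timeout_s=1)
server.start_request()
server.finish_request()
print(server.drain())   # nothing in flight, drains immediately
```

Real servers implement the same loop internally; the drain timeout caps how long a stuck request can delay shutdown.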
# Django / Gunicorn: SIGTERM triggers graceful shutdown
# gunicorn --graceful-timeout 30 ...
# Gunicorn stops accepting new requests and waits up to 30s
# for in-flight requests to complete before SIGKILL
Measuring Connection Reuse
Use curl --verbose to inspect connection reuse:
# First request: new connection
curl -v https://api.example.com/data 2>&1 | grep 'Connected to'
# * Connected to api.example.com (93.184.216.34) port 443
# Requesting multiple URLs in one invocation reuses the connection
# automatically; the verbose output shows 'Re-using existing connection'.
# (curl's --keepalive/--no-keepalive flags control TCP keepalive probes,
# not HTTP connection reuse.)
curl -v https://api.example.com/data https://api.example.com/more
In Nginx, the stub_status module exposes `$connections_active` and `$connections_waiting`. A high waiting count (idle keep-alive connections) relative to active reading/writing indicates healthy keep-alive reuse. Per request, the core `$connection_requests` variable records how many requests the connection has served so far, which makes reuse visible directly in access logs.
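If `$connection_requests` is added to the log format, a short script can tally requests per connection. The log lines below are made up, and the `conn_requests=` field name assumes a custom log format:

```python
from collections import Counter

# Assumes a log_format ending in: ... conn_requests=$connection_requests
sample_lines = [
    "GET /api/user 200 conn_requests=1",
    "GET /api/orders 200 conn_requests=2",
    "GET /api/prefs 200 conn_requests=3",
    "GET /api/data 200 conn_requests=1",
]

def reuse_histogram(lines):
    """Count how many requests arrived as the Nth request on their connection."""
    hist = Counter()
    for line in lines:
        n = int(line.rsplit("conn_requests=", 1)[1])
        hist[n] += 1
    return hist

print(reuse_histogram(sample_lines))
```

A distribution clustered at 1 means nearly every request opened a fresh connection, i.e. keep-alive is configured but not actually being reused.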