Performance & Optimization

WebSocket Performance Tuning

How to tune WebSocket performance — from frame sizing and per-message compression to ping/pong keepalive, connection scaling, and load balancing long-lived connections.

Connection Lifecycle

A WebSocket connection begins as an HTTP/1.1 request and upgrades via the 101 Switching Protocols response. Once upgraded, the connection is a persistent, full-duplex TCP channel. The lifecycle has three phases:

  • Handshake — HTTP upgrade (one-time cost)
  • Data transfer — bidirectional frame exchange (ongoing)
  • Close — graceful Close frame exchange

Unlike HTTP, the TCP connection stays open indefinitely. This shifts the performance concern from connection establishment (amortized over time) to per-frame efficiency and connection scale.

Frame Size Optimization

WebSocket data is transmitted as frames (RFC 6455 Section 5). Each frame has a header of 2–10 bytes (plus a 4-byte masking key on client-to-server frames), followed by the payload.

Frame layout:
  FIN bit | RSV1-3 | Opcode (4 bits)
  MASK bit | Payload length (7 / 7+16 / 7+64 bits)
  [Masking key — 4 bytes, client frames only]
  Payload data

Payload length encoding:

Size               Bytes used for length
≤ 125 bytes        1 byte
126–65,535 bytes   3 bytes (1 + 2)
65,536+ bytes      9 bytes (1 + 8)
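The total header overhead implied by this table can be made concrete with a small helper; `frame_header_size` is illustrative, not part of any library:

```python
def frame_header_size(payload_len: int, masked: bool = False) -> int:
    """Total WebSocket frame header size in bytes for a given payload length."""
    if payload_len <= 125:
        size = 2       # 2-byte base header; length fits in the 7-bit field
    elif payload_len <= 65535:
        size = 4       # base header + 16-bit extended length
    else:
        size = 10      # base header + 64-bit extended length
    if masked:
        size += 4      # client-to-server frames carry a 4-byte masking key
    return size
```

For example, a 100-byte server frame costs 2 header bytes, while the same payload sent from a client costs 6.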

For high-frequency small messages (trading ticks, sensor readings), batch multiple logical messages into a single frame to reduce per-frame overhead. Aim for frames in the 4–64 KB range for optimal TCP segment utilization.

For large binary payloads (file transfers), fragment into frames under 16 MB to avoid blocking the connection while a single giant frame transmits.
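The batching guideline above can be sketched as follows; `batch_messages` and its 16 KB default are illustrative, and the +2 separator estimate is an approximation of JSON array overhead:

```python
import json

def batch_messages(messages, max_frame_bytes=16 * 1024):
    """Pack logical messages into batches; each batch is sent as one frame."""
    batches, current, current_size = [], [], 0
    for msg in messages:
        size = len(json.dumps(msg)) + 2  # +2 approximates the ", " separator
        if current and current_size + size > max_frame_bytes:
            batches.append(json.dumps(current))  # one JSON array per frame
            current, current_size = [], 0
        current.append(msg)
        current_size += size
    if current:
        batches.append(json.dumps(current))
    return batches
```

Each returned string is a JSON array, so the receiver parses one frame into many logical messages, paying the per-frame header once per batch instead of once per message.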

Compression (Per-Message Deflate)

The permessage-deflate WebSocket extension (RFC 7692) compresses each message using DEFLATE, negotiated in the opening handshake:

Client upgrade request:
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits

Server response:
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits=15

Compression benefits: JSON messages typically compress by 60–80%. For a chat app sending 1 KB JSON messages, compression reduces bandwidth by ~700 bytes per message — significant at scale.
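These figures can be sanity-checked with raw DEFLATE, the same algorithm permessage-deflate uses (negative `wbits` selects a headerless stream, matching the extension's wire format); the chat payload below is illustrative, and actual ratios vary with content:

```python
import json
import zlib

# A representative repetitive JSON chat message (illustrative payload)
message = json.dumps({
    "type": "chat", "room": "general", "user": "alice",
    "text": "hello " * 50, "ts": 1700000000,
}).encode()

# Raw DEFLATE as used by permessage-deflate (wbits=-15: no zlib header)
compressor = zlib.compressobj(wbits=-15)
compressed = compressor.compress(message) + compressor.flush()

ratio = 1 - len(compressed) / len(message)  # fraction of bandwidth saved
```

Highly repetitive JSON like this compresses far better than the 60–80% typical case; binary media that is already compressed will often get no benefit at all.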

Trade-offs:

  • CPU cost per message (compressor state per connection)
  • The server_no_context_takeover / client_no_context_takeover parameters reset DEFLATE state after each message: lower compression ratio, but no per-connection sliding-window memory cost
  • Disable for already-compressed binary content (images, video)

# Python websockets library — enable compression
import asyncio
import websockets

async def handler(ws):
    async for message in ws:   # echo handler, for illustration
        await ws.send(message)

async def main():
    async with websockets.serve(
        handler, 'localhost', 8765,
        compression='deflate',  # Enable permessage-deflate
    ) as server:
        await server.serve_forever()

asyncio.run(main())

Ping/Pong Tuning

WebSocket defines Ping (opcode 0x9) and Pong (opcode 0xA) control frames for keepalive. Either endpoint may send a Ping; the peer must respond with a Pong carrying the same payload. In practice, the server typically pings its clients.

Why ping/pong matters:

  • NAT/firewall timeouts — idle connections are dropped after 30–300 seconds by intermediate network devices
  • Dead connection detection — a client that silently disconnects (network loss, crash) is detected when Pong is not received

Tuning guidelines:

Setting         Recommendation
Ping interval   20–30 seconds
Pong timeout    10 seconds after Ping
Ping payload    Empty or minimal (1–4 bytes)

Set ping interval slightly below the shortest NAT timeout in your deployment environment. AWS NAT Gateway drops idle TCP connections after 350 seconds — so 20–30s ping is ample.

Connection Scaling

Each WebSocket connection holds a file descriptor and memory on the server. Linux defaults support ~1,024 file descriptors per process.

# Increase file descriptor limits
ulimit -n 100000

# /etc/sysctl.conf — kernel-level TCP tuning for many connections
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535

Async frameworks (Python asyncio, Node.js, Go goroutines) handle tens of thousands of concurrent WebSocket connections per process. Synchronous thread-per-connection models (Django sync views) are impractical beyond a few hundred connections.
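As a rough sketch of the sizing math, assume ~64 KiB of server-side state per connection (socket buffers, framing state, optional compressor window; the figure varies widely by stack). Capacity is then the tighter of the memory budget and the file descriptor limit:

```python
GiB = 2**30
KiB = 2**10

def max_connections(ram_bytes: int, per_conn_bytes: int, fd_limit: int) -> int:
    """Capacity estimate: bounded by memory and the fd limit."""
    return min(ram_bytes // per_conn_bytes, fd_limit)

# Assumed figures: 8 GiB of RAM, ~64 KiB per connection, fd limit of 100,000
capacity = max_connections(8 * GiB, 64 * KiB, fd_limit=100_000)
```

Under these assumptions the fd limit (100,000) binds before memory (~131,000), which is why raising ulimit is usually the first step.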

Load Balancing WebSockets

WebSocket connections are long-lived, which creates challenges for standard load balancers:

1. Sticky sessions are required — a WebSocket connection is stateful; it must stay on the same backend for its lifetime. Configure cookie-based or IP-hash affinity at the load balancer.

2. Upgrade passthrough — ensure the LB proxies the Upgrade header:

# Nginx WebSocket proxy
location /ws/ {
    proxy_pass http://websocket_backends;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection 'upgrade';
    proxy_read_timeout 3600s;  # Keep alive for 1 hour
}

3. Adjust timeouts — LB idle timeouts (often 60s) will drop quiet WebSocket connections. Set proxy_read_timeout to exceed your ping interval (e.g., 3600s) so the LB does not prematurely terminate connections during normal idle periods.
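The IP-hash affinity from point 1 can be sketched in Python; the backend names are hypothetical, and real balancers (e.g. Nginx's ip_hash) use their own hash functions:

```python
import hashlib

BACKENDS = ["ws-1:8765", "ws-2:8765", "ws-3:8765"]  # hypothetical pool

def pick_backend(client_ip: str) -> str:
    """Deterministic IP-hash affinity: one IP always maps to one backend."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return BACKENDS[int.from_bytes(digest[:4], "big") % len(BACKENDS)]
```

Because the mapping depends only on the client IP, a reconnecting client lands on the same backend, at the cost of uneven load when many clients share a NAT address.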
