Protocol Deep Dives

RFC 6455: WebSocket Protocol Deep Dive

A technical walkthrough of RFC 6455 — the WebSocket protocol standard — covering the opening handshake, frame format, control frames, masking, and secure closure.

Why WebSocket?

Before WebSocket (RFC 6455, 2011), real-time web applications were built on hacks: long-polling (HTTP requests held open until data arrives) or Server-Sent Events (one-directional server push). Both add latency and overhead.

WebSocket provides a single, persistent, full-duplex channel over TCP, initiated via HTTP. Once the connection is established:

  • Either party can send data at any time
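  • No HTTP headers on each message: framing overhead is 2–10 bytes per frame, plus a 4-byte masking key on client-to-server frames
  • No polling interval or per-message request: a message goes out as soon as it is written, so latency is dominated by network transit time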

The Opening Handshake

WebSocket starts as an HTTP/1.1 request with an Upgrade header. The server validates the request and responds with 101 Switching Protocols, after which the TCP connection is handed off to the WebSocket protocol.

Client request:

GET /chat HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
Sec-WebSocket-Protocol: chat, superchat
Sec-WebSocket-Extensions: permessage-deflate

Server response:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Sec-WebSocket-Protocol: chat

Sec-WebSocket-Accept calculation (RFC 6455 Section 1.3):

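import base64
import hashlib

# GUID fixed by RFC 6455; the server appends it to the client's key
MAGIC = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11'
key = 'dGhlIHNhbXBsZSBub25jZQ=='  # Sec-WebSocket-Key from the request
# Sec-WebSocket-Accept = base64(SHA-1(key + MAGIC))
accept = base64.b64encode(
    hashlib.sha1((key + MAGIC).encode()).digest()
).decode()
print(accept)  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=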
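The server-side validation step can be sketched as a single function that checks the upgrade headers and builds the 101 response. This is a minimal sketch, not a production parser: it assumes the whole request arrives as one string with CRLF line endings, and the names `build_handshake_response`, `parse_request`, `tokens`, `WS_GUID`, and the `supported` subprotocol list are hypothetical. It declines every extension simply by not echoing Sec-WebSocket-Extensions, which matches the sample response above.

```python
import base64
import binascii
import hashlib
from typing import Dict, List, Sequence, Tuple

# GUID fixed by RFC 6455 and appended to every client key
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def parse_request(raw: str) -> Tuple[str, Dict[str, str]]:
    """Split a raw HTTP/1.1 request into its request line and headers.
    Header names are lower-cased because HTTP treats them case-insensitively."""
    lines = raw.strip().split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers


def tokens(value: str) -> List[str]:
    # Comma-separated header values, e.g. "keep-alive, Upgrade"
    return [t.strip() for t in value.split(",") if t.strip()]


def build_handshake_response(raw: str, supported: Sequence[str] = ("chat",)) -> str:
    request_line, h = parse_request(raw)
    if (not request_line.startswith("GET ")
            or h.get("upgrade", "").lower() != "websocket"
            or "upgrade" not in [t.lower() for t in tokens(h.get("connection", ""))]):
        return "HTTP/1.1 400 Bad Request\r\n\r\n"
    if h.get("sec-websocket-version") != "13":
        # Tell the client which protocol version this server speaks
        return "HTTP/1.1 426 Upgrade Required\r\nSec-WebSocket-Version: 13\r\n\r\n"
    key = h.get("sec-websocket-key", "")
    try:
        # The key must be a base64-encoded 16-byte nonce
        key_ok = len(base64.b64decode(key, validate=True)) == 16
    except binascii.Error:
        key_ok = False
    if not key_ok:
        return "HTTP/1.1 400 Bad Request\r\n\r\n"
    accept = base64.b64encode(
        hashlib.sha1((key + WS_GUID).encode()).digest()).decode()
    lines = ["HTTP/1.1 101 Switching Protocols", "Upgrade: websocket",
             "Connection: Upgrade", f"Sec-WebSocket-Accept: {accept}"]
    # Select the first subprotocol offered by the client that we support
    chosen = next((p for p in tokens(h.get("sec-websocket-protocol", ""))
                   if p in supported), None)
    if chosen:
        lines.append(f"Sec-WebSocket-Protocol: {chosen}")
    # Extensions (e.g. permessage-deflate) are declined simply by omission
    return "\r\n".join(lines) + "\r\n\r\n"
```

Fed the client request shown earlier, this returns the same 101 response, selecting `chat` from the offered `chat, superchat`.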

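The Accept value lets the client confirm that the server actually understood the WebSocket handshake. A server that is not WebSocket-aware, or a cache replaying an earlier response, cannot produce the right value for a fresh random key, because computing it requires appending the WebSocket GUID (the "magic string") to that key and hashing the result. It proves protocol awareness only; it provides no authentication or confidentiality.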
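On the client side, the check amounts to generating a fresh key and comparing the server's Accept header with the value it should have produced. A minimal sketch; the helper names `new_client_key`, `expected_accept`, and `accept_is_valid` are hypothetical.

```python
import base64
import hashlib
import os

# GUID fixed by RFC 6455
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def new_client_key() -> str:
    """A fresh random 16-byte nonce, base64-encoded, for Sec-WebSocket-Key."""
    return base64.b64encode(os.urandom(16)).decode()


def expected_accept(key: str) -> str:
    # base64(SHA-1(key + GUID)), exactly what a WebSocket server must send
    return base64.b64encode(hashlib.sha1((key + WS_GUID).encode()).digest()).decode()


def accept_is_valid(key: str, accept_header: str) -> bool:
    # The client fails the connection unless the server echoed the right value
    return accept_header.strip() == expected_accept(key)
```

Because the key changes on every connection attempt, a response recorded for one key (such as the RFC sample above) will not validate for a new one.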

Frame Format

After the handshake, all data is sent as WebSocket frames:

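Bit:  0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-------+-+-------------+-------------------------------+
     |F|R|R|R| opcode|M| Payload len |    Extended payload length    |
     |I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
     |N|V|V|V|       |S|             |   (if payload len==126/127)   |
     | |1|2|3|       |K|             |                               |
     +-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
     |     Extended payload length continued, if payload len == 127  |
     + - - - - - - - - - - - - - - - +-------------------------------+
     |                               |Masking-key, if MASK set to 1  |
     +-------------------------------+-------------------------------+
     | Masking-key (continued)       |          Payload Data         |
     +-------------------------------- - - - - - - - - - - - - - - - +
     :                     Payload Data continued ...                :
     + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
     |                     Payload Data continued ...                |
     +---------------------------------------------------------------+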
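The length encoding in the diagram can be made concrete with a small header builder: lengths up to 125 fit in the 7-bit field, 126 announces a 16-bit length, and 127 announces a 64-bit length, with the masking key (if any) appended last. A minimal sketch assuming no extensions (RSV bits stay 0); `encode_header` is a hypothetical name.

```python
import struct
from typing import Optional


def encode_header(opcode: int, payload_len: int, fin: bool = True,
                  mask_key: Optional[bytes] = None) -> bytes:
    """Build a frame header following the layout above (RSV1-3 left at 0)."""
    if not 0 <= payload_len < 2 ** 63:
        raise ValueError("payload length must fit in 63 bits")
    byte0 = (0x80 if fin else 0x00) | (opcode & 0x0F)
    mask_bit = 0x80 if mask_key is not None else 0x00
    if payload_len <= 125:
        # Length fits directly in the 7-bit field
        header = struct.pack("!BB", byte0, mask_bit | payload_len)
    elif payload_len <= 0xFFFF:
        # 126 means "a 16-bit big-endian length follows"
        header = struct.pack("!BBH", byte0, mask_bit | 126, payload_len)
    else:
        # 127 means "a 64-bit big-endian length follows"
        header = struct.pack("!BBQ", byte0, mask_bit | 127, payload_len)
    if mask_key is not None:
        if len(mask_key) != 4:
            raise ValueError("masking key must be exactly 4 bytes")
        header += mask_key  # the masking key comes right after the length
    return header
```

The resulting header is 2, 4, or 10 bytes depending on payload size, plus 4 more when a masking key is present, so 14 bytes at most.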

Opcodes:

Frame type    Opcode  Meaning
Continuation  0x0     Fragment continuation
Text          0x1     UTF-8 text data
Binary        0x2     Binary data
Close         0x8     Close the connection
Ping          0x9     Keepalive ping
Pong          0xA     Ping response
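A receiver reads the same layout in reverse. The sketch below parses the fixed header, rejects opcodes not listed in the table, and reports how many bytes precede the payload. It assumes no extension has been negotiated, so any set RSV bit is treated as an error; `decode_header`, `FrameHeader`, and `OPCODE_NAMES` are hypothetical names.

```python
from typing import NamedTuple

# Opcodes defined by RFC 6455; 0x3-0x7 and 0xB-0xF are reserved
OPCODE_NAMES = {0x0: "continuation", 0x1: "text", 0x2: "binary",
                0x8: "close", 0x9: "ping", 0xA: "pong"}


class FrameHeader(NamedTuple):
    fin: bool
    opcode: int
    masked: bool
    payload_len: int
    mask_key: bytes
    header_len: int  # number of bytes before the payload starts


def decode_header(buf: bytes) -> FrameHeader:
    if len(buf) < 2:
        raise ValueError("incomplete header")
    b0, b1 = buf[0], buf[1]
    if b0 & 0x70:
        # RSV1-3 must be 0 unless a negotiated extension defines them
        raise ValueError("reserved bits set")
    opcode = b0 & 0x0F
    if opcode not in OPCODE_NAMES:
        raise ValueError(f"reserved opcode {opcode:#x}")
    masked = bool(b1 & 0x80)
    length = b1 & 0x7F
    ext = {126: 2, 127: 8}.get(length, 0)  # size of the extended length field
    needed = 2 + ext + (4 if masked else 0)
    if len(buf) < needed:
        raise ValueError("incomplete header")
    if ext:
        length = int.from_bytes(buf[2:2 + ext], "big")
    mask_key = bytes(buf[2 + ext:needed]) if masked else b""
    return FrameHeader(bool(b0 & 0x80), opcode, masked, length, mask_key, needed)
```

For example, the masked single-frame "Hello" message from RFC 6455 Section 5.7 (`81 85 37 fa 21 3d ...`) decodes as a final text frame with a 5-byte payload starting at offset 6.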

Control Frames: Close, Ping, Pong

Control frames (opcodes 0x8–0xF) are never fragmented (their FIN bit is always 1) and carry payloads of at most 125 bytes.
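Those two rules translate into a short receiver-side check. A minimal sketch; `check_control_frame` is a hypothetical helper that only covers the constraints stated above.

```python
def check_control_frame(fin: bool, opcode: int, payload: bytes) -> None:
    """Raise ValueError if a frame breaks the control-frame rules above."""
    if opcode < 0x8:
        return  # data frame (0x0-0x7): these rules do not apply
    if not fin:
        raise ValueError("control frames must not be fragmented")
    if len(payload) > 125:
        raise ValueError("control frame payload exceeds 125 bytes")
```

One consequence of the 125-byte cap is that a control frame's length always fits in the 7-bit field, so control frames never use the extended length forms. On a violation, closing with 1002 (Protocol Error) is one reasonable response.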

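Close (0x8): The payload is optional; if present, it starts with a 2-byte status code in network byte order, optionally followed by a UTF-8 reason string. A graceful shutdown needs a Close frame from each side: one endpoint initiates, and the other replies with its own Close. Common close codes: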

  • 1000 Normal Closure
  • 1001 Going Away (server restart, navigation)
  • 1002 Protocol Error
  • 1008 Policy Violation
  • 1011 Internal Server Error
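Encoding and decoding the Close payload follows directly from that layout. A minimal sketch; `encode_close`, `decode_close`, and `NO_STATUS_RCVD` are hypothetical names. Code 1005 is what RFC 6455 says an endpoint should report locally when a Close frame arrives with no status code; it is never sent on the wire.

```python
import struct
from typing import Tuple

NO_STATUS_RCVD = 1005  # reported locally when a Close frame has no body; never sent


def encode_close(code: int = 1000, reason: str = "") -> bytes:
    """Close payload: 2-byte big-endian status code, then a UTF-8 reason."""
    body = struct.pack("!H", code) + reason.encode("utf-8")
    if len(body) > 125:
        raise ValueError("reason too long: control payloads are capped at 125 bytes")
    return body


def decode_close(payload: bytes) -> Tuple[int, str]:
    if not payload:
        return NO_STATUS_RCVD, ""
    if len(payload) == 1:
        raise ValueError("a Close body must be empty or at least 2 bytes")
    (code,) = struct.unpack("!H", payload[:2])
    # bytes.decode raises UnicodeDecodeError if the reason is not valid UTF-8
    return code, payload[2:].decode("utf-8")
```

Since the status code takes 2 of the 125 bytes, the reason is limited to 123 bytes of UTF-8, which may be fewer than 123 characters.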

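Ping (0x9) / Pong (0xA): Either party may send a Ping at any time, and the receiver must reply as soon as practical with a Pong carrying the same payload. A Pong may also be sent unsolicited, in which case it acts as a one-way heartbeat and no response is expected. RFC 6455 does not define a liveness timeout, but a common application policy is to treat the connection as dead when a Ping goes unanswered for a chosen interval.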
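A sketch of both halves: answering a Ping, and tracking our own Ping to detect a dead peer or measure round-trip time. The timeout policy, the injected clock, and the names `pong_frame` and `Heartbeat` are assumptions for illustration; `pong_frame` builds an unmasked frame, so it only fits the server side.

```python
import time
from typing import Callable, Optional


def pong_frame(ping_payload: bytes) -> bytes:
    """Unmasked (server-to-client) Pong echoing the Ping's payload."""
    if len(ping_payload) > 125:
        raise ValueError("control frame payload exceeds 125 bytes")
    return bytes([0x8A, len(ping_payload)]) + ping_payload  # FIN=1, opcode 0xA


class Heartbeat:
    """Tracks one outstanding Ping. The timeout is an application policy;
    RFC 6455 does not specify one."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.pending: Optional[bytes] = None  # payload of the unanswered Ping
        self.sent_at = 0.0

    def ping_sent(self, payload: bytes) -> None:
        self.pending = payload
        self.sent_at = self.clock()

    def pong_received(self, payload: bytes) -> Optional[float]:
        """Return the round-trip time if this Pong answers our Ping;
        unsolicited or stale Pongs return None and are otherwise ignored."""
        if self.pending is None or payload != self.pending:
            return None
        self.pending = None
        return self.clock() - self.sent_at

    def is_dead(self) -> bool:
        return (self.pending is not None
                and self.clock() - self.sent_at > self.timeout)
```

Latency measurement fits here, on the Ping/Pong round trip; an unsolicited Pong carries no timing information back to its sender.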

Data Framing and Masking

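RFC 6455 requires every frame sent from client to server to be masked with a 4-byte masking key, chosen fresh and unpredictably for each frame. Frames sent from server to client must not be masked, and a server must close the connection if it receives an unmasked frame.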

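# Masking algorithm (XOR with cycling key)
def mask(payload: bytes, key: bytes) -> bytes:
    # key is the frame's 4-byte masking key; byte i is XORed with key[i % 4].
    # XOR is its own inverse, so the receiver unmasks with the same call.
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))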

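Masking protects against cache-poisoning attacks on intermediaries: because the key is unpredictable, a malicious script cannot control the exact bytes that appear on the wire, so it cannot craft frames that a misbehaving transparent proxy would misread as an HTTP request. It is a protocol security requirement, not encryption; anyone who can see the frame can also see the key.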
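A complete masked client frame puts the pieces together: header with the MASK bit set, the 4-byte key, then the XORed payload. This sketch only handles single-frame text messages of up to 125 bytes; `client_text_frame` is a hypothetical name, and the optional `key` parameter exists only so the output can be compared with the masked "Hello" example in RFC 6455 Section 5.7 (real code should always draw a fresh key).

```python
import os
from typing import Optional


def mask(payload: bytes, key: bytes) -> bytes:
    # XOR byte i with key[i % 4]; applying the same key again unmasks
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))


def client_text_frame(text: str, key: Optional[bytes] = None) -> bytes:
    """Single unfragmented, masked text frame; payloads up to 125 bytes only."""
    data = text.encode("utf-8")
    if len(data) > 125:
        raise ValueError("sketch only handles the 7-bit length form")
    if key is None:
        key = os.urandom(4)  # fresh, unpredictable key for every frame
    # 0x81 = FIN + text opcode; 0x80 sets the MASK bit in the length byte
    return bytes([0x81, 0x80 | len(data)]) + key + mask(data, key)
```

With key `37 fa 21 3d`, "Hello" masks to `7f 9f 4d 51 58`, matching the RFC's example bytes.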

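Large messages may be split into fragments. The first frame carries the message's opcode (0x1 or 0x2) with FIN=0; every following frame uses opcode 0x0 (continuation), and the last one sets FIN=1. Control frames may be interleaved between fragments, but the fragments of two different data messages may not (unless a negotiated extension allows it).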
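Both directions of fragmentation can be sketched with plain tuples standing in for frames. This assumes frame headers were already validated (so reserved opcodes never reach it); `fragment`, `Reassembler`, and the `(fin, opcode, payload)` tuple shape are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = Tuple[bool, int, bytes]  # (fin, opcode, payload)


def fragment(opcode: int, data: bytes, size: int) -> List[Frame]:
    """Split one text or binary message into frames of at most `size` bytes."""
    if size <= 0:
        raise ValueError("fragment size must be positive")
    chunks = [data[i:i + size] for i in range(0, len(data), size)] or [b""]
    last = len(chunks) - 1
    # Only the first frame carries the real opcode; the rest use 0x0
    return [(i == last, opcode if i == 0 else 0x0, chunk)
            for i, chunk in enumerate(chunks)]


class Reassembler:
    """Rebuilds messages from frames whose headers were already validated."""

    def __init__(self) -> None:
        self.opcode: Optional[int] = None  # opcode of the message in progress
        self.parts: List[bytes] = []

    def feed(self, fin: bool, opcode: int, payload: bytes) -> Optional[tuple]:
        if opcode >= 0x8:
            # Control frames may arrive between fragments; pass them straight through
            return ("control", opcode, payload)
        if opcode == 0x0:
            if self.opcode is None:
                raise ValueError("continuation frame with no message in progress")
        elif self.opcode is not None:
            raise ValueError("new message started before the previous one finished")
        else:
            self.opcode = opcode
        self.parts.append(payload)
        if not fin:
            return None
        message = ("message", self.opcode, b"".join(self.parts))
        self.opcode, self.parts = None, []
        return message
```

For instance, `fragment(0x1, b"Hello, world", 5)` yields a text frame with FIN=0 followed by two continuation frames, the last with FIN=1; a Ping fed between them is handed back immediately without disturbing the message in progress.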

Closing the Connection

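RFC 6455 Section 7 defines a closing handshake built from two Close frames:

  • The initiating party sends a Close frame (opcode 0x8)
  • The receiver replies with its own Close frame, typically echoing the status code it received
  • The underlying TCP connection is then closed; the RFC recommends that the server close it first, so that the server rather than the client holds the TCP TIME_WAIT state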

After sending a Close frame, an endpoint must not send any further data frames, but it keeps reading until the peer's Close frame arrives, since data the peer sent earlier may still be in flight. If the peer does not respond within a timeout, the TCP connection may be closed unilaterally.
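The rules above fit in a small per-endpoint state tracker. A minimal sketch: the 5-second default timeout, the injected clock, and the class and method names are assumptions, and the actual frame I/O is left to the caller.

```python
import time
from typing import Callable, Optional


class ClosingHandshake:
    """Tracks one endpoint's side of the closing handshake."""

    def __init__(self, timeout: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout  # how long to wait for the peer's Close
        self.clock = clock
        self.close_sent_at: Optional[float] = None
        self.close_received = False

    def can_send_data(self) -> bool:
        # No data frames may follow our own Close frame
        return self.close_sent_at is None

    def send_close(self) -> bool:
        """Return True if a Close frame should be written now (only once)."""
        if self.close_sent_at is not None:
            return False
        self.close_sent_at = self.clock()
        return True

    def receive_close(self) -> bool:
        """Record the peer's Close; return True if we still owe a reply."""
        self.close_received = True
        return self.close_sent_at is None

    def done(self) -> bool:
        # Both Close frames exchanged: the TCP connection can be closed
        return self.close_received and self.close_sent_at is not None

    def should_abort(self) -> bool:
        # Peer never answered our Close: close TCP unilaterally
        return (self.close_sent_at is not None and not self.close_received
                and self.clock() - self.close_sent_at > self.timeout)
```

An initiator calls `send_close()` and then polls `should_abort()`; a responder sees `receive_close()` return True, writes its own Close via `send_close()`, and both end up with `done()` true.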

Security Considerations

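  • Use wss:// (WebSocket over TLS). Plain ws:// exposes all traffic to eavesdropping and injection, so treat TLS as a requirement in production.
  • Validate the Origin header during the handshake to prevent cross-site WebSocket hijacking (CSWSH), and reject unexpected origins. Browsers set Origin themselves, but non-browser clients can send any value, so this check protects users' browsers rather than authenticating clients.
  • Authenticate during the handshake. After the upgrade, frames carry no HTTP headers or cookies, so identity must be tied to the connection when it is opened. Browsers send cookies with the upgrade request automatically, but the browser WebSocket API cannot set custom headers, so token schemes often fall back to a URL query parameter (which can end up in server and proxy logs). Non-browser clients can use ordinary request headers.
  • Rate-limit messages and cap message size server-side. A single open connection lets a client send thousands of cheap frames per second without any per-request cost, which makes WebSocket a potential DoS vector.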
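Two of these checks are easy to sketch: an exact-match Origin allowlist applied at handshake time, and a per-connection token bucket applied to each incoming message. Both are illustrations under stated assumptions: `origin_allowed`, `TokenBucket`, the allowlist contents, and the rate and burst values are hypothetical policy choices, not anything RFC 6455 prescribes.

```python
import time
from typing import Callable, Collection


def origin_allowed(origin: str, allowed: Collection[str]) -> bool:
    """Exact-match the handshake's Origin header against an allowlist.
    A missing header or the opaque origin "null" is rejected."""
    return bool(origin) and origin != "null" and origin in allowed


class TokenBucket:
    """Per-connection limit: `rate` messages/second with bursts up to `burst`."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate, self.burst, self.clock = rate, burst, clock
        self.tokens = burst
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

When `allow()` returns False, a server might drop the message or close the connection with 1008 (Policy Violation); which one is appropriate depends on the application.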
