What Is Signaling?
WebRTC enables peer-to-peer audio, video, and data channels directly between browsers. But to establish a peer-to-peer connection, both peers need to exchange three types of out-of-band information:
- Session Description Protocol (SDP) — what codecs each peer supports, bitrate limits, and media track descriptions.
- ICE candidates — the IP addresses and ports where each peer can potentially be reached, from direct LAN addresses to STUN-discovered public addresses to TURN relay addresses.
- Control messages — call setup, renegotiation, teardown.
WebRTC deliberately does not define a signaling protocol. You choose how to transport this metadata. The two dominant choices are WebSocket-based custom signaling and SIP over WebSocket (RFC 7118).
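Because the wire format is left to the application, one possible way to carry all three kinds of information is a small JSON envelope. The sketch below is illustrative only: the field names (`type`, `room`, `payload`) and the `hangup` message type are assumptions, not part of any WebRTC specification.

```python
import json
from typing import Optional

# Message types assumed for illustration; WebRTC itself defines none of them.
ALLOWED_TYPES = {"join", "offer", "answer", "ice-candidate", "hangup"}


def encode_signal(msg_type: str, room: str, payload: Optional[dict] = None) -> str:
    """Serialize one signaling message as a JSON text frame."""
    if msg_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown signaling message type: {msg_type!r}")
    return json.dumps({"type": msg_type, "room": room, "payload": payload or {}})


def decode_signal(raw: str) -> dict:
    """Parse a received frame and reject anything outside the agreed vocabulary."""
    msg = json.loads(raw)
    if not isinstance(msg, dict) or msg.get("type") not in ALLOWED_TYPES:
        raise ValueError("malformed signaling message")
    return msg
```

Validating the message type at the edge keeps a relay server from forwarding arbitrary client input to other peers.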
The SDP Offer/Answer Model
The offer/answer exchange follows a strict pattern:
Caller               Signaling Server              Callee
|                            |                        |
| createOffer()              |                        |
| setLocalDescription()      |                        |
|-- send offer ------------->|                        |
|                            |-- forward offer ------>|
|                            |                        | setRemoteDescription()
|                            |                        | createAnswer()
|                            |                        | setLocalDescription()
|                            |<-- forward answer -----|
|<-- receive answer ---------|                        |
| setRemoteDescription()     |                        |
|                            |                        |
|-- ICE candidates --------->|-- forward ------------>|
|<-- ICE candidates ---------|<-- forward ------------|
|                            |                        |
|<============ P2P connection established ===========>|
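The strictness of this pattern can be made concrete as a small state machine. The following is a simplified sketch of the `signalingState` transitions an `RTCPeerConnection` goes through during one offer/answer round; it deliberately omits provisional answers and rollback, and the action strings are labels chosen for this example.

```python
# (current state, action) -> next state, for a single offer/answer round.
TRANSITIONS = {
    ("stable", "setLocalDescription(offer)"): "have-local-offer",
    ("have-local-offer", "setRemoteDescription(answer)"): "stable",
    ("stable", "setRemoteDescription(offer)"): "have-remote-offer",
    ("have-remote-offer", "setLocalDescription(answer)"): "stable",
}


def next_state(state: str, action: str) -> str:
    """Return the state after `action`, or raise if the action is out of order."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action} is not valid in state {state!r}") from None


def run(actions, state: str = "stable") -> str:
    """Apply a sequence of actions and return the final signaling state."""
    for action in actions:
        state = next_state(state, action)
    return state


caller = ["setLocalDescription(offer)", "setRemoteDescription(answer)"]
callee = ["setRemoteDescription(offer)", "setLocalDescription(answer)"]
```

Both sequences start and end in `stable`; applying an answer when no offer is pending raises an error, which mirrors how browsers reject out-of-order descriptions.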
An SDP offer looks like a structured text block describing the media session:
v=0
o=- 123456789 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE audio video
m=audio 9 UDP/TLS/RTP/SAVPF 111
c=IN IP4 0.0.0.0
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10;useinbandfec=1
m=video 9 UDP/TLS/RTP/SAVPF 96
a=rtpmap:96 VP8/90000
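To see how this text maps to media sections and codecs, here is a minimal parsing sketch. It handles only `m=` and `a=rtpmap:` lines, which is enough for the example above but far from a complete SDP parser; the function name and the dictionary layout are assumptions made for illustration.

```python
EXAMPLE_SDP = """\
v=0
o=- 123456789 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE audio video
m=audio 9 UDP/TLS/RTP/SAVPF 111
c=IN IP4 0.0.0.0
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10;useinbandfec=1
m=video 9 UDP/TLS/RTP/SAVPF 96
a=rtpmap:96 VP8/90000
"""


def parse_sdp_media(sdp: str) -> list:
    """Return one dict per m= section with its payload types and rtpmap codecs."""
    media = []
    for line in sdp.strip().splitlines():
        line = line.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            kind, port, proto, *formats = line[2:].split()
            media.append({
                "kind": kind,
                "port": int(port),
                "proto": proto,
                "payload_types": formats,
                "codecs": {},
            })
        elif line.startswith("a=rtpmap:") and media:
            # a=rtpmap:<payload type> <encoding>/<clock rate>[/<channels>]
            payload_type, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            media[-1]["codecs"][payload_type] = encoding
    return media


sections = parse_sdp_media(EXAMPLE_SDP)
```

For the example offer, `sections` contains an audio section mapping payload type 111 to `opus/48000/2` and a video section mapping 96 to `VP8/90000`.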
WebSocket as Signaling Transport
The simplest and most common approach: a WebSocket server acts as a message relay. Peers connect, join a room, and forward SDP and ICE messages through the server. The server is stateless regarding media — it only routes JSON messages.
// Client-side WebRTC + WebSocket signaling
const ws = new WebSocket('wss://signal.example.com/call');
const pc = new RTCPeerConnection({
  iceServers: [
    { urls: 'stun:stun.example.com:3478' },
    { urls: 'turn:turn.example.com:3478', username: 'user', credential: 'pass' }
  ]
});

// Both peers join the room once the socket is open, so the server
// knows where to relay their messages
function joinRoom(roomId) {
  ws.send(JSON.stringify({ type: 'join', room: roomId }));
}

// Send ICE candidates as they are discovered
pc.onicecandidate = ({ candidate }) => {
  if (candidate) {
    ws.send(JSON.stringify({ type: 'ice-candidate', candidate }));
  }
};

// Initiate a call
async function startCall(roomId) {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
  stream.getTracks().forEach(track => pc.addTrack(track, stream));
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  ws.send(JSON.stringify({ type: 'offer', room: roomId, sdp: pc.localDescription }));
}

// Handle incoming messages
ws.onmessage = async ({ data }) => {
  const msg = JSON.parse(data);
  if (msg.type === 'offer') {
    await pc.setRemoteDescription(new RTCSessionDescription(msg.sdp));
    // A callee that should send media adds its own tracks here,
    // as startCall() does, before creating the answer
    const answer = await pc.createAnswer();
    await pc.setLocalDescription(answer);
    ws.send(JSON.stringify({ type: 'answer', sdp: pc.localDescription }));
  } else if (msg.type === 'answer') {
    await pc.setRemoteDescription(new RTCSessionDescription(msg.sdp));
  } else if (msg.type === 'ice-candidate') {
    await pc.addIceCandidate(new RTCIceCandidate(msg.candidate));
  }
};
Room-Based Architecture and Presence
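A production signaling server manages rooms, presence, and authentication. The simplified server below covers rooms and join/leave notifications only; authentication is omitted: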
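# Python asyncio WebSocket signaling server (simplified)
import asyncio
import json

import websockets

rooms: dict[str, set] = {}  # room_id -> set of websocket connections


async def handler(websocket):
    room_id = None
    try:
        async for raw in websocket:
            msg = json.loads(raw)
            if msg['type'] == 'join':
                room_id = msg['room']
                rooms.setdefault(room_id, set()).add(websocket)
                # Notify existing peers
                await broadcast(room_id, {'type': 'peer-joined'}, exclude=websocket)
            elif msg['type'] in ('offer', 'answer', 'ice-candidate'):
                # Relay to the other peers in the sender's room
                await broadcast(room_id, msg, exclude=websocket)
    finally:
        if room_id and room_id in rooms:
            rooms[room_id].discard(websocket)
            if not rooms[room_id]:
                del rooms[room_id]  # drop empty rooms so the dict does not grow forever
            await broadcast(room_id, {'type': 'peer-left'}, exclude=websocket)


async def broadcast(room_id, msg, exclude=None):
    peers = rooms.get(room_id, set()) - ({exclude} if exclude else set())
    if peers:
        payload = json.dumps(msg)
        # return_exceptions=True so one dead connection does not abort the others
        await asyncio.gather(*[p.send(payload) for p in peers], return_exceptions=True)


async def main():
    # Host and port are placeholders; a single-argument handler needs websockets >= 10.1
    async with websockets.serve(handler, 'localhost', 8765):
        await asyncio.Future()  # run until cancelled


if __name__ == '__main__':
    asyncio.run(main())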
SIP Integration — WebRTC to PSTN
When WebRTC calls need to reach regular phone networks (PSTN), the browser can speak SIP over WebSocket (RFC 7118). A SIP proxy or gateway bridges the browser's WebRTC SDP to traditional SIP INVITE messages and onwards to a VoIP carrier.
SIP.js is a widely used browser-side library:
import { UserAgent, Registerer, Inviter, SessionState } from 'sip.js';

const ua = new UserAgent({
  uri: UserAgent.makeURI('sip:[email protected]'),
  transportOptions: {
    server: 'wss://sip.example.com:8443/ws', // SIP over WebSocket
  },
  authorizationUsername: 'alice',
  authorizationPassword: 'secret',
});

await ua.start();
const registerer = new Registerer(ua);
await registerer.register(); // SIP REGISTER → 200 OK

// Place a call to a phone number
const target = UserAgent.makeURI('sip:[email protected]');
if (!target) {
  // makeURI returns undefined when the string is not a valid SIP URI
  throw new Error('Invalid target URI');
}
const inviter = new Inviter(ua, target);
inviter.stateChange.addListener(state => {
  if (state === SessionState.Established) console.log('Call connected');
  if (state === SessionState.Terminated) console.log('Call ended');
});
// Sends the SIP INVITE; a typical response sequence is 100 Trying → 180 Ringing → 200 OK
await inviter.invite();
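The response sequence noted in the comment follows the SIP status-code classes from RFC 3261: 1xx responses are provisional, and the first 2xx or higher response is final. As a rough sketch of how a client tells them apart (the function name and the returned labels are chosen for this example, not taken from SIP.js):

```python
import re

STATUS_LINE = re.compile(r"^SIP/2\.0 (\d{3}) (.*)$")

# Status-code classes as defined in RFC 3261.
CLASSES = {
    1: "provisional",
    2: "success",
    3: "redirection",
    4: "client-error",
    5: "server-error",
    6: "global-failure",
}


def classify_sip_response(status_line: str):
    """Split a SIP status line into (code, reason phrase, class label)."""
    match = STATUS_LINE.match(status_line.strip())
    if not match:
        raise ValueError(f"not a SIP status line: {status_line!r}")
    code = int(match.group(1))
    if code // 100 not in CLASSES:
        raise ValueError(f"status code out of range: {code}")
    return code, match.group(2), CLASSES[code // 100]


def is_final(status_line: str) -> bool:
    """A response is final once its class is anything other than provisional."""
    return classify_sip_response(status_line)[2] != "provisional"
```

With this, `100 Trying` and `180 Ringing` are provisional, while `200 OK` ends the INVITE transaction successfully.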
STUN and TURN: NAT Traversal
Most devices are behind NAT routers and do not have a public IP address. ICE (Interactive Connectivity Establishment) systematically discovers all possible paths between peers.
STUN (Session Traversal Utilities for NAT) lets a client discover its public IP:port as seen by the internet:
Client → STUN Server: Binding Request
STUN Server → Client: Binding Response {XOR-MAPPED-ADDRESS: 203.0.113.1:54321}
The client adds 203.0.113.1:54321 as a server-reflexive candidate. If both peers can discover their public addresses and their NAT types allow direct connection (full-cone or restricted-cone NAT), the P2P path succeeds.
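On the wire, the address in the response is XOR-ed with a fixed magic cookie, which is why the attribute is called XOR-MAPPED-ADDRESS. The sketch below builds a Binding Request and decodes an IPv4 XOR-MAPPED-ADDRESS using the RFC 5389 message layout. It is a minimal illustration: it ignores IPv6, retransmission, and message integrity, and `build_binding_response` exists only to simulate a server for the example.

```python
import os
import socket
import struct
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id: Optional[bytes] = None) -> bytes:
    """20-byte STUN header: type, length 0, magic cookie, 12-byte transaction ID."""
    tid = transaction_id if transaction_id is not None else os.urandom(12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + tid


def build_binding_response(addr: str, port: int, transaction_id: bytes) -> bytes:
    """Simulated server reply carrying one IPv4 XOR-MAPPED-ADDRESS attribute."""
    xport = port ^ (MAGIC_COOKIE >> 16)
    xaddr = struct.unpack("!I", socket.inet_aton(addr))[0] ^ MAGIC_COOKIE
    attr = struct.pack("!HHBBHI", XOR_MAPPED_ADDRESS, 8, 0, 0x01, xport, xaddr)
    header = struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE)
    return header + transaction_id + attr


def parse_xor_mapped_address(response: bytes) -> Tuple[str, int]:
    """Walk the attributes and undo the XOR to recover the public address."""
    msg_type, length, cookie = struct.unpack("!HHI", response[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN Binding success response")
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[offset:offset + 4])
        value = response[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and value[1] == 0x01:  # family 0x01 = IPv4
            xport, xaddr = struct.unpack("!HI", value[2:8])
            port = xport ^ (MAGIC_COOKIE >> 16)
            addr = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return addr, port
        offset += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    raise ValueError("no IPv4 XOR-MAPPED-ADDRESS attribute found")


tid = os.urandom(12)
reply = build_binding_response("203.0.113.1", 54321, tid)
public_addr, public_port = parse_xor_mapped_address(reply)
```

In a real client the request would be sent over UDP to the STUN server and the reply's transaction ID checked against the request before trusting the result.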
TURN (Traversal Using Relays around NAT) is the fallback when direct P2P fails (symmetric NAT, strict firewalls). The TURN server relays all media:
Client A ←→ TURN Server ←→ Client B (all media relayed)
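ICE expresses the "direct first, relay last" ordering through candidate priorities. The sketch below applies the formula and the recommended type preferences from RFC 8445; real browsers may choose different local preference values, so treat the exact numbers as illustrative.

```python
# Recommended type preferences from RFC 8445 (higher is preferred).
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_preference: int = 65535,
                       component_id: int = 1) -> int:
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)."""
    return ((TYPE_PREFERENCE[cand_type] << 24)
            + (local_preference << 8)
            + (256 - component_id))


# Sort candidate types from most to least preferred.
ranked = sorted(["relay", "srflx", "host"], key=candidate_priority, reverse=True)
```

Host candidates outrank server-reflexive (STUN-discovered) ones, which in turn outrank relay (TURN) candidates, so a relayed path is only selected when the better-ranked pairs fail their connectivity checks.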
Setting up coturn (the open-source TURN server):
# /etc/turnserver.conf
listening-port=3478
tls-listening-port=5349
realm=example.com
server-name=turn.example.com
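# Time-limited credentials derived from a shared secret.
# Do not combine with lt-cred-mech; coturn expects one auth mechanism at a time.
use-auth-secret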
static-auth-secret=your-shared-secret
cert=/etc/letsencrypt/live/turn.example.com/fullchain.pem
pkey=/etc/letsencrypt/live/turn.example.com/privkey.pem
Generate time-limited TURN credentials in your signaling server:
import hmac, hashlib, base64, time

# Must match static-auth-secret in turnserver.conf
TURN_SECRET = 'your-shared-secret'

def generate_turn_credentials(username: str, ttl: int = 3600) -> dict:
    expiry = int(time.time()) + ttl
    # Username format: "<unix expiry timestamp>:<user id>"
    turn_username = f'{expiry}:{username}'
    credential = base64.b64encode(
        hmac.new(TURN_SECRET.encode(), turn_username.encode(), hashlib.sha1).digest()
    ).decode()
    return {'username': turn_username, 'credential': credential}
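The scheme works because the TURN server can recompute the same HMAC from the username alone, with no per-user database. The following sketch shows the kind of check that implies. It illustrates the idea and is not coturn's actual code; the function names and the secret value are placeholders.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional


def sign(secret: str, turn_username: str) -> str:
    """base64(HMAC-SHA1(secret, username)), the same construction used when issuing."""
    digest = hmac.new(secret.encode(), turn_username.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def verify_turn_credentials(secret: str, turn_username: str, credential: str,
                            now: Optional[float] = None) -> bool:
    """Accept only unexpired usernames whose credential matches the recomputed HMAC."""
    expiry_str, _, _user = turn_username.partition(":")
    if not expiry_str.isdigit():
        return False
    current = time.time() if now is not None and False else (now if now is not None else time.time())
    if int(expiry_str) < current:
        return False  # the embedded timestamp has passed
    # compare_digest avoids leaking how many characters matched via timing
    return hmac.compare_digest(sign(secret, turn_username), credential)


secret = "your-shared-secret"   # placeholder value
username = "2000:alice"         # expires at Unix time 2000
password = sign(secret, username)
```

Because the expiry is part of the signed username, a client cannot extend a credential's lifetime without invalidating the HMAC.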
Production Considerations
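Signaling server clustering: when scaled horizontally, signaling servers must either share room state (for example via Redis pub/sub) or route every peer in a room to the same instance with a sticky load balancer. The signaling workload itself is light because the server only relays small JSON messages; a single small instance such as a t3.small can plausibly handle tens of thousands of concurrent calls, though the real limit depends on message rate and connection overhead, so load-test before relying on that figure.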
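Certificate management: WebRTC mandates DTLS encryption for all media, and browsers enforce this even on local networks. Certificates matter in two other places: the signaling server needs a valid TLS certificate so clients can connect over wss://, and the TURN server needs one to accept TURN over TLS (the tls-listening-port in the coturn configuration above).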
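Firewall traversal: corporate firewalls often block UDP entirely. Offer TURN over TCP or TLS on port 443 as the final fallback, which means the TURN server must also listen on that port. About 8 to 10% of WebRTC calls require a TURN relay, though the share varies with the user population; budget TURN bandwidth accordingly. The relay receives each participant's stream and sends it on to the other side, so every relayed stream counts once inbound and once outbound at the TURN server, roughly 2× the media bitrate of the call.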
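The bandwidth budget can be estimated with simple arithmetic. The sketch below assumes two-party calls and a symmetric per-direction bitrate; the call count, relay fraction, and bitrate in the example are made-up inputs, not measurements.

```python
def turn_bandwidth_mbps(concurrent_calls: int, relay_fraction: float,
                        per_direction_mbps: float) -> dict:
    """Estimate TURN server traffic for two-party calls at a given relay share."""
    relayed_calls = concurrent_calls * relay_fraction
    streams = relayed_calls * 2              # one media stream from each participant
    ingress = streams * per_direction_mbps   # every stream arrives at the relay...
    egress = ingress                         # ...and is forwarded once to the other peer
    return {
        "relayed_calls": relayed_calls,
        "ingress_mbps": ingress,
        "egress_mbps": egress,
        "total_mbps": ingress + egress,
    }


# Hypothetical load: 1,000 concurrent calls, 10% relayed, 1.5 Mbps per direction.
estimate = turn_bandwidth_mbps(1000, 0.10, 1.5)
```

Under these assumptions about 100 calls are relayed, giving roughly 300 Mbps inbound and 300 Mbps outbound at the TURN server. Cloud providers usually bill egress, so the outbound figure is often the one that drives cost.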