WebRTC Signaling with SIP and WebSocket

How WebRTC peers exchange session descriptions using signaling servers — SIP integration, WebSocket-based signaling, STUN/TURN servers, and ICE candidate negotiation.

What Is Signaling?

WebRTC enables peer-to-peer audio, video, and data channels directly between browsers. But to establish a peer-to-peer connection, both peers need to exchange three types of out-of-band information:

  • Session Description Protocol (SDP) — what codecs each peer supports, bitrate limits, and media track descriptions.
  • ICE candidates — the IP addresses and ports where each peer can potentially be reached, from direct LAN addresses to STUN-discovered public addresses to TURN relay addresses.
  • Control messages — call setup, renegotiation, teardown.

WebRTC deliberately does not define a signaling protocol. You choose how to transport this metadata. The two dominant choices are WebSocket-based custom signaling and SIP over WebSocket (RFC 7118).
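
As one illustration, the three kinds of signaling data listed above could share a single JSON envelope. The sketch below is a hypothetical schema, written in Python as a signaling server might validate it. The field names (`type`, `room`, `payload`), the control message names, and both helper functions are assumptions made for this example; nothing here is defined by WebRTC itself.

```python
import json
from typing import Optional

# Hypothetical message kinds, grouped by the three categories above.
SDP_TYPES = {"offer", "answer"}
ICE_TYPES = {"ice-candidate"}
CONTROL_TYPES = {"join", "leave", "hangup"}
ALLOWED_TYPES = SDP_TYPES | ICE_TYPES | CONTROL_TYPES


def encode_message(msg_type: str, room: str, payload: Optional[dict] = None) -> str:
    """Serialize one signaling message as a JSON envelope."""
    if msg_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown signaling message type: {msg_type!r}")
    return json.dumps({"type": msg_type, "room": room, "payload": payload or {}})


def decode_message(raw: str) -> dict:
    """Parse an envelope and reject anything that does not match the schema."""
    msg = json.loads(raw)
    if not isinstance(msg, dict) or msg.get("type") not in ALLOWED_TYPES:
        raise ValueError("malformed signaling message")
    if not isinstance(msg.get("room"), str):
        raise ValueError("missing room id")
    return msg
```

Validating the `type` field at the edge keeps the relay from forwarding arbitrary client data to other peers.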

The SDP Offer/Answer Model

The offer/answer exchange follows a strict pattern:

Caller                    Signaling Server          Callee
  |                            |                      |
  |-- createOffer() ---------->|                      |
  |   setLocalDescription()    |                      |
  |                            |-- forward offer ---->|
  |                            |                      |-- setRemoteDescription()
  |                            |                      |-- createAnswer()
  |                            |                      |   setLocalDescription()
  |                            |<-- forward answer ---|
  |<-- receive answer ---------|                      |
  |   setRemoteDescription()   |                      |
  |                            |                      |
  |-- ICE candidates --------->|-- forward ---------->|
  |<-- ICE candidates ---------|<-- forward ----------|
  |                            |                      |
  |<========= P2P connection established ============>|
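
The ordering in the diagram matters because each description call is only valid in certain signaling states. The following is a simplified Python model of those transitions, meant only to make the ordering concrete. It leaves out provisional answers and rollback, and the `next_state` and `run` helpers are invented for this sketch.

```python
# Simplified model of RTCPeerConnection signaling states.
# Provisional answers ("pranswer") and rollback are deliberately left out.
TRANSITIONS = {
    ("stable", "setLocalDescription", "offer"): "have-local-offer",
    ("stable", "setRemoteDescription", "offer"): "have-remote-offer",
    ("have-local-offer", "setRemoteDescription", "answer"): "stable",
    ("have-remote-offer", "setLocalDescription", "answer"): "stable",
}


def next_state(state: str, call: str, sdp_type: str) -> str:
    """Return the next signaling state, or raise if the call is out of order."""
    try:
        return TRANSITIONS[(state, call, sdp_type)]
    except KeyError:
        raise RuntimeError(f"{call}({sdp_type}) is invalid in state {state!r}") from None


def run(steps, state: str = "stable") -> str:
    """Apply a sequence of (call, sdp_type) steps and return the final state."""
    for call, sdp_type in steps:
        state = next_state(state, call, sdp_type)
    return state


# The caller and callee sides of the diagram, expressed as step lists.
CALLER_STEPS = [("setLocalDescription", "offer"), ("setRemoteDescription", "answer")]
CALLEE_STEPS = [("setRemoteDescription", "offer"), ("setLocalDescription", "answer")]
```

Both step lists end back in the stable state, which is the point at which the negotiated session is in effect.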

An SDP offer is a structured text block describing the media session. An abridged example (real offers also carry ICE credentials, a DTLS fingerprint, and more codecs):

v=0
o=- 123456789 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE audio video
m=audio 9 UDP/TLS/RTP/SAVPF 111
c=IN IP4 0.0.0.0
a=mid:audio
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10;useinbandfec=1
m=video 9 UDP/TLS/RTP/SAVPF 96
c=IN IP4 0.0.0.0
a=mid:video
a=rtpmap:96 VP8/90000
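
To make the structure concrete, here is a minimal Python sketch that pulls the media sections and their codecs out of an SDP like the one above. It only understands `m=` and `a=rtpmap:` lines; `parse_media_sections` is a name invented for this example, and a real application would normally rely on the browser or an SDP library instead.

```python
SDP_EXAMPLE = """\
v=0
o=- 123456789 2 IN IP4 127.0.0.1
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=rtpmap:111 opus/48000/2
m=video 9 UDP/TLS/RTP/SAVPF 96
a=rtpmap:96 VP8/90000
"""


def parse_media_sections(sdp: str) -> list:
    """Return one dict per m= section with its payload types and codecs."""
    sections = []
    current = None
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <payload types...>
            kind, port, proto, *payload_types = line[2:].split()
            current = {
                "kind": kind,
                "port": int(port),
                "protocol": proto,
                "payload_types": payload_types,
                "codecs": {},
            }
            sections.append(current)
        elif current is not None and line.startswith("a=rtpmap:"):
            # a=rtpmap:<payload type> <encoding>/<clock rate>[/<channels>]
            payload_type, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            current["codecs"][payload_type] = encoding
    return sections


if __name__ == "__main__":
    for section in parse_media_sections(SDP_EXAMPLE):
        print(section["kind"], section["codecs"])
```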

WebSocket as Signaling Transport

The simplest and most common approach: a WebSocket server acts as a message relay. Peers connect, join a room, and forward SDP and ICE messages through the server. The server is stateless regarding media — it only routes JSON messages.

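// Client-side WebRTC + WebSocket signaling
const ws = new WebSocket('wss://signal.example.com/call');
const pc = new RTCPeerConnection({
  iceServers: [
    { urls: 'stun:stun.example.com:3478' },
    { urls: 'turn:turn.example.com:3478', username: 'user', credential: 'pass' }
  ]
});

// Remote candidates that arrive before the remote description is set
const pendingCandidates = [];

// Send ICE candidates as they are discovered
pc.onicecandidate = ({ candidate }) => {
  if (candidate) {
    ws.send(JSON.stringify({ type: 'ice-candidate', candidate }));
  }
};

// Join a room so the server knows where to relay later messages.
// Call this on both peers once the socket's 'open' event has fired.
function joinRoom(roomId) {
  ws.send(JSON.stringify({ type: 'join', room: roomId }));
}

// Capture local media and attach it to the connection (caller and callee)
async function addLocalMedia() {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
  stream.getTracks().forEach(track => pc.addTrack(track, stream));
}

// Apply the remote description, then flush any queued candidates
async function applyRemoteDescription(sdp) {
  await pc.setRemoteDescription(new RTCSessionDescription(sdp));
  while (pendingCandidates.length > 0) {
    await pc.addIceCandidate(new RTCIceCandidate(pendingCandidates.shift()));
  }
}

// Initiate a call (after joinRoom)
async function startCall(roomId) {
  await addLocalMedia();

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  ws.send(JSON.stringify({ type: 'offer', room: roomId, sdp: offer }));
}

// Handle incoming messages
ws.onmessage = async ({ data }) => {
  const msg = JSON.parse(data);
  if (msg.type === 'offer') {
    await addLocalMedia();  // the callee must add its tracks before answering
    await applyRemoteDescription(msg.sdp);
    const answer = await pc.createAnswer();
    await pc.setLocalDescription(answer);
    ws.send(JSON.stringify({ type: 'answer', sdp: answer }));
  } else if (msg.type === 'answer') {
    await applyRemoteDescription(msg.sdp);
  } else if (msg.type === 'ice-candidate') {
    if (pc.remoteDescription) {
      await pc.addIceCandidate(new RTCIceCandidate(msg.candidate));
    } else {
      pendingCandidates.push(msg.candidate);  // too early; apply later
    }
  }
};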

Room-Based Architecture and Presence

A production signaling server manages rooms, presence, and authentication. The simplified server below covers rooms and join/leave presence only:

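# Python asyncio WebSocket signaling server (simplified)
import asyncio
import json
import websockets

rooms: dict[str, set] = {}  # room_id -> set of websocket connections

async def handler(websocket):
    room_id = None
    try:
        async for raw in websocket:
            try:
                msg = json.loads(raw)
            except json.JSONDecodeError:
                continue  # ignore malformed frames instead of dropping the peer
            if not isinstance(msg, dict):
                continue
            msg_type = msg.get('type')
            if msg_type == 'join' and isinstance(msg.get('room'), str):
                room_id = msg['room']
                rooms.setdefault(room_id, set()).add(websocket)
                # Notify existing peers
                await broadcast(room_id, {'type': 'peer-joined'}, exclude=websocket)
            elif msg_type in ('offer', 'answer', 'ice-candidate') and room_id is not None:
                await broadcast(room_id, msg, exclude=websocket)
    except websockets.ConnectionClosed:
        pass  # abrupt disconnect; cleanup happens in the finally block
    finally:
        if room_id in rooms:
            rooms[room_id].discard(websocket)
            if rooms[room_id]:
                await broadcast(room_id, {'type': 'peer-left'}, exclude=websocket)
            else:
                del rooms[room_id]  # drop empty rooms so the dict does not grow forever

async def broadcast(room_id, msg, exclude=None):
    peers = rooms.get(room_id, set()) - {exclude}
    if peers:
        payload = json.dumps(msg)
        # return_exceptions=True: one dead connection must not abort delivery to the others
        await asyncio.gather(*(p.send(payload) for p in peers), return_exceptions=True)

async def main():
    # Host and port are placeholders for local testing
    async with websockets.serve(handler, 'localhost', 8765):
        await asyncio.Future()  # run until cancelled

if __name__ == '__main__':
    asyncio.run(main())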
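
The server above accepts any client that sends a join message. One possible way to add authentication, sketched here under assumptions, is to have your application backend issue a short-lived signed token that the client includes when joining. The token layout, the `SIGNING_KEY` value, and both function names are hypothetical, and user and room ids are assumed not to contain the `|` separator.

```python
import hashlib
import hmac
import time
from typing import Optional

SIGNING_KEY = b"change-me"  # hypothetical secret shared with your auth backend


def _sign(payload: str) -> str:
    return hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_join_token(user: str, room: str, ttl: int = 300,
                     now: Optional[float] = None) -> str:
    """Create a token of the form user|room|expiry|signature."""
    issued_at = time.time() if now is None else now
    payload = f"{user}|{room}|{int(issued_at) + ttl}"
    return f"{payload}|{_sign(payload)}"


def verify_join_token(token: str, room: str,
                      now: Optional[float] = None) -> Optional[str]:
    """Return the user id if the token is valid for this room, else None."""
    try:
        user, token_room, expiry, signature = token.split("|")
        expires_at = int(expiry)
    except ValueError:
        return None  # wrong number of fields or non-numeric expiry
    payload = f"{user}|{token_room}|{expiry}"
    # Constant-time comparison avoids leaking how much of the signature matched
    if not hmac.compare_digest(signature.encode(), _sign(payload).encode()):
        return None
    current = time.time() if now is None else now
    if token_room != room or expires_at < current:
        return None
    return user
```

In the handler, a join message would then be honored only when `verify_join_token(msg.get('token', ''), msg['room'])` returns a user id.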

SIP Integration — WebRTC to PSTN

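When WebRTC calls need to reach regular phone networks (PSTN), the browser can speak SIP over WebSocket (RFC 7118) to a SIP proxy or gateway. The gateway carries the browser's SDP in standard SIP INVITE messages and forwards the call to a VoIP carrier, which connects it to the PSTN. Because carriers usually expect plain RTP rather than WebRTC's DTLS-SRTP and ICE, the gateway typically has to bridge the media as well as the signaling.

SIP.js is a widely used browser-side library: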

import { UserAgent, Registerer, Inviter, SessionState } from 'sip.js';

const ua = new UserAgent({
  uri: UserAgent.makeURI('sip:alice@example.com'),  // placeholder address; use your own SIP URI
  transportOptions: {
    server: 'wss://sip.example.com:8443/ws',  // SIP over WebSocket
  },
  authorizationUsername: 'alice',
  authorizationPassword: 'secret',
});

await ua.start();
const registerer = new Registerer(ua);
await registerer.register();  // SIP REGISTER → 200 OK

// Place a call to a phone number
const target = UserAgent.makeURI('sip:+15551234567@sip.example.com');  // placeholder number and domain
const inviter = new Inviter(ua, target);
inviter.stateChange.addListener(state => {
  if (state === SessionState.Established) console.log('Call connected');
  if (state === SessionState.Terminated) console.log('Call ended');
});
await inviter.invite();  // SIP INVITE → 100 Trying → 180 Ringing → 200 OK

STUN and TURN: NAT Traversal

Most devices are behind NAT routers and do not have a public IP address. ICE (Interactive Connectivity Establishment) systematically discovers all possible paths between peers.

STUN (Session Traversal Utilities for NAT) lets a client discover its public IP:port as seen by the internet:

Client → STUN Server: Binding Request
STUN Server → Client: Binding Response {XOR-MAPPED-ADDRESS: 203.0.113.1:54321}

The client adds 203.0.113.1:54321 as a server-reflexive candidate. If both peers can discover their public addresses and their NAT types allow direct connection (full-cone or restricted-cone NAT), the P2P path succeeds.
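
The exchange above is small enough to sketch at the byte level. The following Python example builds a Binding Request and decodes an IPv4 XOR-MAPPED-ADDRESS attribute using the STUN message layout from RFC 5389/8489. It is an illustration only: the function names are invented here, and it skips IPv6, message integrity, and retransmission.

```python
import os
import socket
import struct
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id: Optional[bytes] = None) -> bytes:
    """Build the 20-byte STUN header of a Binding Request with no attributes."""
    transaction_id = transaction_id or os.urandom(12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id


def parse_xor_mapped_address(response: bytes) -> Optional[Tuple[str, int]]:
    """Return (ip, port) from an IPv4 XOR-MAPPED-ADDRESS attribute, or None."""
    if len(response) < 20:
        return None
    msg_type, length, cookie = struct.unpack("!HHI", response[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    offset, end = 20, min(20 + length, len(response))
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[offset:offset + 4])
        value = response[offset + 4:offset + 4 + attr_len]
        # value layout: reserved byte, family (0x01 = IPv4), X-Port, X-Address
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        offset += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None
```

To query a real server, you would send the request over a UDP socket to the STUN server's address and pass the reply to `parse_xor_mapped_address`.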

TURN (Traversal Using Relays around NAT) is the fallback when direct P2P fails (symmetric NAT, strict firewalls). The TURN server relays all media:

Client A ←→ TURN Server ←→ Client B  (all media relayed)
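
ICE tries direct paths before relayed ones because each candidate carries a priority. The sketch below applies the priority formula and the recommended type preference values from RFC 8445 to show why host candidates outrank server-reflexive candidates, which in turn outrank relay candidates. The function name is invented for this example, and real implementations may choose different local preference values.

```python
# Recommended type preferences from RFC 8445 (higher is preferred).
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(candidate_type: str, local_preference: int = 65535,
                       component_id: int = 1) -> int:
    """priority = 2^24 * type preference + 2^8 * local preference + (256 - component id)"""
    return (
        (TYPE_PREFERENCE[candidate_type] << 24)
        + (local_preference << 8)
        + (256 - component_id)
    )


def rank_candidates(candidate_types):
    """Order candidate types from most to least preferred."""
    return sorted(candidate_types, key=candidate_priority, reverse=True)


if __name__ == "__main__":
    for kind in rank_candidates(["relay", "srflx", "host"]):
        print(kind, candidate_priority(kind))
```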

Setting up coturn (a widely used open-source TURN server):

# /etc/turnserver.conf
listening-port=3478
tls-listening-port=5349
realm=example.com
server-name=turn.example.com
lt-cred-mech
use-auth-secret
static-auth-secret=your-shared-secret
cert=/etc/letsencrypt/live/turn.example.com/fullchain.pem
pkey=/etc/letsencrypt/live/turn.example.com/privkey.pem

Generate time-limited TURN credentials in your signaling server:

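import base64
import hashlib
import hmac
import time

# Must match static-auth-secret in turnserver.conf; load it from configuration in practice
TURN_SECRET = 'your-shared-secret'

def generate_turn_credentials(username: str, ttl: int = 3600) -> dict:
    # The TURN username embeds the expiry time: "<unix expiry>:<user id>"
    expiry = int(time.time()) + ttl
    turn_username = f'{expiry}:{username}'
    # The credential is the base64-encoded HMAC-SHA1 of that username
    credential = base64.b64encode(
        hmac.new(TURN_SECRET.encode(), turn_username.encode(), hashlib.sha1).digest()
    ).decode()
    return {'username': turn_username, 'credential': credential}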
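
For intuition about why these credentials expire on their own, the check a TURN server performs can be approximated as follows. This is a sketch of the shared-secret logic, not coturn's actual code; `expected_credential` and `is_credential_valid` are names invented for this example, and the secret is a placeholder.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

TURN_SECRET = "your-shared-secret"  # placeholder; assumed to match static-auth-secret


def expected_credential(turn_username: str, secret: str = TURN_SECRET) -> str:
    """Recompute the base64 HMAC-SHA1 credential for a TURN username."""
    digest = hmac.new(secret.encode(), turn_username.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def is_credential_valid(turn_username: str, credential: str,
                        now: Optional[float] = None,
                        secret: str = TURN_SECRET) -> bool:
    """Accept only unexpired usernames whose credential matches the HMAC."""
    expiry_text, _, _user = turn_username.partition(":")
    if not expiry_text.isdigit():
        return False
    current = time.time() if now is None else now
    if int(expiry_text) < current:
        return False  # the timestamp embedded in the username has passed
    return hmac.compare_digest(
        credential.encode(), expected_credential(turn_username, secret).encode()
    )
```

Because the expiry is part of the signed username, a client cannot extend a credential's lifetime without knowing the shared secret.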

Production Considerations

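Signaling server clustering: when scaled horizontally, signaling servers must either share room state (for example through Redis pub/sub) or route every peer in a room to the same instance with a sticky load balancer. Signaling traffic is light compared with media, so a single small instance such as a t3.small can reportedly handle tens of thousands of concurrent calls; the real limit depends on message rate and connection count, so load-test your own deployment.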

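Certificate management: WebRTC mandates encryption for all media (DTLS-SRTP), and browsers enforce this even on local networks. The DTLS certificates are generated by the browser itself, but your signaling server needs a valid TLS certificate for its wss:// endpoint, and your TURN server needs one if you offer TURN over TLS.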

Firewall traversal: corporate firewalls often block UDP entirely. Offer TURN over TCP or TLS on port 443 as the final fallback. As a rough rule of thumb, about 8–10% of WebRTC calls require a TURN relay. Budget your TURN server bandwidth accordingly: every relayed stream both enters and leaves the relay, so the server carries each relayed call's media twice.
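
A back-of-the-envelope estimate can turn that rule of thumb into a bandwidth figure. The Python sketch below assumes two-party calls with the same bitrate in each direction; the function name and every input value in the usage example are illustrative assumptions, not measurements.

```python
def turn_bandwidth_mbps(concurrent_calls: int, relay_fraction: float,
                        kbps_per_direction: float) -> dict:
    """Estimate TURN relay traffic in Mbit/s for two-party calls."""
    relayed_calls = concurrent_calls * relay_fraction
    # A two-party call has two media directions (A->B and B->A).
    # The relay receives each direction once (ingress) and re-sends it once (egress).
    ingress_mbps = relayed_calls * 2 * kbps_per_direction / 1000
    egress_mbps = ingress_mbps
    return {
        "relayed_calls": relayed_calls,
        "ingress_mbps": ingress_mbps,
        "egress_mbps": egress_mbps,
        "total_mbps": ingress_mbps + egress_mbps,
    }


if __name__ == "__main__":
    # Assumed inputs: 1000 concurrent calls, 10% relayed, 1500 kbit/s each way
    print(turn_bandwidth_mbps(1000, 0.10, 1500))
```

With those assumed inputs, about 100 calls are relayed, which works out to roughly 300 Mbit/s into the relay and the same amount out of it.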
