RFC 3261: SIP Protocol Deep Dive

How the Session Initiation Protocol enables VoIP — request/response model, transaction layer, dialog management, and integration with RTP for media.

SIP Architecture — The Signaling Plane

The Session Initiation Protocol (RFC 3261, 2002) is a text-based application-layer protocol for establishing, modifying, and terminating multimedia sessions — phone calls, video conferences, and instant messages. It is deliberately modeled after HTTP: requests have methods, responses have 3-digit status codes, and headers look similar.

SIP only handles signaling: negotiating who talks to whom and what codec to use. The actual voice and video bytes travel separately over RTP (Real-time Transport Protocol). This separation lets signaling and media take different network paths, at the cost of two protocols to configure and debug.

Network Elements

SIP Network Architecture:

  ┌──────────┐         ┌──────────────┐         ┌──────────┐
  │ Alice's  │─────────│  SIP Proxy   │─────────│  Bob's   │
  │ UA (UAC) │         │   Server     │         │ UA (UAS) │
  └──────────┘         └──────────────┘         └──────────┘
       │                      │                       │
       │                ┌─────┴──────┐               │
       │                │  Registrar │               │
       │                │   Server   │               │
       │                └────────────┘               │
       │                                              │
       └──────── RTP Media (voice/video) ────────────┘
         (direct path after SIP negotiation)

Element                   Role
──────────────────────────────────────────────────────────────────────────────
User Agent Client (UAC)   Initiates SIP requests (Alice's phone)
User Agent Server (UAS)   Receives and responds to requests (Bob's phone)
Proxy Server              Forwards requests, applies routing and authentication
Registrar                 Accepts REGISTER requests and maps a SIP address to the UA's current contact address
Redirect Server           Returns 3xx responses telling the UAC to contact elsewhere

The SIP trapezoid is the common topology in which two proxy servers, one serving each user agent, handle the signaling while RTP flows directly between the user agents once the session is established. The proxies therefore carry no media traffic.

SIP URIs

SIP addresses look like email addresses:

sip:[email protected]          # SIP URI (UDP/TCP, unencrypted)
sips:[email protected]         # SIPS URI (TLS required)
sip:[email protected]:5060      # With explicit IP and port
sip:[email protected]   # E.164 phone number as SIP URI
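The URI forms listed above can be split into their parts with a small parser. The following is a minimal sketch, not a complete RFC 3261 URI grammar: the `SipUri` type, the `parse_sip_uri` function, and the `example.com` addresses are hypothetical names chosen for illustration, and URI parameters, headers, passwords, and IPv6 literals are deliberately not handled.

```python
import re
from typing import NamedTuple, Optional


class SipUri(NamedTuple):
    scheme: str           # "sip" or "sips"
    user: Optional[str]   # user part, if present
    host: str             # domain name or IPv4 address
    port: Optional[int]   # explicit port, if present


# Simplified pattern: scheme, optional user, host, optional port.
# Assumption: no password, no IPv6 literal, parameters ignored.
_SIP_URI = re.compile(
    r"^(?P<scheme>sips?):"
    r"(?:(?P<user>[^@;?]+)@)?"
    r"(?P<host>[^:;?]+)"
    r"(?::(?P<port>\d+))?",
    re.IGNORECASE,
)


def parse_sip_uri(text: str) -> SipUri:
    """Split a SIP or SIPS URI into scheme, user, host, and port."""
    match = _SIP_URI.match(text.strip())
    if match is None:
        raise ValueError(f"not a SIP URI: {text!r}")
    port = int(match.group("port")) if match.group("port") else None
    return SipUri(
        match.group("scheme").lower(),
        match.group("user"),
        match.group("host"),
        port,
    )


if __name__ == "__main__":
    for uri in ("sip:alice@example.com", "sips:bob@example.com",
                "sip:alice@192.0.2.10:5060", "sip:+15551234567@example.com"):
        print(parse_sip_uri(uri))
```

A production parser would also need the `;transport=` and `;user=phone` parameters, which this sketch simply ignores.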

Request Methods

Core Methods (RFC 3261)

Method      Purpose
──────────────────────────────────────────────────────────
INVITE      Initiate a session (the "phone ringing" method)
ACK         Acknowledge receipt of a final response to INVITE
BYE         Terminate an established session
CANCEL      Cancel a pending INVITE (before final response)
REGISTER    Register a contact address with a registrar
OPTIONS     Query capabilities of a UA or proxy server

A Minimal SIP INVITE Request

INVITE sip:[email protected] SIP/2.0
Via: SIP/2.0/UDP pc33.example.com;branch=z9hG4bK776asdhds
Max-Forwards: 70
To: Bob <sip:[email protected]>
From: Alice <sip:[email protected]>;tag=1928301774
Call-ID: [email protected]
CSeq: 314159 INVITE
Contact: <sip:[email protected]>
Content-Type: application/sdp
Content-Length: 142

[SDP body describing Alice's audio capabilities]
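Because SIP is text-based and HTTP-like, a request such as the one above splits naturally into a start line, headers, and a body. The sketch below shows one way to do that split. The function name `parse_sip_request`, the `SAMPLE` message, and its `example.com` addresses are illustrative assumptions; header folding and compact header names are not handled.

```python
from typing import Dict, List, Tuple


def parse_sip_request(raw: str) -> Tuple[str, str, str, Dict[str, List[str]], str]:
    """Split a SIP request into method, Request-URI, version, headers, and body."""
    # Headers and body are separated by an empty line (CRLF CRLF).
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, request_uri, version = lines[0].split(" ", 2)
    headers: Dict[str, List[str]] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive; a header may appear more than once.
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return method, request_uri, version, headers, body


# Hypothetical message for illustration; the body is a one-line stand-in for SDP.
SAMPLE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP pc33.example.com;branch=z9hG4bK776asdhds\r\n"
    "Max-Forwards: 70\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Length: 5\r\n"
    "\r\n"
    "v=0\r\n"
)

if __name__ == "__main__":
    method, uri, version, headers, body = parse_sip_request(SAMPLE)
    print(method, uri, version)
    print(headers["via"], headers["cseq"])
    print(len(body) == int(headers["content-length"][0]))
```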

SIP Response Classes

Class   Meaning          Examples
──────────────────────────────────────────────────────────────────────────────
1xx     Provisional      100 Trying, 180 Ringing, 183 Session Progress
2xx     Success          200 OK
3xx     Redirect         301 Moved Permanently, 302 Moved Temporarily
4xx     Client Error     400 Bad Request, 401 Unauthorized, 404 Not Found, 486 Busy Here
5xx     Server Error     500 Server Internal Error, 503 Service Unavailable
6xx     Global Failure   600 Busy Everywhere, 603 Decline

The 6xx class is unique to SIP — a 6xx response from any user agent means the request should not be tried elsewhere (unlike 4xx/5xx which may prompt a proxy to try a different route).
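The class of a response follows directly from its first digit, which makes the rules above easy to express in code. This is a small sketch; the function names `response_class`, `is_final`, and `forbids_retry_elsewhere` are made up for illustration.

```python
RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirect",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def response_class(code: int) -> str:
    """Return the class name for a SIP status code (100-699)."""
    if not 100 <= code <= 699:
        raise ValueError(f"not a SIP status code: {code}")
    return RESPONSE_CLASSES[code // 100]


def is_final(code: int) -> bool:
    """1xx responses are provisional; everything from 200 up is final."""
    return code >= 200


def forbids_retry_elsewhere(code: int) -> bool:
    """A 6xx means the request should not be tried at another destination."""
    return code // 100 == 6


if __name__ == "__main__":
    for code in (180, 200, 302, 486, 503, 603):
        print(code, response_class(code), is_final(code), forbids_retry_elsewhere(code))
```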

Transaction Layer

Transactions vs. Dialogs

SIP has two state machines: transactions (short-lived, single request/response) and dialogs (long-lived, the context of an ongoing call).

A transaction is one SIP request plus all its responses. An INVITE transaction includes the INVITE, all 1xx provisional responses, and the final 2xx/3xx/4xx/5xx/6xx response. If the final response is not a 2xx, the ACK is also part of the INVITE transaction. The ACK that follows a 2xx is not: it is treated as a separate transaction and is generated by the UA core rather than by the transaction layer.

The Branch Parameter and Transaction Matching

Every SIP request includes a Via header with a branch parameter that uniquely identifies the transaction:

Via: SIP/2.0/UDP client.example.com;branch=z9hG4bKnashds8

The z9hG4bK prefix is a magic cookie defined by RFC 3261 — any branch starting with this prefix uses RFC 3261 transaction matching rules. Responses are matched to transactions by the branch parameter plus the method in CSeq.
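The matching rule above can be sketched as a lookup key built from the top Via branch and the CSeq method. The helper names (`new_branch`, `top_via_branch`, `transaction_key`) are hypothetical, and the sketch assumes it is handed the value of the topmost Via header only.

```python
import secrets
from typing import Optional, Tuple

MAGIC_COOKIE = "z9hG4bK"  # prefix that signals RFC 3261 matching rules


def new_branch() -> str:
    """Create a branch value: magic cookie plus a random token."""
    return MAGIC_COOKIE + secrets.token_hex(8)


def top_via_branch(via_value: str) -> Optional[str]:
    """Extract the branch parameter from a single Via header value."""
    for param in via_value.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "branch":
            return value.strip()
    return None


def transaction_key(via_value: str, cseq_value: str) -> Tuple[Optional[str], str]:
    """Key used to match a response to a client transaction."""
    method = cseq_value.split()[1].upper()
    return (top_via_branch(via_value), method)


if __name__ == "__main__":
    via = f"SIP/2.0/UDP client.example.com;branch={new_branch()}"
    pending = {transaction_key(via, "314159 INVITE"): "INVITE client transaction"}
    # A response echoes the Via and CSeq of the request it answers.
    print(pending.get(transaction_key(via, "314159 INVITE")))
    # A CANCEL reuses the INVITE's branch, so the method keeps the two apart.
    print(pending.get(transaction_key(via, "314159 CANCEL")))
```

Including the method in the key matters because a CANCEL carries the same branch as the INVITE it cancels, yet forms its own transaction.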

Retransmission Timers

SIP can run over UDP (unreliable) or TCP. Over UDP, the transaction layer implements its own retransmission:

Timer A: Initial INVITE retransmit interval = T1 (default 500ms)
         Doubles each retransmit: 500ms, 1s, 2s, 4s, 8s...
Timer B: INVITE transaction timeout = 64 * T1 = 32 seconds
         If no final response, transaction fails

Timer E: Non-INVITE retransmit (OPTIONS, REGISTER, BYE)
         Also starts at T1, caps at T2 (4 seconds)
Timer F: Non-INVITE timeout = 64 * T1 = 32 seconds
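To make the timer arithmetic concrete, the sketch below computes when retransmissions fire over UDP, measured in seconds after the first send. The function name `retransmit_times` is an illustrative choice; the defaults follow the T1 and T2 values quoted above.

```python
from typing import List, Optional

T1 = 0.5  # seconds, default round-trip estimate
T2 = 4.0  # seconds, cap on the non-INVITE retransmit interval


def retransmit_times(t1: float = T1, cap: Optional[float] = None) -> List[float]:
    """Return the retransmit times before the 64*T1 timeout expires.

    cap=None models Timer A (INVITE): the interval doubles without limit.
    cap=T2 models Timer E (non-INVITE): the interval doubles up to T2.
    """
    timeout = 64 * t1          # Timer B or Timer F
    times: List[float] = []
    interval = t1
    elapsed = 0.0
    while elapsed + interval < timeout:
        elapsed += interval
        times.append(elapsed)
        interval *= 2
        if cap is not None:
            interval = min(interval, cap)
    return times


if __name__ == "__main__":
    print("INVITE:    ", retransmit_times())
    print("non-INVITE:", retransmit_times(cap=T2))
```

With the defaults, an unanswered INVITE is retransmitted six times (at 0.5, 1.5, 3.5, 7.5, 15.5, and 31.5 seconds) before Timer B fires at 32 seconds, while a non-INVITE request is retransmitted ten times because its interval stops growing at 4 seconds.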

Dialog Management

The INVITE Dialog Lifecycle

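A dialog is identified by its dialog ID: the triple of Call-ID, From-tag, and To-tag. The caller sets the From-tag in the INVITE; the callee adds the To-tag in its response (a 1xx that carries a To-tag creates an early dialog, and the 2xx confirms it). Because this identifier travels in the messages themselves rather than being tied to a transport connection, later requests such as BYE or a re-INVITE can be matched to the call even if they arrive over a different path.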

INVITE Dialog Lifecycle:

Alice                  Proxy                  Bob
  │─── INVITE ────────────────────────────────▶│
  │◀── 100 Trying ────────────────────────────│
  │◀── 180 Ringing ───────────────────────────│   [phone rings]
  │◀── 200 OK ────────────────────────────────│   [call answered]
  │─── ACK ───────────────────────────────────▶│   [session confirmed]
  │                                            │
  │ ←────── RTP media flows directly ──────── │
  │                                            │
  │─── BYE ───────────────────────────────────▶│   [hang up]
  │◀── 200 OK ────────────────────────────────│
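The dialog ID described above can be derived from the Call-ID, From, and To header values of any message in the dialog. The following sketch uses hypothetical names (`DialogId`, `tag_of`, `dialog_id`) and placeholder addresses. It stores the tags as local and remote, since each side sees the From-tag and To-tag from its own perspective.

```python
from typing import NamedTuple, Optional


class DialogId(NamedTuple):
    call_id: str
    local_tag: str
    remote_tag: str


def tag_of(header_value: str) -> Optional[str]:
    """Return the tag parameter of a From or To header value, if any."""
    # Only look after the closing '>' so URI parameters are not mistaken for tags.
    params = header_value.rsplit(">", 1)[-1]
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "tag":
            return value.strip() or None
    return None


def dialog_id(call_id: str, from_value: str, to_value: str, is_uac: bool) -> DialogId:
    """Build the dialog ID as seen by the caller (UAC) or the callee (UAS)."""
    from_tag, to_tag = tag_of(from_value), tag_of(to_value)
    if from_tag is None or to_tag is None:
        raise ValueError("no dialog yet: both From-tag and To-tag are required")
    if is_uac:
        return DialogId(call_id, local_tag=from_tag, remote_tag=to_tag)
    return DialogId(call_id, local_tag=to_tag, remote_tag=from_tag)


if __name__ == "__main__":
    call_id = "a84b4c76e66710@pc33.example.com"        # placeholder value
    frm = "Alice <sip:alice@example.com>;tag=1928301774"
    to = "Bob <sip:bob@example.com>;tag=a48s"
    print(dialog_id(call_id, frm, to, is_uac=True))
    print(dialog_id(call_id, frm, to, is_uac=False))
```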

re-INVITE for Session Modification

Once a dialog is established, either party can send a new INVITE within the dialog (a re-INVITE) to modify the session — put the call on hold, add video, or change the codec:

# Put call on hold (a=sendonly in SDP; c=0.0.0.0 is the older RFC 2543 convention):
INVITE sip:[email protected] SIP/2.0
To: Bob <sip:[email protected]>;tag=a48s   ← includes To-tag (within dialog)
From: Alice <sip:[email protected]>;tag=1928301774
CSeq: 314160 INVITE   ← incremented CSeq

[SDP with a=sendonly and c=0.0.0.0]
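The SDP carried by a hold re-INVITE differs from the original offer mainly in its direction attribute. As a rough sketch, the hypothetical helper `set_direction` below rewrites that attribute; it assumes an SDP with a single media section and does not touch the `c=` line.

```python
DIRECTIONS = {"a=sendrecv", "a=sendonly", "a=recvonly", "a=inactive"}


def set_direction(sdp: str, direction: str) -> str:
    """Return a copy of the SDP with its direction attribute replaced.

    Assumption: one media section, so appending the attribute at the end
    attaches it to that section.
    """
    attribute = f"a={direction}"
    if attribute not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction!r}")
    # Drop blank lines and any existing direction attribute, then add the new one.
    lines = [line for line in sdp.splitlines() if line and line not in DIRECTIONS]
    lines.append(attribute)
    return "\r\n".join(lines) + "\r\n"


# Placeholder session description for illustration.
ACTIVE_SDP = (
    "v=0\r\n"
    "o=alice 2890844526 2890844526 IN IP4 pc33.example.com\r\n"
    "s=Session\r\n"
    "c=IN IP4 pc33.example.com\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0\r\n"
    "a=rtpmap:0 PCMU/8000\r\n"
    "a=sendrecv\r\n"
)

if __name__ == "__main__":
    print(set_direction(ACTIVE_SDP, "sendonly"))    # hold
    print(set_direction(ACTIVE_SDP, "sendrecv"))    # resume
```

A real re-INVITE would also increment the version field of the `o=` line, which this sketch leaves unchanged.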

SIP and RTP — The Media Plane

SDP Offer/Answer Model

SIP uses SDP (Session Description Protocol, RFC 4566) in the message body to negotiate media parameters. The INVITE contains an SDP offer; the 200 OK contains an SDP answer.

SDP Offer (in INVITE body):
v=0
o=alice 2890844526 2890844526 IN IP4 pc33.example.com
s=Session
c=IN IP4 pc33.example.com
t=0 0
m=audio 49170 RTP/AVP 0 8 97
a=rtpmap:0 PCMU/8000      ← G.711 µ-law
a=rtpmap:8 PCMA/8000      ← G.711 A-law
a=rtpmap:97 iLBC/8000     ← iLBC codec

SDP Answer (in 200 OK body):
m=audio 3456 RTP/AVP 0    ← Bob chooses PCMU/G.711 on port 3456
a=rtpmap:0 PCMU/8000

After the ACK, RTP audio flows directly between Alice's port 49170 and Bob's port 3456, bypassing the SIP proxies (unless the deployment deliberately inserts a media relay).
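The answerer's side of the exchange above amounts to reading the offered payload types and picking one it supports. The sketch below illustrates that for a single audio stream. `parse_audio_offer` and `build_answer` are hypothetical names, and choosing exactly one codec is a simplification that mirrors the example, since an answer may list several.

```python
from typing import Dict, List, Optional, Set, Tuple


def parse_audio_offer(sdp: str) -> Tuple[Optional[int], List[int], Dict[int, str]]:
    """Return (port, payload types in preference order, rtpmap) for m=audio."""
    port: Optional[int] = None
    payloads: List[int] = []
    rtpmap: Dict[int, str] = {}
    for line in sdp.splitlines():
        if line.startswith("m=audio "):
            fields = line.split()          # m=audio <port> <proto> <pt> ...
            port = int(fields[1])
            payloads = [int(pt) for pt in fields[3:]]
        elif line.startswith("a=rtpmap:"):
            pt, _, encoding = line[len("a=rtpmap:"):].partition(" ")
            rtpmap[int(pt)] = encoding.strip()
    return port, payloads, rtpmap


def build_answer(offer_sdp: str, supported: Set[str], local_port: int) -> Optional[List[str]]:
    """Pick the first offered codec we support and return the answer's media lines."""
    _, payloads, rtpmap = parse_audio_offer(offer_sdp)
    wanted = {name.upper() for name in supported}
    for pt in payloads:
        encoding = rtpmap.get(pt)
        if encoding is not None and encoding.upper() in wanted:
            return [f"m=audio {local_port} RTP/AVP {pt}", f"a=rtpmap:{pt} {encoding}"]
    return None  # no codec in common


OFFER = "\r\n".join([
    "v=0",
    "m=audio 49170 RTP/AVP 0 8 97",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
    "a=rtpmap:97 iLBC/8000",
]) + "\r\n"

if __name__ == "__main__":
    print(build_answer(OFFER, {"PCMU/8000"}, local_port=3456))
```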

NAT Traversal — STUN, TURN, and ICE

The biggest operational challenge in SIP deployments is NAT. A UA behind a NAT writes its private IP address into the SDP, and that address is unreachable from the internet. Three technologies address this:

  • STUN (RFC 5389): Lets a UA discover its public IP/port by querying a server. The UA puts the public address in the SDP.
  • TURN (RFC 5766): When direct RTP fails, the TURN server relays media. Higher latency but works through symmetric NATs.
  • ICE (RFC 5245): Gathers all possible candidate addresses (host, server reflexive from STUN, relayed via TURN) and exchanges them in SDP. The peers then check connectivity for each candidate pair and choose the best working path.
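How ICE ranks candidates and candidate pairs can be illustrated with the priority formulas from RFC 5245. The sketch below uses the type preference values that the RFC recommends. The function names are illustrative, and the default local preference of 65535 assumes a host with a single network interface.

```python
# Recommended type preferences from RFC 5245 (higher is preferred).
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, component_id: int = 1,
                       local_pref: int = 65535) -> int:
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)."""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component_id)


def pair_priority(controlling: int, controlled: int) -> int:
    """Pair priority = 2^32 * min(G, D) + 2 * max(G, D) + (1 if G > D else 0)."""
    g, d = controlling, controlled
    return (min(g, d) << 32) + 2 * max(g, d) + (1 if g > d else 0)


if __name__ == "__main__":
    for kind in ("host", "srflx", "relay"):
        print(kind, candidate_priority(kind))
    direct = pair_priority(candidate_priority("host"), candidate_priority("host"))
    relayed = pair_priority(candidate_priority("relay"), candidate_priority("host"))
    print(direct > relayed)  # direct host-to-host pairs are checked first
```

Because the pair formula is dominated by the lower of the two candidate priorities, any pair that involves a relayed candidate sorts below pairs made of host or server reflexive candidates, so TURN is tried last.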

WebRTC, which reuses the SDP offer/answer model but leaves the choice of signaling protocol to the application, made ICE mainstream. WebRTC requires ICE, and many modern SIP deployments use it as well for NAT traversal without manual configuration.
