SIP Architecture — The Signaling Plane
The Session Initiation Protocol (RFC 3261, 2002) is a text-based application-layer protocol for establishing, modifying, and terminating multimedia sessions — phone calls, video conferences, and instant messages. It is deliberately modeled after HTTP: requests have methods, responses have 3-digit status codes, and headers look similar.
SIP only handles signaling: negotiating who talks to whom and which codec to use. The actual voice and video bytes travel separately over RTP (Real-time Transport Protocol). This separation lets SIP set up any kind of media session, but it also means every call involves two protocols and two network paths, which makes the system harder to understand and debug.
Network Elements
SIP Network Architecture:
┌──────────┐         ┌──────────────┐         ┌──────────┐
│ Alice's  │─────────│  SIP Proxy   │─────────│  Bob's   │
│ UA (UAC) │         │    Server    │         │ UA (UAS) │
└──────────┘         └──────────────┘         └──────────┘
     │                      │                      │
     │                ┌─────┴──────┐               │
     │                │ Registrar  │               │
     │                │   Server   │               │
     │                └────────────┘               │
     │                                             │
     └──────── RTP Media (voice/video) ────────────┘
           (direct path after SIP negotiation)
| Element | Role |
|---|---|
| **User Agent Client (UAC)** | Initiates SIP requests (Alice's phone) |
| **User Agent Server (UAS)** | Receives and responds to requests (Bob's phone) |
| **Proxy Server** | Forwards requests, applies routing and authentication |
| **Registrar** | Accepts REGISTER requests and maps a SIP address (the address-of-record) to the UA's current contact address |
| **Redirect Server** | Returns 3xx responses telling the UAC to contact elsewhere |
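In the common topology known as the SIP trapezoid, each user agent's domain runs its own proxy: the two proxies relay the signaling, while RTP flows directly between the user agents once the session is established, so the proxies carry no media traffic. The diagram above collapses the two proxies into one.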
SIP URIs
SIP addresses look like email addresses:
sip:[email protected] # SIP URI (UDP/TCP, unencrypted)
sips:[email protected] # SIPS URI (TLS required)
sip:[email protected]:5060 # With explicit IP and port
sip:[email protected] # E.164 phone number as SIP URI
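To make the URI structure concrete, here is a minimal parsing sketch. It assumes a simplified grammar (no IPv6 literals, passwords, or `?header` parts), and the names `SipUri`, `parse_sip_uri`, and `default_port` are hypothetical helpers, not part of any SIP library. The addresses in the usage line are illustrative.

```python
import re
from typing import NamedTuple, Optional


class SipUri(NamedTuple):
    scheme: str            # "sip" or "sips"
    user: Optional[str]    # user part, or None when the URI names only a host
    host: str
    port: Optional[int]    # None means "use the scheme's default port"
    params: dict           # URI parameters such as transport=tcp


# Simplified pattern: scheme ":" [user "@"] host [":" port] *(";" param)
_URI_RE = re.compile(
    r"^(?P<scheme>sips?):"
    r"(?:(?P<user>[^@;]+)@)?"
    r"(?P<host>[^:;]+)"
    r"(?::(?P<port>\d+))?"
    r"(?P<params>(?:;[^;]+)*)$",
    re.IGNORECASE,
)


def parse_sip_uri(text: str) -> SipUri:
    m = _URI_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a SIP URI: {text!r}")
    params = {}
    for item in filter(None, m.group("params").split(";")):
        name, _, value = item.partition("=")
        params[name.lower()] = value
    port = int(m.group("port")) if m.group("port") else None
    return SipUri(m.group("scheme").lower(), m.group("user"),
                  m.group("host"), port, params)


def default_port(uri: SipUri) -> int:
    # 5060 is the default for sip, 5061 for sips (TLS).
    return uri.port or (5061 if uri.scheme == "sips" else 5060)


uri = parse_sip_uri("sips:alice@example.com;transport=tcp")
print(uri.user, uri.host, default_port(uri))  # alice example.com 5061
```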
Request Methods
Core Methods (RFC 3261)
Method     Purpose
──────────────────────────────────────────────────────────
INVITE     Initiate a session (the "phone ringing" method)
ACK        Acknowledge receipt of a final response to INVITE
BYE        Terminate an established session
CANCEL     Cancel a pending INVITE (before final response)
REGISTER   Register a contact address with a registrar
OPTIONS    Query capabilities of a UA or proxy server
A Minimal SIP INVITE Request
INVITE sip:[email protected] SIP/2.0
Via: SIP/2.0/UDP pc33.example.com;branch=z9hG4bK776asdhds
Max-Forwards: 70
To: Bob <sip:[email protected]>
From: Alice <sip:[email protected]>;tag=1928301774
Call-ID: [email protected]
CSeq: 314159 INVITE
Contact: <sip:[email protected]>
Content-Type: application/sdp
Content-Length: 142
[SDP body describing Alice's audio capabilities]
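Because SIP is text-based, a request like the one above can be taken apart with ordinary string handling. The following is a rough sketch only: `parse_sip_request` is a hypothetical helper, it ignores header line folding and compact header names, and the addresses in `EXAMPLE` are illustrative.

```python
def parse_sip_request(raw: str):
    """Split a SIP request into (method, request_uri, headers, body)."""
    # A blank line (CRLF CRLF) separates the header section from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, request_uri, version = lines[0].split(" ", 2)
    if version != "SIP/2.0":
        raise ValueError(f"unsupported SIP version: {version!r}")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive; a header such as Via may repeat,
        # so each name maps to a list of values in order of appearance.
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return method, request_uri, headers, body


EXAMPLE = "\r\n".join([
    "INVITE sip:bob@example.com SIP/2.0",
    "Via: SIP/2.0/UDP pc33.example.com;branch=z9hG4bK776asdhds",
    "Max-Forwards: 70",
    "CSeq: 314159 INVITE",
    "Content-Length: 0",
    "",
    "",
])

method, uri, headers, body = parse_sip_request(EXAMPLE)
print(method, uri, headers["cseq"])
```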
SIP Response Classes
| Class | Meaning | Examples |
|---|---|---|
| 1xx | Provisional | 100 Trying, 180 Ringing, 183 Session Progress |
| 2xx | Success | 200 OK |
| 3xx | Redirection | 301 Moved Permanently, 302 Moved Temporarily |
| 4xx | Client Error | 400 Bad Request, 401 Unauthorized, 404 Not Found, 486 Busy Here |
| 5xx | Server Error | 500 Server Internal Error, 503 Service Unavailable |
| 6xx | Global Failure | 600 Busy Everywhere, 603 Decline |
The 6xx class has no HTTP counterpart. A 6xx response from any user agent means the request will fail at every destination and should not be tried elsewhere (unlike 4xx/5xx, which may prompt a proxy to try a different route).
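The class rules in the table reduce to integer arithmetic on the status code. The sketch below uses hypothetical helper names, and `may_try_another_target` is a deliberate simplification of real proxy forking logic: it only encodes the rule that 3xx to 5xx responses leave room for another attempt while 2xx and 6xx end the search.

```python
_CLASSES = {
    1: "provisional",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
    6: "global failure",
}


def response_class(status: int) -> str:
    if not 100 <= status <= 699:
        raise ValueError(f"not a SIP status code: {status}")
    return _CLASSES[status // 100]


def is_final(status: int) -> bool:
    # Every response except 1xx completes the transaction.
    return status >= 200


def may_try_another_target(status: int) -> bool:
    # Simplified rule: 3xx/4xx/5xx may lead a proxy to try another
    # destination; 2xx (answered) and 6xx (global failure) end the search.
    return 300 <= status <= 599


print(response_class(486), may_try_another_target(486))  # client error True
print(response_class(603), may_try_another_target(603))  # global failure False
```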
Transaction Layer
Transactions vs. Dialogs
SIP tracks state at two levels: transactions (short-lived, one request and its responses) and dialogs (long-lived, the context of an ongoing call).
A transaction is one SIP request plus all its responses. An INVITE transaction includes the INVITE, all 1xx provisional responses, and the final 2xx/3xx/4xx/5xx/6xx response. The ACK that follows a 2xx is not part of the INVITE transaction: it is a separate transaction with its own branch value. The ACK for a non-2xx final response (3xx to 6xx), by contrast, belongs to the INVITE transaction and reuses its branch.
The Branch Parameter and Transaction Matching
Every SIP request includes a Via header with a branch parameter that uniquely identifies the transaction:
Via: SIP/2.0/UDP client.example.com;branch=z9hG4bKnashds8
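The z9hG4bK prefix is a magic cookie defined by RFC 3261: a branch that starts with it signals that the sender follows RFC 3261 and that the branch value is globally unique, so RFC 3261 transaction matching rules apply. A response is matched to its transaction by the branch parameter of the topmost Via plus the method in CSeq. The method is needed because a CANCEL shares the branch of the INVITE it cancels but is a separate transaction.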
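The matching rule can be sketched as a lookup key built from the topmost Via and the CSeq header. The helper names `new_branch` and `transaction_key` are hypothetical, and the random suffix length is an arbitrary choice for illustration.

```python
import secrets

MAGIC_COOKIE = "z9hG4bK"


def new_branch() -> str:
    """Generate a fresh branch value for a new client transaction."""
    return MAGIC_COOKIE + secrets.token_hex(8)


def transaction_key(top_via: str, cseq: str):
    """Return (branch, method) from the topmost Via value and the CSeq value."""
    branch = None
    # Via parameters follow the sent-by part, separated by semicolons.
    for param in top_via.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "branch":
            branch = value.strip()
    if branch is None or not branch.startswith(MAGIC_COOKIE):
        raise ValueError("topmost Via has no RFC 3261 branch parameter")
    # CSeq is "<sequence number> <method>"; only the method is part of the key.
    method = cseq.split()[1]
    return branch, method


key = transaction_key(
    "SIP/2.0/UDP client.example.com;branch=z9hG4bKnashds8", "314159 INVITE"
)
print(key)  # ('z9hG4bKnashds8', 'INVITE')
```

A client would store each pending transaction under this key and look up incoming responses with the same function.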
Retransmission Timers
SIP can run over UDP, TCP, or TLS. Because UDP does not guarantee delivery, the transaction layer retransmits requests itself when running over UDP:
Timer A: Initial INVITE retransmit interval = T1 (default 500ms)
         Doubles each retransmit: 500ms, 1s, 2s, 4s, 8s...
Timer B: INVITE transaction timeout = 64 * T1 = 32 seconds
         If no final response, transaction fails
Timer E: Non-INVITE retransmit (OPTIONS, REGISTER, BYE)
         Also starts at T1, caps at T2 (4 seconds)
Timer F: Non-INVITE timeout = 64 * T1 = 32 seconds
```
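The timer values above translate into concrete retransmission schedules. This sketch computes the moments, in seconds after the first send, at which a request would be retransmitted over UDP if no response ever arrives. The function names are hypothetical, and the non-INVITE case ignores the detail that a provisional response changes the retransmit interval.

```python
T1 = 0.5   # round-trip time estimate, seconds (default)
T2 = 4.0   # cap on the non-INVITE retransmit interval, seconds


def invite_retransmit_times(t1: float = T1):
    """Timer A doubles without a cap until Timer B (64 * T1) expires."""
    timer_b = 64 * t1
    times, interval, elapsed = [], t1, t1
    while elapsed < timer_b:
        times.append(elapsed)
        interval *= 2
        elapsed += interval
    return times


def non_invite_retransmit_times(t1: float = T1, t2: float = T2):
    """Timer E doubles up to T2 until Timer F (64 * T1) expires."""
    timer_f = 64 * t1
    times, interval, elapsed = [], t1, t1
    while elapsed < timer_f:
        times.append(elapsed)
        interval = min(2 * interval, t2)
        elapsed += interval
    return times


print(invite_retransmit_times())
# [0.5, 1.5, 3.5, 7.5, 15.5, 31.5]
print(non_invite_retransmit_times())
# [0.5, 1.5, 3.5, 7.5, 11.5, 15.5, 19.5, 23.5, 27.5, 31.5]
```

With the defaults, an unanswered INVITE is sent 7 times in total and an unanswered non-INVITE request 11 times before the transaction times out at 32 seconds.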
Dialog Management
The INVITE Dialog Lifecycle
A dialog is identified by the dialog ID: the triple of Call-ID, From-tag, and To-tag. The From-tag is set by the caller in the INVITE; the To-tag is added by the callee in its response (a 1xx that carries a To-tag creates an early dialog, and the 2xx confirms it). Together, they identify the call end to end, independent of any proxy's state or of the transport connection that carried the messages.
INVITE Dialog Lifecycle:
Alice Proxy Bob
│─── INVITE ────────────────────────────────▶│
│◀── 100 Trying ────────────────────────────│
│◀── 180 Ringing ───────────────────────────│ [phone rings]
│◀── 200 OK ────────────────────────────────│ [call answered]
│─── ACK ───────────────────────────────────▶│ [session confirmed]
│ │
│ ←────── RTP media flows directly ──────── │
│ │
│─── BYE ───────────────────────────────────▶│ [hang up]
│◀── 200 OK ────────────────────────────────│
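Each endpoint stores the dialog ID from its own point of view: the caller's local tag is the From-tag, while the callee's local tag is the To-tag. A minimal sketch of that bookkeeping follows; `DialogId`, `tag_of`, and `dialog_id` are hypothetical names, and the Call-ID and addresses in the usage lines are illustrative values.

```python
import re
from typing import NamedTuple


class DialogId(NamedTuple):
    call_id: str
    local_tag: str
    remote_tag: str


_TAG_RE = re.compile(r";\s*tag=([^;\s]+)", re.IGNORECASE)


def tag_of(header_value: str) -> str:
    """Extract the tag parameter from a From or To header value ('' if absent)."""
    # Header parameters come after the closing '>' of the address, if any.
    params = header_value.rpartition(">")[2] if ">" in header_value else header_value
    m = _TAG_RE.search(params)
    return m.group(1) if m else ""


def dialog_id(call_id: str, from_value: str, to_value: str, is_caller: bool) -> DialogId:
    from_tag, to_tag = tag_of(from_value), tag_of(to_value)
    if is_caller:
        return DialogId(call_id, from_tag, to_tag)
    return DialogId(call_id, to_tag, from_tag)


frm = "Alice <sip:alice@example.com>;tag=1928301774"
to = "Bob <sip:bob@example.com>;tag=a48s"
print(dialog_id("a84b4c76e66710", frm, to, is_caller=True))
print(dialog_id("a84b4c76e66710", frm, to, is_caller=False))
```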
re-INVITE for Session Modification
Once a dialog is established, either party can send a new INVITE within the dialog (a re-INVITE) to modify the session — put the call on hold, add video, or change the codec:
# Put call on hold (a=sendonly in SDP; c=0.0.0.0 is the older RFC 2543 convention):
INVITE sip:[email protected] SIP/2.0
To: Bob <sip:[email protected]>;tag=a48s ← includes To-tag (within dialog)
From: Alice <sip:[email protected]>;tag=1928301774
CSeq: 314160 INVITE ← incremented CSeq
[SDP with a=sendonly and c=0.0.0.0]
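The SDP body of a hold re-INVITE differs from the original offer mainly in its direction attribute. As a rough sketch, the hypothetical helper `set_direction` below rewrites the direction of every media section; it assumes well-formed SDP and simply drops any existing direction attributes, including session-level ones.

```python
DIRECTIONS = {"a=sendrecv", "a=sendonly", "a=recvonly", "a=inactive"}


def set_direction(sdp: str, direction: str = "sendonly") -> str:
    """Return a copy of the SDP with each media section set to the given direction."""
    attr = f"a={direction}"
    if attr not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction!r}")
    out, in_media = [], False
    for line in sdp.strip().splitlines():
        line = line.strip()
        if line in DIRECTIONS:
            continue                  # drop the previous direction attribute
        if line.startswith("m="):
            if in_media:
                out.append(attr)      # close the previous media section
            in_media = True
        out.append(line)
    if in_media:
        out.append(attr)              # close the last media section
    return "\r\n".join(out) + "\r\n"


offer = "v=0\r\nm=audio 49170 RTP/AVP 0\r\na=rtpmap:0 PCMU/8000\r\na=sendrecv\r\n"
print(set_direction(offer, "sendonly"))
```

Calling `set_direction(sdp, "sendrecv")` on the held SDP would produce the body for the re-INVITE that resumes the call.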
SIP and RTP — The Media Plane
SDP Offer/Answer Model
SIP uses SDP (Session Description Protocol, RFC 4566, since revised by RFC 8866) in the message body to negotiate media parameters, following the offer/answer model of RFC 3264. Typically the INVITE carries the SDP offer and the 200 OK carries the SDP answer.
SDP Offer (in INVITE body):
v=0
o=alice 2890844526 2890844526 IN IP4 pc33.example.com
s=Session
c=IN IP4 pc33.example.com
t=0 0
m=audio 49170 RTP/AVP 0 8 97
a=rtpmap:0 PCMU/8000 ← G.711 µ-law
a=rtpmap:8 PCMA/8000 ← G.711 A-law
a=rtpmap:97 iLBC/8000 ← iLBC codec
SDP Answer (in 200 OK body):
m=audio 3456 RTP/AVP 0 ← Bob chooses PCMU/G.711 on port 3456
a=rtpmap:0 PCMU/8000
After the ACK, RTP audio flows directly between Alice's port 49170 and Bob's port 3456 — bypassing all SIP proxies entirely.
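The answerer's side of this exchange can be sketched as: read the payload types from the offer's `m=audio` line, resolve them to codec names, and pick the first one it supports. The helper names are hypothetical, the static payload table covers only the two G.711 variants shown above, and the policy of taking the first match in offer order is an assumption rather than a requirement.

```python
STATIC_PAYLOADS = {0: "PCMU", 8: "PCMA"}   # well-known static RTP payload types


def parse_audio_offer(sdp: str):
    """Return (port, [(payload_type, codec_name), ...]) for the first audio stream."""
    port, payload_types, rtpmap = None, [], {}
    for line in sdp.strip().splitlines():
        line = line.strip()
        if line.startswith("m=audio") and port is None:
            fields = line.split()           # m=audio <port> <proto> <pt> <pt> ...
            port = int(fields[1])
            payload_types = [int(pt) for pt in fields[3:]]
        elif line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(None, 1)
            rtpmap[int(pt)] = encoding.split("/")[0].upper()
    codecs = [(pt, rtpmap.get(pt, STATIC_PAYLOADS.get(pt))) for pt in payload_types]
    return port, codecs


def choose_codec(offered, supported):
    """Pick the first offered codec that the answerer supports, or None."""
    for pt, name in offered:
        if name in supported:
            return pt, name
    return None


OFFER = """v=0
c=IN IP4 pc33.example.com
m=audio 49170 RTP/AVP 0 8 97
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:97 iLBC/8000
"""

port, codecs = parse_audio_offer(OFFER)
print(port, codecs)
print(choose_codec(codecs, {"PCMA", "ILBC"}))  # (8, 'PCMA')
```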
NAT Traversal — STUN, TURN, and ICE
The biggest operational challenge in SIP deployments is NAT. A UA behind a NAT typically writes its *private* IP address into the SDP (and into its Via and Contact headers), and that address is unreachable from the internet. Three technologies address this:
- STUN (RFC 5389, revised as RFC 8489): Lets a UA discover its public IP/port by querying a server. The UA puts the public address in the SDP.
- TURN (RFC 5766, revised as RFC 8656): When direct RTP fails, the TURN server relays media. Higher latency but works through symmetric NATs.
- ICE (RFC 5245, revised as RFC 8445): Gathers all possible candidate addresses (host, server reflexive from STUN, relayed via TURN) and exchanges them in SDP. The peers then check connectivity for each candidate pair and choose the best working path.
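ICE decides which candidates to try first by assigning each one a numeric priority. The sketch below uses the priority formula and the recommended type preferences from the ICE specification; the candidate addresses are made-up documentation values, and real agents also compute pair priorities and run connectivity checks, which are omitted here.

```python
# Recommended type preferences: direct paths rank above relayed ones.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_pref: int = 65535, component_id: int = 1) -> int:
    """priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component_id)"""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component_id)


# Hypothetical candidates gathered by one agent: (type, address, port)
candidates = [
    ("relay", "203.0.113.9", 50000),    # allocated on a TURN server
    ("srflx", "198.51.100.7", 62000),   # public mapping learned via STUN
    ("host", "10.0.0.5", 49170),        # local interface address
]

ranked = sorted(candidates, key=lambda c: candidate_priority(c[0]), reverse=True)
for cand_type, addr, port in ranked:
    print(cand_type, addr, port, candidate_priority(cand_type))
```

With these defaults a host candidate gets priority 2130706431 and a server reflexive candidate 1694498815, so direct paths are checked before STUN-derived ones, and the TURN relay is tried last.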
WebRTC, which leaves signaling to the application but reuses the SDP offer/answer model, made ICE mainstream. ICE is mandatory in WebRTC, and many modern SIP deployments also use it for reliable NAT traversal without manual configuration.