What Makes TCP Reliable?
TCP (Transmission Control Protocol) is one of the two foundational transport protocols of the internet, alongside UDP. Unlike UDP, TCP provides:
- Reliable delivery — lost segments are detected and retransmitted
- In-order delivery — bytes arrive in the exact sequence they were sent
- Flow control — receiver signals how much data it can accept
- Congestion control — sender backs off when the network is overloaded
All of these properties are implemented through the connection lifecycle — a series of state transitions governed by the TCP state machine.
The Three-Way Handshake
Before any data can be exchanged, TCP requires both sides to agree on connection parameters through a three-step handshake:
Client Server
| |
|---- SYN (seq=1000) ---------->| [Client picks random ISN: 1000]
| | Server enters SYN_RECEIVED state
|<--- SYN-ACK (seq=5000, | [Server picks random ISN: 5000]
| ack=1001) ---------- | ack = client_ISN + 1
| |
|---- ACK (ack=5001) ---------->| ack = server_ISN + 1
| | Both sides enter ESTABLISHED state
|====== Data Transfer ==========|
Initial Sequence Numbers (ISN)
Each side independently chooses a random Initial Sequence Number (ISN). Randomization is critical for security — a predictable ISN would allow attackers to inject data into existing connections (TCP session hijacking, RFC 6528).
The ISN is not zero because multiple connections can exist between the same pair of IP addresses and ports over time. Random ISNs prevent stale packets from a previous connection from being interpreted as belonging to a new one.
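Sequence numbers are 32-bit and wrap around, so ACK arithmetic and "which sequence number comes first?" comparisons are done modulo 2^32. A minimal sketch (the helper names are ours, for illustration only):

```python
# Sequence-number arithmetic is modulo 2^32: ACK values and ordering
# comparisons must account for wraparound.

MOD = 2**32

def ack_for(isn: int, bytes_consumed: int) -> int:
    """Next expected sequence number (the ACK value), mod 2^32."""
    return (isn + bytes_consumed) % MOD

def seq_before(a: int, b: int) -> bool:
    """True if sequence number a precedes b (serial-number arithmetic)."""
    return a != b and ((b - a) % MOD) < 2**31

# A SYN consumes one sequence number, so the handshake ACK is ISN + 1:
print(ack_for(1000, 1))            # 1001, matching ack=1001 in the diagram
# Wraparound near 2^32:
print(ack_for(2**32 - 1, 5))       # 4
print(seq_before(2**32 - 10, 3))   # True: 3 is "after" due to wraparound
```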
SYN Flood Attacks
The three-way handshake creates a vulnerability: after receiving a SYN, the server allocates memory for the half-open connection before the client confirms. A SYN flood attack sends thousands of SYN packets with spoofed source IPs — the server's SYN backlog fills up and legitimate connections are refused.
Modern kernels mitigate this with SYN cookies: instead of allocating state on SYN, the server encodes connection parameters in the sequence number of the SYN-ACK. State is only allocated when the final ACK arrives with the correct cookie:
# Enable SYN cookies (Linux):
sudo sysctl -w net.ipv4.tcp_syncookies=1
# Persistent (add to /etc/sysctl.conf):
net.ipv4.tcp_syncookies = 1
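The cookie idea can be sketched in a few lines. This is an illustrative model only: the real Linux implementation packs a timestamp counter, an MSS table index, and a keyed hash into the 32-bit sequence number with a different layout, and the secret and MSS table below are hypothetical.

```python
# Sketch of the SYN-cookie idea: encode connection state in the SYN-ACK
# sequence number instead of allocating memory per half-open connection.
import hmac, hashlib

SECRET = b"per-boot-random-key"        # hypothetical per-boot server secret
MSS_TABLE = [536, 1220, 1440, 1460]    # MSS encoded as a 2-bit table index

def make_cookie(src, dst, sport, dport, client_isn, mss_index):
    msg = f"{src}|{dst}|{sport}|{dport}|{client_isn}".encode()
    mac = hmac.new(SECRET, msg, hashlib.sha256).digest()
    h = int.from_bytes(mac[:4], "big") & 0x3FFFFFFF  # top 30 bits: keyed hash
    return (h << 2) | mss_index                      # low 2 bits: MSS index

def check_cookie(src, dst, sport, dport, client_isn, cookie):
    """On the final ACK: recompute and compare. Returns MSS or None."""
    mss_index = cookie & 0x3
    if make_cookie(src, dst, sport, dport, client_isn, mss_index) == cookie:
        return MSS_TABLE[mss_index]
    return None

cookie = make_cookie("203.0.113.7", "198.51.100.1", 54321, 443, 1000, 3)
print(check_cookie("203.0.113.7", "198.51.100.1", 54321, 443, 1000, cookie))  # 1460
```

The trade-off in the real mechanism: because all state must fit in 32 bits, TCP options that do not fit in the cookie (e.g. window scale, SACK) are lost unless echoed via the timestamp option.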
Connection Parameters
During the handshake, both sides negotiate connection parameters via TCP options in the SYN and SYN-ACK segments:
Maximum Segment Size (MSS)
MSS tells the remote side the largest segment payload (excluding IP and TCP headers) it will accept:
# Typical MSS for Ethernet:
MSS = MTU (1500) - IP header (20) - TCP header (20) = 1460 bytes
# Check MSS negotiated for a connection:
ss -tin dst example.com:443 | grep mss
Window Scaling (RFC 7323)
The original TCP window field is 16 bits, limiting the window size to 65,535 bytes. For high-bandwidth, high-latency links (satellite, transcontinental), this is too small. Window scaling applies a shift count of up to 14 to the advertised window, extending the usable window to 1 GiB:
# Window scale factor of 7 means: actual window = advertised_window * 2^7 = * 128
# So an advertised window of 65535 * 128 = 8,388,480 bytes (~8 MB)
# Verify window scaling is negotiated: capture the SYN and SYN-ACK
# and look for 'wscale' in the printed TCP options:
tcpdump -n -v -r capture.pcap 'tcp[tcpflags] & tcp-syn != 0'
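The need for scaling falls out of the bandwidth-delay product: throughput is capped at window / RTT, so filling a fast long-haul link requires a window far larger than 64 KB. A quick calculation:

```python
# Why window scaling matters: maximum TCP throughput is capped at
# window / RTT, and the bandwidth-delay product (BDP) is the window
# size needed to keep a link fully utilized.

def max_throughput_mbps(window_bytes: int, rtt_s: float) -> float:
    return window_bytes * 8 / rtt_s / 1e6

def bdp_bytes(bandwidth_mbps: float, rtt_s: float) -> int:
    return int(bandwidth_mbps * 1e6 / 8 * rtt_s)

# 100 ms transcontinental RTT with the unscaled 65,535-byte window:
print(max_throughput_mbps(65535, 0.100))  # ~5.24 Mbit/s, whatever the link speed
# Window needed to fill a 1 Gbit/s link at 100 ms RTT:
print(bdp_bytes(1000, 0.100))             # 12,500,000 bytes (~12 MB)
```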
SACK (Selective Acknowledgment)
Without SACK, if packet 5 of 10 is lost, the receiver can only acknowledge up to packet 4 — the sender must retransmit from packet 5 onward. SACK allows the receiver to acknowledge out-of-order segments, so only the lost packet is retransmitted:
# SACK is enabled by default on Linux:
sysctl net.ipv4.tcp_sack
# net.ipv4.tcp_sack = 1
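What SACK buys the sender can be modeled in a few lines. This is a simplified sketch: real SACK blocks are byte-sequence ranges carried in TCP options, and a real sender treats sent-but-unSACKed data beyond the last block as in flight rather than lost until a retransmission timeout.

```python
# Given the receiver's cumulative ACK plus SACK blocks, compute which
# byte ranges are actually missing, so only those are retransmitted.

def ranges_to_retransmit(cum_ack, sack_blocks):
    """Return [start, end) byte ranges between cum_ack and the SACKed data."""
    holes, pos = [], cum_ack
    for start, end in sorted(sack_blocks):
        if start > pos:
            holes.append((pos, start))   # gap before this SACK block
        pos = max(pos, end)
    return holes

# Ten 1000-byte segments sent starting at seq 0; segment 5
# (bytes 4000-5000) was lost, everything after it arrived:
print(ranges_to_retransmit(4000, [(5000, 10000)]))
# [(4000, 5000)]: without SACK the sender would resend bytes 4000-10000
```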
Data Transfer: Flow and Congestion Control
Sliding Window (Flow Control)
Flow control prevents the sender from overwhelming the receiver's buffer. The receiver advertises its available buffer space (window size) in every ACK. The sender cannot have more unacknowledged bytes in flight than the advertised window:
Sender Receiver
|-- seg 1 (1000 bytes) ---------->| Window: 4000 bytes
|-- seg 2 (1000 bytes) ---------->| Window remaining: 3000
|-- seg 3 (1000 bytes) ---------->| Window remaining: 2000
|-- seg 4 (1000 bytes) ---------->| Window remaining: 1000
|    (window full: sender waits)  |
|<-- ACK (window=4000) -----------| Buffer processed
|-- seg 5 (1000 bytes) ---------->| Window remaining: 3000
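The sender-side accounting in the diagram reduces to one invariant: bytes in flight may never exceed the advertised window. A minimal model (class and method names are ours, for illustration):

```python
# Minimal model of sender-side flow control: unacknowledged bytes in
# flight must never exceed the receiver's advertised window.

class WindowedSender:
    def __init__(self, advertised_window: int):
        self.window = advertised_window
        self.in_flight = 0

    def send(self, nbytes: int) -> bool:
        if self.in_flight + nbytes > self.window:
            return False              # window full: must wait for an ACK
        self.in_flight += nbytes
        return True

    def ack(self, nbytes_acked: int, new_window: int):
        self.in_flight -= nbytes_acked
        self.window = new_window      # receiver re-advertises its buffer space

s = WindowedSender(4000)
print([s.send(1000) for _ in range(5)])  # [True, True, True, True, False]
s.ack(4000, 4000)                        # ACK frees the window
print(s.send(1000))                      # True: segment 5 can go
```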
Congestion Control
While flow control is receiver-driven, congestion control is sender-driven. TCP infers network congestion from packet loss and round-trip time:
Slow Start: New connections begin conservatively. The sender starts with a small congestion window (cwnd) and doubles it every RTT until a loss occurs.
Congestion Avoidance: After reaching the slow start threshold, cwnd grows linearly by one MSS per RTT.
Fast Retransmit: Three duplicate ACKs indicate a lost packet. The sender retransmits immediately without waiting for a timeout.
# Check current congestion control algorithm:
sysctl net.ipv4.tcp_congestion_control
# net.ipv4.tcp_congestion_control = bbr
# BBR (Bottleneck Bandwidth and RTT) — Google's modern algorithm,
# significantly better than CUBIC on lossy links:
sudo sysctl -w net.ipv4.tcp_congestion_control=bbr
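The cwnd dynamics described above can be simulated in a few lines. This sketches classic Reno-style behavior (doubling in slow start, linear growth in avoidance, halving on fast retransmit); CUBIC and BBR use different growth functions.

```python
# Simulate Reno-style congestion window growth, in units of MSS.
# loss_at: the set of RTT rounds in which a loss is detected
# via three duplicate ACKs (fast retransmit: halve, don't reset to 1).

def simulate_cwnd(rtts: int, ssthresh: int, loss_at: set) -> list:
    cwnd, history = 1, []
    for rtt in range(rtts):
        history.append(cwnd)
        if rtt in loss_at:
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh              # fast recovery (simplified)
        elif cwnd < ssthresh:
            cwnd *= 2                    # slow start: exponential growth
        else:
            cwnd += 1                    # congestion avoidance: +1 MSS per RTT
    return history

print(simulate_cwnd(10, 8, loss_at={6}))
# [1, 2, 4, 8, 9, 10, 11, 5, 6, 7]
```

Note the characteristic sawtooth: exponential climb to ssthresh, linear probing above it, then a halving when loss signals congestion.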
Connection Termination
Four-Way FIN Close
TCP supports half-close — each side can independently close its send direction:
Client Server
|---- FIN (seq=2000) ---------->| Client done sending
|<--- ACK (ack=2001) ---------- | Server acknowledges
|<--- FIN (seq=8000) ---------- | Server done sending
|---- ACK (ack=8001) ---------->|
| [Client enters TIME_WAIT] |
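From the application side, half-close is `shutdown(SHUT_WR)`: it sends the FIN but keeps the receive direction open, so the peer's remaining data can still be read. A sketch of the classic request-then-drain pattern (function name is ours; assumes a reachable server):

```python
# Half-close: send a request, FIN our send direction, then read until
# the server's own FIN arrives (recv() returns b'').
import socket

def request_then_drain(host: str, port: int, request: bytes) -> bytes:
    sock = socket.create_connection((host, port), timeout=5)
    sock.sendall(request)
    sock.shutdown(socket.SHUT_WR)    # send FIN: "client done sending"
    chunks = []
    while True:                      # server may keep sending until its FIN
        data = sock.recv(4096)
        if not data:                 # b'' means the server's FIN arrived
            break
        chunks.append(data)
    sock.close()                     # final ACK; client enters TIME_WAIT
    return b"".join(chunks)
```

Many protocols rely on the EOF signal from the half-close to mark end-of-request, which is exactly why each direction must be closable independently.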
RST (Reset)
A RST segment abruptly terminates a connection without the four-way close:
# RST is sent when:
# - A packet arrives for a closed port (connection refused)
# - Application crashes without closing the socket
# - Firewall injects RST to terminate connections
# Observe RST in tcpdump:
tcpdump -n 'tcp[tcpflags] & tcp-rst != 0'
TIME_WAIT State
After sending the final ACK, the client enters TIME_WAIT for 2 × MSL (Maximum Segment Lifetime). RFC 793 specifies MSL as 2 minutes; in practice Linux uses a fixed 60-second TIME_WAIT. This ensures:
- The final ACK arrives at the server (if lost, server re-sends FIN and client can re-ACK)
- Old duplicate packets from this connection expire before the port is reused
# View TIME_WAIT connections:
ss -ant state time-wait | head -20
# TIME_WAIT exhaustion (running out of ephemeral ports):
# Symptom: 'connect: Cannot assign requested address'
# Fix options:
# 1. Increase ephemeral port range:
sudo sysctl -w net.ipv4.ip_local_port_range="1024 65535"
# 2. Reuse connections (HTTP keep-alive, connection pooling) so fewer
#    short-lived connections cycle through TIME_WAIT
# 3. Enable TIME_WAIT reuse for outgoing connections (requires TCP timestamps):
sudo sysctl -w net.ipv4.tcp_tw_reuse=1
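The server-side companion problem: after a restart, `bind()` fails with "Address already in use" while old connections from the previous process sit in TIME_WAIT. The standard fix is `SO_REUSEADDR` on the listening socket:

```python
# SO_REUSEADDR lets a restarted server rebind its port immediately,
# even while old connections are still in TIME_WAIT.
import socket

def make_listener(port: int) -> socket.socket:
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow rebinding while prior connections linger in TIME_WAIT:
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("0.0.0.0", port))
    sock.listen(128)
    return sock

listener = make_listener(0)            # port 0: kernel picks a free port
print(listener.getsockname()[1] > 0)   # True
listener.close()
```

Virtually every production server sets this option; it is safe because an active listener still cannot share its exact address with another active listener.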
Troubleshooting with tcpdump
# Capture a full TCP conversation:
tcpdump -n -i eth0 -w capture.pcap host 203.0.113.10 and port 443
# Analyze in Wireshark (follow TCP stream):
# Right-click on any packet → Follow → TCP Stream
# Quick checks:
# Connection refused (server answers the SYN with a RST):
tcpdump -n 'tcp[tcpflags] & tcp-rst != 0'
# Retransmissions (sign of packet loss): tcpdump has no retransmission
# filter; analyze a capture with Wireshark/tshark instead:
tshark -r capture.pcap -Y tcp.analysis.retransmission
# See TCP state for established connections:
ss -tn dst 203.0.113.10
# Latency per connection:
ss -tin dst 203.0.113.10 | grep rtt
TCP Keep-Alive
Without keep-alive, NAT gateways and stateful firewalls silently drop an idle connection's state entry after their idle timeout (often 30–300 seconds); later packets on the connection are then discarded and neither endpoint is notified:
import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(('example.com', 443))
# Enable TCP keep-alive:
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
# Start keep-alive probes after 60 seconds of idle
# (TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT are Linux names; macOS differs):
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
# Send probes every 10 seconds:
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)
# Drop connection after 5 failed probes:
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)
Summary
The TCP connection lifecycle — three-way handshake, data transfer with sliding window and congestion control, and four-way FIN close — is the foundation of reliable internet communication. Understanding TIME_WAIT, SYN floods, and MSS negotiation helps diagnose the most common production networking issues: connection refused, port exhaustion, and unexpected disconnections under load.