What Is TTFB?
Time to First Byte (TTFB) measures the elapsed time from a client sending a request to receiving the first byte of the response. It captures:
- Network latency — round-trip time between client and server
- Server processing time — application code, database queries, external API calls
- Queue time — waiting behind other requests in the server's thread pool
TTFB is a leading indicator of overall page load speed. Google's Core Web Vitals guidelines recommend TTFB below 800 ms for a "Good" rating. For API endpoints, aim for under 200 ms at the 95th percentile.
```bash
# Measure TTFB with curl
curl -o /dev/null -s -w 'TTFB: %{time_starttransfer}s\n' https://api.example.com/
```
Server-Side Optimization
Eliminate Synchronous Blocking
The most common TTFB killers are synchronous operations in the request path: DNS lookups, external HTTP calls, sequential database queries.
```python
# Bad: sequential external calls add up
user = fetch_user(user_id)          # 50 ms
prefs = fetch_prefs(user_id)        # 50 ms
perms = fetch_permissions(user_id)  # 50 ms
# Total: 150 ms blocking

# Better: parallelize (fetch_* must be coroutines, inside an async view/handler)
user, prefs, perms = await asyncio.gather(
    fetch_user(user_id),
    fetch_prefs(user_id),
    fetch_permissions(user_id),
)  # Total: ~50 ms
```
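A self-contained version of the pattern above, with `asyncio.sleep` standing in for the three 50 ms network calls (`fetch` and the delays are illustrative stand-ins, not a real API):

```python
import asyncio
import time

async def fetch(name: str, delay: float) -> str:
    # Stand-in for a real network call: sleeps instead of doing I/O.
    await asyncio.sleep(delay)
    return name

async def main() -> float:
    start = time.perf_counter()
    # Three 50 ms "calls" run concurrently, not back to back.
    user, prefs, perms = await asyncio.gather(
        fetch("user", 0.05),
        fetch("prefs", 0.05),
        fetch("perms", 0.05),
    )
    return time.perf_counter() - start

elapsed = asyncio.run(main())
print(f"elapsed: {elapsed * 1000:.0f} ms")  # ~50 ms rather than ~150 ms
```

The total wall-clock time tracks the slowest single call, which is the whole point of parallelizing independent fetches.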
Tune the Thread / Worker Pool
If requests queue behind each other, TTFB spikes under load. For CPU-bound workloads, set workers = CPU cores. For I/O-bound (most web APIs), use async workers or more threads:
```bash
# Gunicorn: async (ASGI) workers for I/O-bound Django
gunicorn --workers 4 --worker-class uvicorn.workers.UvicornWorker myapp.asgi:application
```
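A starting worker count can be derived from the machine's core count. The sketch below encodes two common heuristics (Gunicorn's documentation suggests `(2 × cores) + 1` for sync workers); treat these as starting points to load-test against, not guarantees:

```python
import os

def suggested_workers(io_bound: bool = True) -> int:
    # os.cpu_count() can return None in unusual environments; default to 1.
    cores = os.cpu_count() or 1
    # CPU-bound: one worker per core keeps every core busy without contention.
    # I/O-bound sync workers: Gunicorn's docs suggest (2 * cores) + 1.
    return (2 * cores) + 1 if io_bound else cores

print(suggested_workers(io_bound=True))
print(suggested_workers(io_bound=False))
```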
Database Query Optimization
In data-driven APIs, database queries typically dominate server processing time. Instrument every endpoint to capture slow queries:
| Technique | Impact |
|---|---|
| Add indexes on filter/sort columns | 100–1000× speedup |
| Use `SELECT` only needed columns | Reduces I/O and deserialization |
| Avoid N+1 queries (`select_related`, `prefetch_related`) | Eliminates cascading round trips |
| Use `EXPLAIN ANALYZE` | Reveals seq scans, hash joins |
| Connection pool (pgBouncer) | Cuts connection overhead |
In Django, use the Django Debug Toolbar or django-silk in development to surface N+1 queries and long-running SQL.
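The index row in the table above can be seen directly with SQLite's `EXPLAIN QUERY PLAN` (a minimal sketch; the `orders` table is hypothetical, and production Postgres would use `EXPLAIN ANALYZE` instead, but the scan-vs-index distinction is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)"
)

def plan(sql: str) -> str:
    # Returns SQLite's chosen access path for the query.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)

query = "SELECT total FROM orders WHERE user_id = 42"
before = plan(query)  # full table scan
conn.execute("CREATE INDEX idx_orders_user ON orders (user_id)")
after = plan(query)   # index search on user_id
print(before)
print(after)
```

Before the index, every row is scanned; after it, the planner searches the index and touches only matching rows, which is where the large speedups on filter columns come from.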
CDN and Edge Computing
Network RTT sets a floor on TTFB that no server-side optimization can overcome. A user in Tokyo hitting a server in Virginia experiences 150–200 ms of network latency alone.
CDN caching is the highest-leverage solution: serve responses from an edge node 5–20 ms away:
```http
Cache-Control: public, s-maxage=300, stale-while-revalidate=600
```
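A sketch of how an edge cache reads those directives (a deliberately simplified parser; real CDNs handle quoted values and many more directives):

```python
def parse_cache_control(header: str) -> dict:
    # Splits "public, s-maxage=300, stale-while-revalidate=600" into
    # {"public": True, "s-maxage": 300, "stale-while-revalidate": 600}.
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if "=" in part:
            name, _, value = part.partition("=")
            directives[name.strip()] = int(value)
        elif part:
            directives[part] = True
    return directives

cc = parse_cache_control("public, s-maxage=300, stale-while-revalidate=600")
# Fresh at the edge for 5 minutes; for the next 10 minutes the edge may serve
# the stale copy instantly while revalidating with the origin in the background.
print(cc)
```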
Edge computing (Cloudflare Workers, Fastly Compute) moves application logic to the CDN edge. Dynamic personalization, A/B routing, and authentication checks can run at the edge without touching the origin.
103 Early Hints
103 Early Hints (RFC 8297) lets the server send Link headers *before* the final response is ready. The browser can start fetching critical subresources (fonts, CSS, JS) while the server is still generating the HTML response body:
```http
HTTP/1.1 103 Early Hints
Link: </styles.css>; rel=preload; as=style
Link: </fonts/geist.woff2>; rel=preload; as=font; crossorigin

(server continues generating response...)

HTTP/1.1 200 OK
Content-Type: text/html
...
```
103 is particularly effective for server-rendered pages where HTML generation takes 100–500 ms. Chrome and Firefox support it, and major CDNs (Cloudflare, Fastly) can emit hints on the origin's behalf; origin-server support varies by server and version, so enabling it at the CDN/edge layer is usually the most compatible option.
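The wire exchange above can be reproduced with a raw socket (a toy demo: Python's stdlib HTTP servers don't emit 103, so this one-shot server writes the interim and final responses by hand):

```python
import socket
import threading

# One-shot server: sends a 103 interim response, then the final 200.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]

def serve_once() -> None:
    conn, _ = srv.accept()
    conn.recv(4096)  # read (and ignore) the request
    # Hints go out immediately, before the body exists.
    conn.sendall(
        b"HTTP/1.1 103 Early Hints\r\n"
        b"Link: </styles.css>; rel=preload; as=style\r\n\r\n"
    )
    body = b"<html>hello</html>"  # generated "slowly" in a real server
    conn.sendall(
        b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n"
        b"Content-Length: " + str(len(body)).encode() + b"\r\n\r\n" + body
    )
    conn.close()

t = threading.Thread(target=serve_once)
t.start()

cli = socket.socket()
cli.connect(("127.0.0.1", port))
cli.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
raw = b""
while b"</html>" not in raw:
    raw += cli.recv(4096)
cli.close()
t.join()
srv.close()
print(raw.decode())  # 103 block arrives before the 200 block
```

The client sees the `Link` headers on the wire before the 200 status line, which is exactly the window a browser uses to start preloading.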
Measuring and Monitoring
Track TTFB with multiple tools to get a complete picture:
```bash
# curl: measure TTFB from different regions via VPN or proxy
curl -o /dev/null -s -w 'DNS: %{time_namelookup}s | Connect: %{time_connect}s | TLS: %{time_appconnect}s | TTFB: %{time_starttransfer}s\n' https://api.example.com/
```
In production, instrument with real user monitoring (RUM) — Cloudflare Web Analytics, Google CrUX, or Datadog RUM. Aggregate TTFB by country, device type, and endpoint to identify the highest-impact optimizations. Set alerts when p95 TTFB exceeds your SLO threshold.
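The same timing can also be scripted for synthetic checks. A stdlib-only sketch, using a local handler whose 50 ms sleep simulates server processing time (the handler and delay are illustrative):

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(0.05)  # simulated server processing time
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

server = HTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

start = time.perf_counter()
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/")
resp = conn.getresponse()  # returns once the status line and headers arrive
ttfb = time.perf_counter() - start
resp.read()
conn.close()
server.shutdown()
print(f"TTFB: {ttfb * 1000:.0f} ms")
```

The measured TTFB is dominated by the handler's sleep, mirroring how server processing time shows up in curl's `time_starttransfer`.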