What Is TTFB?
Time to First Byte (TTFB) measures the elapsed time from a client sending a request to receiving the first byte of the response. It captures:
- Network latency — round-trip time between client and server
- Server processing time — application code, database queries, external API calls
- Queue time — waiting behind other requests in the server's thread pool
TTFB is a leading indicator of overall page load speed. Google's Core Web Vitals guidelines recommend TTFB below 800 ms for a "Good" rating. For API endpoints, aim for under 200 ms at the 95th percentile.
```bash
# Measure TTFB with curl
curl -o /dev/null -s -w 'TTFB: %{time_starttransfer}s\n' https://api.example.com/
```
Server-Side Optimization
Eliminate Synchronous Blocking
The most common TTFB killers are synchronous operations in the request path: DNS lookups, external HTTP calls, sequential database queries.
```python
# Bad: sequential external calls add up
user = fetch_user(user_id)          # 50 ms
prefs = fetch_prefs(user_id)        # 50 ms
perms = fetch_permissions(user_id)  # 50 ms
# Total: 150 ms blocking

# Better: parallelize (fetch_* must be coroutines, inside an async view/handler)
user, prefs, perms = await asyncio.gather(
    fetch_user(user_id),
    fetch_prefs(user_id),
    fetch_permissions(user_id),
)  # Total: ~50 ms
```
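A self-contained version of the pattern above, with `asyncio.sleep` standing in for the three 50 ms network calls (`fetch` and the delays are illustrative stand-ins, not a real API):

```python
import asyncio
import time

async def fetch(name: str, delay: float) -> str:
    # Stand-in for a real network call: sleeps instead of doing I/O.
    await asyncio.sleep(delay)
    return name

async def main() -> float:
    start = time.perf_counter()
    # Three 50 ms "calls" run concurrently, not back to back.
    user, prefs, perms = await asyncio.gather(
        fetch("user", 0.05),
        fetch("prefs", 0.05),
        fetch("perms", 0.05),
    )
    return time.perf_counter() - start

elapsed = asyncio.run(main())
print(f"elapsed: {elapsed * 1000:.0f} ms")  # ~50 ms rather than ~150 ms
```

The total wall-clock time tracks the slowest single call, which is the whole point of parallelizing independent fetches.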
Tune the Thread / Worker Pool
If requests queue behind each other, TTFB spikes under load. For CPU-bound workloads, set workers = CPU cores. For I/O-bound (most web APIs), use async workers or more threads:
```bash
# Gunicorn: async (ASGI) workers for I/O-bound Django
gunicorn --workers 4 --worker-class uvicorn.workers.UvicornWorker myapp.asgi:application
```
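A starting worker count can be derived from the machine's core count. The sketch below encodes two common heuristics (Gunicorn's documentation suggests `(2 × cores) + 1` for sync workers); treat these as starting points to load-test against, not guarantees:

```python
import os

def suggested_workers(io_bound: bool = True) -> int:
    # os.cpu_count() can return None in unusual environments; default to 1.
    cores = os.cpu_count() or 1
    # CPU-bound: one worker per core keeps every core busy without contention.
    # I/O-bound sync workers: Gunicorn's docs suggest (2 * cores) + 1.
    return (2 * cores) + 1 if io_bound else cores

print(suggested_workers(io_bound=True))
print(suggested_workers(io_bound=False))
```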
Database Query Optimization
In data-driven APIs, database queries typically dominate server processing time. Instrument every endpoint to capture slow queries:
| Technique | Impact |
|---|---|
| Add indexes on filter/sort columns | 100–1000× speedup |
| Use `SELECT` only needed columns | Reduces I/O and deserialization |
| Avoid N+1 queries (`select_related`, `prefetch_related`) | Eliminates cascading round trips |
| Use `EXPLAIN ANALYZE` | Reveals seq scans, hash joins |
| Connection pool (pgBouncer) | Cuts connection overhead |
In Django, use the Django Debug Toolbar or django-silk in development to surface N+1 queries and long-running SQL.
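The index row in the table above can be seen directly with SQLite's `EXPLAIN QUERY PLAN` (a minimal sketch; the `orders` table is hypothetical, and production Postgres would use `EXPLAIN ANALYZE` instead, but the scan-vs-index distinction is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)"
)

def plan(sql: str) -> str:
    # Returns SQLite's chosen access path for the query.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)

query = "SELECT total FROM orders WHERE user_id = 42"
before = plan(query)  # full table scan
conn.execute("CREATE INDEX idx_orders_user ON orders (user_id)")
after = plan(query)   # index search on user_id
print(before)
print(after)
```

Before the index, every row is scanned; after it, the planner searches the index and touches only matching rows, which is where the large speedups on filter columns come from.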
CDN and Edge Computing
Network RTT sets a floor on TTFB that no server-side optimization can overcome. A user in Tokyo hitting a server in Virginia experiences 150–200 ms of network latency alone.
CDN caching is the highest-leverage solution: serve responses from an edge node 5–20 ms away:
```http
Cache-Control: public, s-maxage=300, stale-while-revalidate=600
```
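A sketch of how an edge cache reads those directives (a deliberately simplified parser; real CDNs handle quoted values and many more directives):

```python
def parse_cache_control(header: str) -> dict:
    # Splits "public, s-maxage=300, stale-while-revalidate=600" into
    # {"public": True, "s-maxage": 300, "stale-while-revalidate": 600}.
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if "=" in part:
            name, _, value = part.partition("=")
            directives[name.strip()] = int(value)
        elif part:
            directives[part] = True
    return directives

cc = parse_cache_control("public, s-maxage=300, stale-while-revalidate=600")
# Fresh at the edge for 5 minutes; for the next 10 minutes the edge may serve
# the stale copy instantly while revalidating with the origin in the background.
print(cc)
```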
Edge computing (Cloudflare Workers, Fastly Compute) moves application logic to the CDN edge. Dynamic personalization, A/B routing, and authentication checks can run at the edge without touching the origin.
103 Early Hints
103 Early Hints (RFC 8297) lets the server send Link headers *before* the final response is ready. The browser can start fetching critical subresources (fonts, CSS, JS) while the server is still generating the HTML response body:
```http
HTTP/1.1 103 Early Hints
Link: </styles.css>; rel=preload; as=style
Link: </fonts/geist.woff2>; rel=preload; as=font; crossorigin

(server continues generating response...)

HTTP/1.1 200 OK
Content-Type: text/html
...
```
103 is particularly effective for server-rendered pages where HTML generation takes 100–500 ms. Chrome and Firefox support it, and major CDNs (Cloudflare, Fastly) can emit hints on the origin's behalf; origin-server support varies by server and version, so enabling it at the CDN/edge layer is usually the most compatible option.
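The wire exchange above can be reproduced with a raw socket (a toy demo: Python's stdlib HTTP servers don't emit 103, so this one-shot server writes the interim and final responses by hand):

```python
import socket
import threading

# One-shot server: sends a 103 interim response, then the final 200.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]

def serve_once() -> None:
    conn, _ = srv.accept()
    conn.recv(4096)  # read (and ignore) the request
    # Hints go out immediately, before the body exists.
    conn.sendall(
        b"HTTP/1.1 103 Early Hints\r\n"
        b"Link: </styles.css>; rel=preload; as=style\r\n\r\n"
    )
    body = b"<html>hello</html>"  # generated "slowly" in a real server
    conn.sendall(
        b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n"
        b"Content-Length: " + str(len(body)).encode() + b"\r\n\r\n" + body
    )
    conn.close()

t = threading.Thread(target=serve_once)
t.start()

cli = socket.socket()
cli.connect(("127.0.0.1", port))
cli.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
raw = b""
while b"</html>" not in raw:
    raw += cli.recv(4096)
cli.close()
t.join()
srv.close()
print(raw.decode())  # 103 block arrives before the 200 block
```

The client sees the `Link` headers on the wire before the 200 status line, which is exactly the window a browser uses to start preloading.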
Measuring and Monitoring
Track TTFB with multiple tools to get a complete picture:
```bash
# curl: measure TTFB from different regions via VPN or proxy
curl -o /dev/null -s -w 'DNS: %{time_namelookup}s | Connect: %{time_connect}s | TLS: %{time_appconnect}s | TTFB: %{time_starttransfer}s\n' https://api.example.com/
```
In production, instrument with real user monitoring (RUM) — Cloudflare Web Analytics, Google CrUX, or Datadog RUM. Aggregate TTFB by country, device type, and endpoint to identify the highest-impact optimizations. Set alerts when p95 TTFB exceeds your SLO threshold.
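The same timing can also be scripted for synthetic checks. A stdlib-only sketch, using a local handler whose 50 ms sleep simulates server processing time (the handler and delay are illustrative):

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(0.05)  # simulated server processing time
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

server = HTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

start = time.perf_counter()
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/")
resp = conn.getresponse()  # returns once the status line and headers arrive
ttfb = time.perf_counter() - start
resp.read()
conn.close()
server.shutdown()
print(f"TTFB: {ttfb * 1000:.0f} ms")
```

The measured TTFB is dominated by the handler's sleep, mirroring how server processing time shows up in curl's `time_starttransfer`.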