Why Health Checks Matter
Health check endpoints serve multiple critical purposes:
- Load balancers route traffic away from unhealthy instances
- Kubernetes restarts containers that fail liveness probes
- Monitoring systems (UptimeRobot, Datadog) alert on downtime
- Deployment pipelines verify a new version is ready before sending traffic
A poorly designed health check — one that always returns 200 regardless of actual state — defeats all of the above.
Liveness vs Readiness vs Startup
Kubernetes defines three distinct probe types, and the distinction matters even outside Kubernetes:
Liveness
GET /health/live — Is the process alive and not in a deadlock?
This probe should be extremely lightweight: check that the event loop is running and the process can respond at all. If this fails, the container is killed and restarted.
Do not check external dependencies in liveness — a database outage should not cause your containers to restart in a loop.
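One way to detect a stuck process without touching any dependency is a heartbeat that a background worker or main loop updates periodically. The sketch below is framework-agnostic and returns a status code and body rather than a real HTTP response. The `Heartbeat` class, `liveness_response` function, and `max_age_seconds` parameter are hypothetical names introduced only for illustration.

```python
import threading
import time
from typing import Optional


class Heartbeat:
    """Records the last time a worker reported progress (hypothetical helper)."""

    def __init__(self, max_age_seconds: float) -> None:
        self._max_age = max_age_seconds
        self._last_beat = time.monotonic()
        self._lock = threading.Lock()

    def beat(self) -> None:
        # Called by the worker or main loop on every iteration
        with self._lock:
            self._last_beat = time.monotonic()

    def is_alive(self, now: Optional[float] = None) -> bool:
        # `now` can be injected for testing; defaults to the monotonic clock
        current = time.monotonic() if now is None else now
        with self._lock:
            return (current - self._last_beat) <= self._max_age


def liveness_response(heartbeat: Heartbeat, now: Optional[float] = None) -> tuple[int, dict]:
    """Return (http_status, body) without checking any external dependency."""
    if heartbeat.is_alive(now):
        return 200, {'status': 'pass'}
    return 503, {'status': 'fail'}
```

If the loop that calls `beat()` deadlocks, the heartbeat goes stale and the probe starts failing, which is the signal the orchestrator needs to restart the container.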
Readiness
GET /health/ready — Is the service ready to receive traffic?
This probe should check all dependencies the service needs to function:
- Database connection and query execution
- Cache connectivity (Redis, Memcached)
- Required external APIs
- Configuration loaded
If readiness fails, the load balancer stops routing traffic to this instance — but does not restart it.
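The readiness logic above can be sketched as a small runner that executes each named check and reports the instance as ready only when every check passes. This is a minimal, framework-agnostic illustration: `run_readiness_checks` and the `Check` alias are hypothetical names, and each check is assumed to return an `(ok, detail)` tuple.

```python
import time
from typing import Callable

# A check takes no arguments and returns (ok, detail)
Check = Callable[[], tuple[bool, str]]


def run_readiness_checks(checks: dict[str, Check]) -> tuple[bool, dict]:
    """Run every check and return (ready, per-check results)."""
    results = {}
    ready = True
    for name, check in checks.items():
        started = time.monotonic()
        try:
            ok, _detail = check()
        except Exception:
            # A check that raises is treated the same as a failed check
            ok = False
        elapsed_ms = int((time.monotonic() - started) * 1000)
        results[name] = [{
            'status': 'pass' if ok else 'fail',
            'responseTime': elapsed_ms,  # milliseconds
        }]
        ready = ready and ok
    return ready, results
```

A view would then return 200 when `ready` is true and 503 otherwise, so the load balancer drains the instance without restarting it.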
Startup
GET /health/startup — Has the application finished initializing?
For slow-starting services (loading large ML models, running migrations), the startup probe buys time before liveness kicks in. Once the startup probe succeeds, liveness and readiness probes begin.
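A startup endpoint can be as simple as a flag that the initialization code sets once it finishes. The following sketch uses a `threading.Event` for that flag. The names `mark_startup_complete` and `startup_response` are assumptions for illustration, and the function returns a status code and body rather than a framework response object.

```python
import threading

# Set exactly once, after models are loaded and migrations have run
_startup_complete = threading.Event()


def mark_startup_complete() -> None:
    """Call this at the end of application initialization."""
    _startup_complete.set()


def startup_response() -> tuple[int, dict]:
    """Return (http_status, body) for GET /health/startup."""
    if _startup_complete.is_set():
        return 200, {'status': 'pass'}
    return 503, {'status': 'fail', 'output': 'initializing'}
```

Until `mark_startup_complete()` runs, the probe returns 503 and the orchestrator keeps waiting instead of applying liveness rules to a process that is still loading.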
What to Check in Readiness
Database
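from django.db import connection


def check_database() -> tuple[bool, str]:
    """Return (ok, detail) after running a trivial query."""
    try:
        connection.ensure_connection()
        with connection.cursor() as cursor:
            cursor.execute('SELECT 1')
        return True, 'ok'
    except Exception as e:
        # str(e) can expose connection details; log it rather than returning it publicly
        return False, str(e)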
Cache
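from django.core.cache import cache


def check_cache() -> tuple[bool, str]:
    """Return (ok, detail) after a write/read round trip."""
    try:
        cache.set('health_check', '1', timeout=10)
        # Explicit comparison: assert statements are stripped when Python runs with -O
        if cache.get('health_check') != '1':
            return False, 'read-back mismatch'
        return True, 'ok'
    except Exception as e:
        return False, str(e)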
External Services
Only check truly required external services in readiness. Optional services (email sending, analytics) should not affect readiness — use the graceful degradation pattern instead.
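One way to express the required versus optional split is to let only required dependencies produce `fail`, while a failing optional dependency downgrades the result to `warn`. The sketch below assumes each dependency has already been checked and reduced to a boolean. `overall_status` and the example dependency names are hypothetical.

```python
def overall_status(required: dict[str, bool], optional: dict[str, bool]) -> str:
    """Combine check results into 'pass', 'warn', or 'fail'.

    Only required dependencies can make the service unready. A failing
    optional dependency is reported as 'warn' so traffic keeps flowing.
    """
    if not all(required.values()):
        return 'fail'
    if not all(optional.values()):
        return 'warn'
    return 'pass'


# Example: email is down, but the database and cache are healthy
status = overall_status(
    required={'database': True, 'cache': True},
    optional={'email': False},
)
```

Because `warn` still maps to HTTP 200, the instance stays in rotation while the degraded state remains visible to monitoring.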
Response Format
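A common response format is proposed in the Internet-Draft "Health Check Response Format for HTTP APIs" ([draft-inadarei-api-health-check](https://inadarei.github.io/rfc-healthcheck/)). It is a draft rather than a published RFC, but it gives clients a predictable shape to parse: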
{
  "status": "pass",
  "version": "1.0.0",
  "releaseId": "v2026.2.25.1",
  "checks": {
    "database": [{
      "status": "pass",
      "responseTime": 12
    }],
    "cache": [{
      "status": "pass",
      "responseTime": 3
    }]
  }
}
Status values: pass (healthy), fail (unhealthy), warn (degraded but functional).
HTTP status codes:
| Health Status | HTTP Code |
|---|---|
| `pass` | 200 |
| `warn` | 200 |
| `fail` | 503 |
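The mapping in the table can be sketched as a function that takes the per-check results, picks the worst status, and returns the matching HTTP code. The `summarize` function and the `SEVERITY` ordering are illustrative names. Treating an empty set of checks as `pass` is an assumption of this sketch.

```python
# Higher number means worse health
SEVERITY = {'pass': 0, 'warn': 1, 'fail': 2}
HTTP_CODE = {'pass': 200, 'warn': 200, 'fail': 503}


def summarize(checks: dict[str, list[dict]]) -> tuple[int, dict]:
    """Return (http_status, body) where the overall status is the worst check status."""
    statuses = [
        entry['status']
        for entries in checks.values()
        for entry in entries
    ]
    # With no checks at all, report 'pass' (an assumption of this sketch)
    overall = max(statuses, key=SEVERITY.__getitem__, default='pass')
    return HTTP_CODE[overall], {'status': overall, 'checks': checks}
```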
Security Considerations
Health endpoints can leak sensitive information:
- Liveness (/health/live): Safe to expose publicly because it returns minimal information
- Readiness (/health/ready): May leak internal topology (database host, service names). Restrict to internal network or require authentication
- Never include: credentials, connection strings, stack traces in health responses
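A simple way to apply these rules is to filter the response body before returning it, based on whether the caller is trusted. The sketch below is one possible approach, not a prescribed one. `redact_health_body`, `trusted_caller`, and the allowlist are hypothetical, and how a caller is judged to be trusted (source network, authentication) is left to the application.

```python
# Only these per-check fields are ever returned, even to trusted callers
ALLOWED_CHECK_KEYS = {'status', 'responseTime'}


def redact_health_body(body: dict, trusted_caller: bool) -> dict:
    """Strip details from a health response before sending it.

    Untrusted callers see only the overall status. Trusted callers see
    per-check entries, limited to allowlisted keys so that error text or
    connection strings placed in other fields never leave the service.
    """
    if not trusted_caller:
        return {'status': body['status']}
    safe_checks = {
        name: [
            {key: value for key, value in entry.items() if key in ALLOWED_CHECK_KEYS}
            for entry in entries
        ]
        for name, entries in body.get('checks', {}).items()
    }
    return {'status': body['status'], 'checks': safe_checks}
```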
Anti-Patterns
- Always returning 200: Defeats the purpose entirely
- Checking non-required dependencies: A broken recommendation service should not make your product page fail readiness
- Long-running checks: Health checks should complete in <100ms; use timeouts on dependency checks
- No health check at all: Load balancers will route to dead instances indefinitely
- Same endpoint for liveness and readiness: A database outage should stop traffic, not restart all containers
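To keep a slow dependency from stalling the whole health endpoint, each check can be run with a deadline. The sketch below uses the standard library's `concurrent.futures` and reports a timed-out check as failed. `run_with_timeout` and the 100 ms default are illustrative choices, and the check is assumed to return an `(ok, detail)` tuple.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared pool so each health request does not create new threads
_executor = ThreadPoolExecutor(max_workers=4)


def run_with_timeout(check, timeout_seconds: float = 0.1) -> dict:
    """Run check() with a deadline and return a single check entry."""
    started = time.monotonic()
    future = _executor.submit(check)
    entry = {}
    try:
        ok, _detail = future.result(timeout=timeout_seconds)
        entry['status'] = 'pass' if ok else 'fail'
    except FutureTimeout:
        # The worker thread keeps running in the background; it cannot be
        # interrupted, so the dependency call should also set its own timeout.
        entry['status'] = 'fail'
        entry['output'] = 'timed out'
    except Exception:
        entry['status'] = 'fail'
        entry['output'] = 'check raised an exception'
    entry['responseTime'] = int((time.monotonic() - started) * 1000)  # milliseconds
    return entry
```

The deadline only bounds how long the health endpoint waits. Client-level timeouts on the database or cache driver are still needed to free the worker thread itself.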
Complete Django Example
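# apps/core/views.py
import logging
import time

from django.db import connection
from django.http import JsonResponse

logger = logging.getLogger(__name__)


def health_live(request):
    # Liveness: no dependency checks, only proof that the process can respond
    return JsonResponse({'status': 'pass'}, status=200)


def health_ready(request):
    checks = {}
    overall = 'pass'

    # Database check
    t0 = time.monotonic()
    try:
        connection.ensure_connection()
        with connection.cursor() as cursor:
            cursor.execute('SELECT 1')
        checks['database'] = [{
            'status': 'pass',
            'responseTime': int((time.monotonic() - t0) * 1000),  # milliseconds
        }]
    except Exception:
        # Keep exception details in server logs, not in the response body
        logger.exception('Database health check failed')
        checks['database'] = [{'status': 'fail', 'output': 'database unavailable'}]
        overall = 'fail'

    status_code = 503 if overall == 'fail' else 200
    return JsonResponse(
        {'status': overall, 'checks': checks},
        status=status_code,
    )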
Summary
Design three separate health check endpoints: liveness (is it alive?), readiness (is it ready for traffic?), and startup (has it initialized?). Only check required dependencies in readiness, return structured JSON with pass/warn/fail statuses, restrict detailed checks to internal networks, and keep all health checks under 100ms.