Production Infrastructure

Load Balancer Health Checks: HTTP, TCP, and gRPC Probes

Designing health check endpoints for load balancers — HTTP status code expectations, deep vs shallow checks, graceful shutdown signaling, and multi-protocol probes.

Why Health Checks Matter

Load balancers route traffic to backend instances only when those instances are healthy. A misconfigured or missing health check means the load balancer will happily send requests to a server whose database connection pool is exhausted, whose disk is full, or that is in the middle of shutting down.

Health checks are your primary mechanism for automatic failure isolation — they let the load balancer detect a failed instance and stop sending traffic to it before your users notice.

Health Check Types

TCP Connect Check

The simplest check: can the load balancer open a TCP connection to the backend port?

Load Balancer → TCP SYN     → Backend:8000
Backend       → TCP SYN-ACK → Load Balancer  (check passes)

TCP checks verify the process is running and listening, but nothing more. A Django app that has exhausted its database connection pool will still pass a TCP check on port 8000. Use TCP checks only as a last resort when HTTP health endpoints are not possible.
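The limitation is easy to demonstrate: a TCP probe passes as long as something is listening, whatever the application's actual state. A minimal sketch of the probe logic (the `tcp_health_check` helper name is ours, not any load balancer's API):

```python
import socket

def tcp_health_check(host, port, timeout=2.0):
    """Return True if a TCP connection (SYN / SYN-ACK) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Any listening socket passes, regardless of application health: here the
# "backend" accepts connections but will never serve a single request.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))            # OS picks a free port
server.listen(1)
port = server.getsockname()[1]

print(tcp_health_check("127.0.0.1", port))   # True: process is listening
server.close()
print(tcp_health_check("127.0.0.1", port))   # False: connection refused
```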

HTTP Health Check

The standard production approach. The load balancer sends an HTTP GET request and expects a specific status code (almost always 200 OK):

GET /healthz HTTP/1.1
Host: 127.0.0.1

HTTP/1.1 200 OK
Content-Type: application/json

{"status": "ok"}

AWS ALB configuration:

{
  "HealthCheckProtocol": "HTTP",
  "HealthCheckPath": "/healthz",
  "HealthCheckIntervalSeconds": 10,
  "HealthyThresholdCount": 2,
  "UnhealthyThresholdCount": 3,
  "HealthCheckTimeoutSeconds": 5,
  "Matcher": {"HttpCode": "200"}
}
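These settings determine how quickly failures are detected and how quickly a recovered instance rejoins the pool. With a 10-second interval, roughly 30 seconds of consecutive failures pass before the target is marked unhealthy; a quick sketch of the arithmetic:

```python
# ALB health check timing, per the configuration above (plain arithmetic).
interval_s = 10            # HealthCheckIntervalSeconds
unhealthy_threshold = 3    # UnhealthyThresholdCount
healthy_threshold = 2      # HealthyThresholdCount

# Consecutive failed probes needed before the target leaves the pool,
# and consecutive passing probes before it is returned to service.
time_to_unhealthy = interval_s * unhealthy_threshold   # ~30 seconds
time_to_healthy = interval_s * healthy_threshold       # ~20 seconds

print(time_to_unhealthy, time_to_healthy)   # 30 20
```

Tightening the interval shortens detection time but multiplies probe traffic across every target, so very short intervals are usually reserved for latency-critical pools.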

gRPC Health Check

gRPC services implement the standard [gRPC Health Checking Protocol](https://github.com/grpc/grpc/blob/master/doc/health-checking.md). The load balancer calls grpc.health.v1.Health/Check:

# Python gRPC health service
from grpc_health.v1 import health, health_pb2, health_pb2_grpc

health_servicer = health.HealthServicer()
health_pb2_grpc.add_HealthServicer_to_server(health_servicer, server)

# Mark service as serving
health_servicer.set(
    'mypackage.MyService',
    health_pb2.HealthCheckResponse.SERVING
)

# Mark as not serving (triggers load balancer failover)
health_servicer.set(
    'mypackage.MyService',
    health_pb2.HealthCheckResponse.NOT_SERVING
)

AWS ALB supports gRPC health checks natively when the target group protocol version is gRPC.

Shallow vs Deep Health Checks

This is the most important design decision in health check architecture.

Shallow Check — `/healthz` (Process Alive)

Returns 200 if the application process is running and can handle requests. Does not check database connectivity, cache availability, or downstream services.

# Django view — shallow liveness check
from django.http import JsonResponse

def healthz(request):
    """Liveness probe — is this process alive?"""
    return JsonResponse({"status": "ok"})

Use shallow checks for liveness probes — the load balancer needs to know whether to restart/replace this instance, not whether its dependencies are healthy.

Deep Check — `/readyz` (Dependencies Ready)

Verifies all dependencies are reachable before marking the instance ready to serve traffic:

# Django view — deep readiness check
import time
from django.db import connections
from django.core.cache import cache
from django.http import JsonResponse

def readyz(request):
    """Readiness probe — are all dependencies available?"""
    checks = {}
    status = 200

    # Database check
    try:
        connections['default'].ensure_connection()
        checks['database'] = 'ok'
    except Exception as e:
        checks['database'] = str(e)
        status = 503

    # Cache check
    try:
        cache.set('health_check', '1', timeout=5)
        # Explicit check rather than assert, which is stripped under `python -O`
        if cache.get('health_check') != '1':
            raise RuntimeError('cache read-after-write failed')
        checks['cache'] = 'ok'
    except Exception as e:
        checks['cache'] = str(e)
        status = 503

    return JsonResponse(
        {"status": "ok" if status == 200 else "degraded", "checks": checks},
        status=status
    )
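One caveat with the view above: each dependency is checked serially with no per-check timeout, so a single hung database connection can stall `/readyz` past the load balancer's own probe timeout. A framework-agnostic sketch (the helper names are hypothetical) that bounds each check using `concurrent.futures`:

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

def run_checks(checks, per_check_timeout=2.0):
    """Run named dependency checks, each bounded by a timeout.

    `checks` maps a name to a zero-argument callable that raises on failure.
    Returns (http_status, results): 200 if everything passed, else 503.
    """
    results = {}
    status = 200
    pool = ThreadPoolExecutor(max_workers=max(len(checks), 1))
    futures = {name: pool.submit(fn) for name, fn in checks.items()}
    for name, future in futures.items():
        try:
            future.result(timeout=per_check_timeout)
            results[name] = "ok"
        except FutureTimeout:
            results[name] = f"timed out after {per_check_timeout}s"
            status = 503
        except Exception as exc:
            results[name] = str(exc)
            status = 503
    # A truly hung check thread keeps running in the background;
    # wait=False avoids blocking the probe response on it.
    pool.shutdown(wait=False)
    return status, results

def failing_cache_ping():
    raise RuntimeError("connection refused")

status, results = run_checks({
    "database": lambda: None,          # stand-in for a successful ping
    "cache": failing_cache_ping,
})
print(status, results)   # 503 {'database': 'ok', 'cache': 'connection refused'}
```

Keeping the per-check timeout well below the load balancer's `HealthCheckTimeoutSeconds` ensures the probe always gets a definitive answer rather than a connection timeout.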

Kubernetes Probe Separation

Kubernetes distinguishes three probe types — use the right endpoint for each:

containers:
  - name: app
    livenessProbe:
      httpGet:
        path: /healthz    # Shallow — restart if dead
        port: 8000
      initialDelaySeconds: 10
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /readyz     # Deep — remove from LB if not ready
        port: 8000
      initialDelaySeconds: 5
      periodSeconds: 5
      failureThreshold: 2
    startupProbe:
      httpGet:
        path: /healthz    # Allow slow startup before liveness kicks in
        port: 8000
      failureThreshold: 30
      periodSeconds: 10

Never point livenessProbe at a deep check. If your database goes down, a liveness probe that returns 503 causes Kubernetes to restart every pod simultaneously, turning a partial outage into a total one.

Response Design

Health check responses should be consistent and machine-parseable:

// Healthy — HTTP 200
{
  "status": "ok",
  "version": "v2026.2.14.1",
  "uptime_seconds": 3842,
  "checks": {
    "database": "ok",
    "cache": "ok",
    "queue": "ok"
  }
}

// Unhealthy — HTTP 503
{
  "status": "degraded",
  "checks": {
    "database": "connection refused: 127.0.0.1:5432",
    "cache": "ok",
    "queue": "ok"
  }
}

Return 200 for healthy and 503 for unhealthy. Some teams use 200 with a degraded body — but load balancers check the status code, not the body, so returning 200 for an unhealthy instance defeats the purpose.
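Since only the status code is consulted, the probe-side decision is simple. A sketch of the matcher logic, modeled on ALB's `Matcher` semantics (which accept comma-separated codes and ranges such as `200-299`):

```python
def is_healthy(status_code, matcher="200"):
    """Healthy if status_code matches, e.g. "200", "200,302", or "200-299"."""
    for part in matcher.split(","):
        if "-" in part:
            low, high = part.split("-")
            if int(low) <= status_code <= int(high):
                return True
        elif status_code == int(part):
            return True
    return False

print(is_healthy(200))             # True
print(is_healthy(503))             # False
print(is_healthy(204, "200-299"))  # True: range matchers are also accepted
```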

Graceful Shutdown Signaling

When a deployment or scaling event terminates an instance, the health check is your mechanism to signal the load balancer to stop sending new traffic:

import signal
import threading

# Global flag — set to True when shutdown begins
_shutting_down = threading.Event()

def handle_sigterm(signum, frame):
    _shutting_down.set()

signal.signal(signal.SIGTERM, handle_sigterm)

def readyz(request):
    if _shutting_down.is_set():
        # Tell load balancer we are draining — stop sending new requests
        return JsonResponse({"status": "shutting_down"}, status=503)
    # ... normal readiness checks

The sequence during a graceful shutdown:

  • SIGTERM received → set _shutting_down flag
  • /readyz starts returning 503
  • Load balancer health check fails → stops routing new requests to this instance
  • In-flight requests complete
  • Process exits cleanly

AWS ALB's deregistration delay (default 300 seconds) provides the final buffer: once deregistration begins, the load balancer stops routing new requests to the target but waits up to the delay for in-flight requests to complete.
