Error Handling Patterns

Circuit Breaker Pattern for API Resilience

How the circuit breaker pattern protects your services from cascading failures by automatically stopping calls to struggling dependencies.

The Circuit Breaker Metaphor

An electrical circuit breaker trips when current exceeds a safe level, preventing fire. The software pattern works the same way: when calls to a dependency fail beyond a threshold, the circuit breaker trips and stops sending requests — giving the dependency time to recover.

Without a circuit breaker, every caller waits for the full timeout on each request to a failing service, exhausting thread pools and causing cascading failures across your entire system.

Three States

Closed (Normal Operation)

All requests flow through normally. The circuit breaker tracks recent failures. When the failure rate exceeds a threshold (e.g., 50% of requests in the last 10 seconds), it trips to Open.

Open (Failing Fast)

Requests are immediately rejected without calling the dependency. The circuit returns an error (or a cached fallback) instantly. After a configured timeout (e.g., 30 seconds), it moves to Half-Open.

Half-Open (Testing Recovery)

A limited number of probe requests are allowed through. If they succeed, the circuit moves back to Closed. If they fail, it returns to Open and resets the timeout.

```text
CLOSED ──(failure threshold exceeded)──► OPEN
  ▲                                         │
  │                                   (timeout expires)
  │                                         ▼
  └──────(probes succeed)─────── HALF-OPEN
                (probes fail) ──► OPEN
```
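The state machine above can be sketched in plain Python. This is a minimal, single-threaded illustration, not pybreaker's implementation. `SimpleCircuitBreaker` and `CircuitOpenError` are hypothetical names. The breaker trips on a count of consecutive failures rather than a failure rate, and Half-Open admits one probe at a time. The clock is injectable so transitions can be tested without waiting.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without invoking the dependency."""


class SimpleCircuitBreaker:
    """Minimal three-state breaker (hypothetical, single-threaded sketch)."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # consecutive failures that trip the circuit
        self.reset_timeout = reset_timeout          # seconds to stay Open before probing
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "HALF_OPEN"            # timeout expired: allow a probe
            else:
                raise CircuitOpenError("circuit is open")  # fail fast, no call made
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self) -> None:
        # A normal success resets the count; a successful probe closes the circuit
        self.failures = 0
        self.state = "CLOSED"

    def _on_failure(self) -> None:
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()           # (re)start the Open timeout
```

A production breaker would also need a lock or atomic state so that concurrent callers cannot all slip through as probes while the circuit is Half-Open.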

When to Trip

Common tripping criteria:

  • Error rate: >50% of requests in a sliding window fail
  • Slow call rate: >50% of requests take longer than a threshold
  • Minimum call volume: Only trip after at least N calls (e.g., 10)

Count these as failures:

  • HTTP 5xx responses
  • Network errors (connection refused, timeout)
  • Responses slower than the slow-call threshold

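Do not count 4xx responses as failures. They are client errors: they say the request was bad, not that the dependency is unhealthy, so counting them would let one misbehaving caller trip the circuit for everyone.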
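The tripping criteria and failure classification above can be combined into a small sliding-window check. This sketch makes several assumptions. It uses a time-based window. It folds slow calls into the failure count, as the list above does, rather than tracking a separate slow-call rate. It treats `status=None` as a network error. `FailureWindow`, `is_failure`, and the 2-second slow-call default are illustrative and hypothetical.

```python
from collections import deque
from typing import Deque, Optional, Tuple


def is_failure(status: Optional[int], duration: float, slow_call_seconds: float) -> bool:
    """Classify one call. status=None means a network error (refused, timeout)."""
    if status is None:
        return True                      # network errors count
    if 400 <= status < 500:
        return False                     # client error: not the dependency's fault
    return status >= 500 or duration > slow_call_seconds  # 5xx or too slow


class FailureWindow:
    """Time-based sliding window that decides when the circuit should trip (hypothetical)."""

    def __init__(self, window_seconds: float = 10.0, failure_rate: float = 0.5,
                 min_calls: int = 10, slow_call_seconds: float = 2.0) -> None:
        self.window_seconds = window_seconds
        self.failure_rate = failure_rate
        self.min_calls = min_calls
        self.slow_call_seconds = slow_call_seconds
        self.calls: Deque[Tuple[float, bool]] = deque()  # (timestamp, failed)

    def record(self, now: float, status: Optional[int], duration: float) -> None:
        self.calls.append((now, is_failure(status, duration, self.slow_call_seconds)))
        self._evict(now)

    def _evict(self, now: float) -> None:
        # Drop calls that have fallen out of the window
        while self.calls and now - self.calls[0][0] > self.window_seconds:
            self.calls.popleft()

    def should_trip(self, now: float) -> bool:
        self._evict(now)
        total = len(self.calls)
        if total < self.min_calls:
            return False                 # not enough traffic to judge
        failures = sum(1 for _, failed in self.calls if failed)
        return failures / total > self.failure_rate
```

The minimum-volume check is what keeps a quiet period from tripping the circuit on one or two unlucky requests.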

Recovery and Half-Open Testing

The Half-Open state is a controlled experiment. Send a small number of probe requests (typically 1–5) and observe the results before closing the circuit and restoring full traffic.

Be conservative: a single successful probe should not immediately close the circuit. Require a sustained success rate over N probes, and send the circuit back to Open if that rate is not met.
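A Half-Open probe budget with a success requirement might look like the following. `HalfOpenTrial` is a hypothetical helper. It admits at most `probes` requests, sends the circuit back to Open as soon as the required number of successes is out of reach, and closes only after all probes report back. Setting `required_successes == probes` gives the strict behavior in which any failed probe returns the circuit to Open.

```python
from typing import List


class HalfOpenTrial:
    """Tracks probe results while Half-Open (hypothetical helper)."""

    def __init__(self, probes: int = 5, required_successes: int = 4) -> None:
        if not 1 <= required_successes <= probes:
            raise ValueError("need 1 <= required_successes <= probes")
        self.probes = probes
        self.required_successes = required_successes
        self.admitted = 0
        self.results: List[bool] = []

    def admit(self) -> bool:
        # Let at most `probes` requests through; everything else is still rejected
        if self.admitted >= self.probes:
            return False
        self.admitted += 1
        return True

    def record(self, success: bool) -> str:
        """Return the next state: 'OPEN', 'CLOSED', or 'HALF_OPEN' (keep waiting)."""
        self.results.append(success)
        failures = self.results.count(False)
        if failures > self.probes - self.required_successes:
            return "OPEN"        # target success count is no longer reachable
        if len(self.results) < self.probes:
            return "HALF_OPEN"   # wait for the remaining probes
        return "CLOSED"          # enough probes succeeded
```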

Circuit Breaker vs Retry

| Concern | Retry | Circuit Breaker |
|---|---|---|
| Purpose | Recover from transient failures | Stop calling broken services |
| Scope | Individual request | All requests to a service |
| When to use | Occasional network hiccups | Sustained dependency failure |

They are complementary — use both. Retry handles transient blips. The circuit breaker handles sustained outages by failing fast.
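One common way to combine the two is to put the retry loop outside the breaker. This arrangement is assumed here rather than prescribed. Every attempt is recorded by the breaker, and retrying stops the moment the circuit opens. `ConsecutiveFailureBreaker` is a deliberately tiny, hypothetical stand-in with no Half-Open recovery. Treating only `ConnectionError` and `TimeoutError` as transient is also an assumption.

```python
import time
from typing import Any, Callable


class CircuitOpenError(Exception):
    """Raised when the breaker refuses a call."""


class ConsecutiveFailureBreaker:
    """Tiny stand-in breaker: opens after `threshold` consecutive failures (no recovery)."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.failures = 0

    def allow(self) -> bool:
        return self.failures < self.threshold

    def record(self, success: bool) -> None:
        self.failures = 0 if success else self.failures + 1


def call_with_retry(fn: Callable[[], Any], breaker: ConsecutiveFailureBreaker,
                    attempts: int = 3, base_delay: float = 0.1,
                    sleep: Callable[[float], None] = time.sleep) -> Any:
    """Retry transient errors, but stop as soon as the breaker is open."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        if not breaker.allow():
            raise CircuitOpenError("circuit open; not retrying")  # fail fast
        try:
            result = fn()
        except (ConnectionError, TimeoutError):
            breaker.record(False)              # every attempt counts toward tripping
            if attempt == attempts - 1:
                raise                          # out of attempts: surface the error
            sleep(base_delay * 2 ** attempt)   # exponential backoff before the next try
        else:
            breaker.record(True)
            return result
```

Checking the breaker before each attempt is the important part. Without that check, a retry loop keeps hammering a dependency the breaker has already declared broken.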

Implementation with `pybreaker`

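```python
import httpx
import pybreaker

# pybreaker counts consecutive failures while Closed, not a failure rate.
api_circuit = pybreaker.CircuitBreaker(
    fail_max=5,          # Trip to Open after 5 consecutive failures
    reset_timeout=30,    # Move to Half-Open 30 seconds after tripping
    # Don't count 4xx responses: they are client errors, not dependency failures
    exclude=[lambda e: isinstance(e, httpx.HTTPStatusError)
             and e.response.status_code < 500],
)


@api_circuit
def call_external_api(url: str) -> dict:
    response = httpx.get(url, timeout=5.0)  # Timeouts and connection errors count as failures
    response.raise_for_status()             # Raises HTTPStatusError on 4xx and 5xx
    return response.json()


def get_cached_data() -> dict:
    # Hypothetical fallback: replace with a lookup in your own cache
    return {}


try:
    data = call_external_api('https://api.example.com/data')
except pybreaker.CircuitBreakerError:
    # Circuit is open: fail fast and serve the cached fallback
    data = get_cached_data()
```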

Observability

Monitor circuit breakers in production:

  • State transitions (Closed → Open, Open → Half-Open, etc.) — emit metrics and alerts on every state change
  • Rejection rate — what percentage of requests are being short-circuited
  • Recovery time — how long the circuit stays Open before recovering

Log every state transition with the triggering error rate and timestamps.
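The three signals above can be captured with a small per-breaker recorder. `BreakerMetrics` is a hypothetical helper, not a library API. In practice you would forward these values to your metrics system. With pybreaker, the `state_change` hook on a `CircuitBreakerListener` subclass is a natural place to call `on_transition`. Recovery time here is measured from the first trip to the next return to Closed, so a failed Half-Open probe extends the same outage.

```python
import logging
import time
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

log = logging.getLogger("circuit_breaker")


@dataclass
class BreakerMetrics:
    """Per-breaker counters and transition log (hypothetical helper)."""
    name: str
    state: str = "CLOSED"
    allowed: int = 0
    rejected: int = 0
    opened_at: Optional[float] = None            # when the current outage began
    recovery_times: List[float] = field(default_factory=list)
    transitions: List[Tuple[str, str, float, float]] = field(default_factory=list)

    def on_call(self, rejected: bool) -> None:
        # Count every call so the rejection rate can be computed
        if rejected:
            self.rejected += 1
        else:
            self.allowed += 1

    def rejection_rate(self) -> float:
        total = self.allowed + self.rejected
        return self.rejected / total if total else 0.0

    def on_transition(self, new_state: str, error_rate: float,
                      now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        old_state = self.state
        self.transitions.append((old_state, new_state, now, error_rate))
        if new_state == "OPEN" and self.opened_at is None:
            self.opened_at = now                  # outage starts at the first trip
        elif new_state == "CLOSED" and self.opened_at is not None:
            self.recovery_times.append(now - self.opened_at)
            self.opened_at = None
        # Log every transition with the triggering error rate and timestamp
        level = logging.WARNING if new_state == "OPEN" else logging.INFO
        log.log(level, "breaker=%s %s -> %s error_rate=%.2f at=%.3f",
                self.name, old_state, new_state, error_rate, now)
        self.state = new_state
```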

Summary

The circuit breaker pattern prevents cascading failures by detecting broken dependencies and failing fast. Implement all three states (Closed, Open, Half-Open), emit metrics on state transitions, and always pair with a fallback strategy for when the circuit is Open.
