What Is Graceful Degradation?
Graceful degradation means your application continues to function — even if in a limited capacity — when one or more of its dependencies are unavailable. The alternative is total failure: one broken microservice takes down your entire product.
The goal is not perfection under failure. The goal is to give users something useful rather than an error page.
Fallback Strategies
1. Cached Data Fallback
Return the last successful response when a live fetch fails:
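    import json

    import httpx
    import redis

    cache = redis.Redis()

    def get_product(product_id: int) -> dict:
        cache_key = f'product:{product_id}'
        try:
            response = httpx.get(
                f'https://inventory.internal/products/{product_id}',
                timeout=2.0,
            )
            response.raise_for_status()  # Treat 4xx/5xx responses as a failed fetch
            data = response.json()
            # Keep the last good response for 5 minutes; raise the TTL to ride out longer outages
            cache.setex(cache_key, 300, json.dumps(data))
            return data
        except (httpx.HTTPError, ValueError):  # Network/HTTP errors or an invalid JSON body
            cached = cache.get(cache_key)
            if cached:
                return json.loads(cached)  # Stale but better than nothing
            return {'id': product_id, 'available': False}  # Minimal fallback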
2. Default / Empty State Fallback
Return a safe default when the dependency is unavailable:
- Recommendations engine down → show top-selling items
- Personalization service down → show generic homepage
- Search service down → disable search, show category browse
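The mappings above can be sketched as a small wrapper that substitutes a safe default whenever a dependency call fails. This is a minimal illustration, not a prescribed implementation: `with_default`, `fetch_recommendations`, and `TOP_SELLERS` are hypothetical names, and the failing fetch simply simulates an outage.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("degradation")

# Hypothetical static default: a precomputed list of top-selling product IDs.
TOP_SELLERS = [101, 102, 103]


def with_default(fetch: Callable[[], T], default: T, name: str) -> T:
    """Call fetch(); on any failure, log it and return the safe default."""
    try:
        return fetch()
    except Exception:
        # Log the failure so the fallback is visible in monitoring.
        logger.warning("dependency %s failed, serving default", name, exc_info=True)
        return default


def fetch_recommendations() -> list[int]:
    # Stand-in for a real call to a recommendations engine that is currently down.
    raise ConnectionError("recommendations engine unavailable")


recommendations = with_default(fetch_recommendations, TOP_SELLERS, "recommendations")
```

Here `recommendations` ends up holding the top-seller list, so the page still has something to show. The default should be cheap and local (a constant or a precomputed value) so that producing it cannot fail in turn.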
3. Functional Subset
Identify the core user journey and protect it. Non-essential features can be degraded independently:
- Payment service works even if loyalty points service is down
- Product pages render even if review service is down
- Checkout works even if recommendation service is down
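One way to express this separation in code is to load the core data strictly and each optional section independently, so that a failure in one optional section cannot affect the others. The sketch below assumes a hypothetical `build_page` helper and simulated loaders; the section names are illustrative only.

```python
from typing import Any, Callable


def build_page(core: Callable[[], Any], optional: dict[str, Callable[[], Any]]) -> dict:
    """Load the core section strictly and each optional section independently."""
    # A failure in core() propagates: the page is meaningless without it.
    page: dict[str, Any] = {"core": core(), "degraded": []}
    for name, loader in optional.items():
        try:
            page[name] = loader()
        except Exception:
            page[name] = None              # Omit this section from the page
            page["degraded"].append(name)  # Record it so the degradation is visible
    return page


def load_reviews() -> list:
    raise TimeoutError("review service down")  # Simulated outage


page = build_page(
    core=lambda: {"id": 42, "name": "Widget"},
    optional={"reviews": load_reviews, "recommendations": lambda: [7, 8]},
)
```

In this run the product data and recommendations load normally, while `reviews` is `None` and listed under `degraded`. The design choice is that only the core loader is allowed to fail the request.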
Feature Flags
Feature flags let you disable non-essential features at runtime without deploying code:
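    import logging

    from django.conf import settings
    from django.shortcuts import render

    logger = logging.getLogger(__name__)

    def render_product_page(request, product_id):
        product = get_product(product_id)
        reviews = []
        if settings.FEATURE_REVIEWS_ENABLED:
            try:
                reviews = get_reviews(product_id)  # Assumed to be defined elsewhere
            except Exception:
                # Reviews are non-core: hide the failure from the user, but log it
                logger.warning('Reviews unavailable for product %s', product_id, exc_info=True)
        return render(request, 'product.html', {
            'product': product,
            'reviews': reviews,
        })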
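When get_reviews() becomes unreliable, flip FEATURE_REVIEWS_ENABLED to False in your feature flag system, with no deployment needed. Note that a value defined in a static Django settings file is only read at startup, so a true runtime toggle requires the flag to be backed by something that can change while the application is running, such as a feature flag service or a database-backed setting.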
Serving Stale Data
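For read-heavy content such as product pages, stale data is usually better than an error. Use the stale-while-revalidate directive in HTTP caching to serve a cached response immediately while a fresh one is fetched in the background. The header below keeps a response fresh for 60 seconds, then allows it to be served stale for up to an hour while revalidation runs: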
Cache-Control: max-age=60, stale-while-revalidate=3600
To cover origin failures, add the stale-if-error directive, which is also part of the Cache-Control header:
Cache-Control: max-age=60, stale-if-error=86400
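This tells CDNs and caches that they may serve stale content for up to 24 hours if the origin returns a 5xx error or cannot be reached. Support for these directives varies between caches, so check what your CDN honors.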
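To make the two directives concrete, the following sketch builds the header value and models, in simplified form, when a cache would be allowed to serve a stored response. It is an approximation of the behavior described above, not a full HTTP cache; `cache_control` and `may_serve_cached` are hypothetical helper names.

```python
def cache_control(max_age: int, stale_while_revalidate: int = 0,
                  stale_if_error: int = 0) -> str:
    """Build a Cache-Control header value from the given lifetimes (in seconds)."""
    directives = [f"max-age={max_age}"]
    if stale_while_revalidate:
        directives.append(f"stale-while-revalidate={stale_while_revalidate}")
    if stale_if_error:
        directives.append(f"stale-if-error={stale_if_error}")
    return ", ".join(directives)


def may_serve_cached(age: int, max_age: int, stale_while_revalidate: int = 0,
                     stale_if_error: int = 0, origin_failed: bool = False) -> bool:
    """Simplified model: may a response of this age (seconds) still be served?"""
    if age <= max_age:
        return True  # Still fresh
    staleness = age - max_age
    window = stale_while_revalidate
    if origin_failed:
        # When the origin is failing, the longer of the two windows applies.
        window = max(window, stale_if_error)
    return staleness <= window


header = cache_control(60, stale_while_revalidate=3600, stale_if_error=86400)
```

With these values, a two-hour-old response is no longer servable during normal operation (it is past the one-hour revalidation window) but remains servable while the origin is failing, until it is more than 24 hours past its freshness lifetime.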
Read-Only Mode
When your database or write services are degraded, switch to read-only mode:
- Disable all write endpoints (return 503 with Retry-After)
- Continue serving reads from replicas or cache
- Display a user-visible banner: 'Our system is experiencing issues. Browse and search are available; checkout is temporarily unavailable.'
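The write-blocking rule can be sketched as a framework-independent guard that a middleware or request hook would call before dispatching. The function name `read_only_guard`, the set of safe methods, and the 120-second default are assumptions for illustration; how the `read_only` flag is set (feature flag, health check, manual switch) is left open.

```python
from typing import Optional

# Methods treated as reads; everything else is considered a write.
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})


def read_only_guard(method: str, read_only: bool,
                    retry_after: int = 120) -> Optional[tuple[int, dict[str, str]]]:
    """Return a (status, headers) rejection for writes in read-only mode, else None."""
    if read_only and method.upper() not in SAFE_METHODS:
        # 503 with Retry-After tells clients the condition is temporary.
        return 503, {"Retry-After": str(retry_after)}
    return None  # Let the request proceed to the normal handler


rejection = read_only_guard("POST", read_only=True)
```

A caller would return the rejection as the HTTP response when it is not `None` and otherwise continue with normal handling, so reads keep working while writes are refused.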
Monitoring Degradation
Track degradation in your metrics:
- Fallback rate: fallbacks_served / total_requests; alert when >1%
- Feature flag overrides: Log when features are disabled
- Stale cache hits: How often are you serving stale data?
Degradation should be temporary and visible, not a silent permanent state.
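The fallback-rate metric can be illustrated with a small in-process counter. In practice these counts would more likely be emitted to a metrics system and the alert defined there; `FallbackStats` is a hypothetical class used only to show the arithmetic and the 1% threshold.

```python
from dataclasses import dataclass


@dataclass
class FallbackStats:
    total_requests: int = 0
    fallbacks_served: int = 0

    def record(self, used_fallback: bool) -> None:
        """Count one request, noting whether it was served from a fallback."""
        self.total_requests += 1
        if used_fallback:
            self.fallbacks_served += 1

    @property
    def fallback_rate(self) -> float:
        if self.total_requests == 0:
            return 0.0  # No traffic yet, so nothing to alert on
        return self.fallbacks_served / self.total_requests

    def should_alert(self, threshold: float = 0.01) -> bool:
        """True when the fallback rate exceeds the threshold (1% by default)."""
        return self.fallback_rate > threshold


stats = FallbackStats()
for i in range(200):
    stats.record(used_fallback=(i < 3))  # Simulate 3 fallbacks in 200 requests
```

In this simulated run the rate is 3 / 200 = 1.5%, which is above the 1% threshold, so `stats.should_alert()` returns `True`.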
Summary
Graceful degradation requires deliberate design: identify your core user journey, implement fallbacks for every non-core dependency, use feature flags for quick disables, serve stale data rather than errors, and monitor your fallback rates to know when degradation occurs.