Debugging & Troubleshooting

Debugging gRPC Errors and Status Codes

How to diagnose gRPC errors using status codes, distinguish UNAVAILABLE from INTERNAL, debug deadline exceeded, and test calls with grpcurl.

gRPC Error Model

Unlike HTTP, which has a large, open-ended space of numeric status codes, gRPC defines a small fixed set of status codes (OK plus 16 error codes) that applies across all transports. Every gRPC call completes with a status code and an optional message string. Understanding these codes is the foundation of gRPC debugging.

gRPC errors are surfaced differently per language:

# Python
import grpc

try:
    response = stub.GetUser(request)
except grpc.RpcError as e:
    print(e.code())     # grpc.StatusCode.NOT_FOUND
    print(e.details())  # 'User 42 not found'

Status Code Reference

Code  Name                   HTTP equivalent  Meaning
0     `OK`                   200              Success
1     `CANCELLED`            499              Client cancelled the request
2     `UNKNOWN`              500              Unexpected error
3     `INVALID_ARGUMENT`     400              Bad input
4     `DEADLINE_EXCEEDED`    504              Timeout expired
5     `NOT_FOUND`            404              Resource not found
6     `ALREADY_EXISTS`       409              Conflict
7     `PERMISSION_DENIED`    403              Forbidden
8     `RESOURCE_EXHAUSTED`   429              Rate limit / quota
9     `FAILED_PRECONDITION`  400              System not in required state
10    `ABORTED`              409              Concurrency conflict
13    `INTERNAL`             500              Server-side bug
14    `UNAVAILABLE`          503              Server temporarily unavailable
16    `UNAUTHENTICATED`      401              Missing or invalid credentials

(Codes 11 `OUT_OF_RANGE`, 12 `UNIMPLEMENTED`, and 15 `DATA_LOSS` exist as well but come up less often when debugging.)
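For services fronted by an HTTP gateway, the mapping above often needs to exist in code. A minimal stdlib sketch that mirrors the table (real gateways such as grpc-gateway ship their own canonical mapping; the default-to-500 behavior is an assumption):

```python
# gRPC status name -> HTTP status, mirroring the reference table
GRPC_TO_HTTP = {
    "OK": 200,
    "CANCELLED": 499,
    "UNKNOWN": 500,
    "INVALID_ARGUMENT": 400,
    "DEADLINE_EXCEEDED": 504,
    "NOT_FOUND": 404,
    "ALREADY_EXISTS": 409,
    "PERMISSION_DENIED": 403,
    "RESOURCE_EXHAUSTED": 429,
    "FAILED_PRECONDITION": 400,
    "ABORTED": 409,
    "INTERNAL": 500,
    "UNAVAILABLE": 503,
    "UNAUTHENTICATED": 401,
}

def http_status(grpc_code_name: str) -> int:
    # Unmapped codes fall back to 500, as most gateways do
    return GRPC_TO_HTTP.get(grpc_code_name, 500)
```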

UNAVAILABLE vs INTERNAL

These two are the most commonly confused:

UNAVAILABLE (14) — the server cannot be reached or is temporarily overwhelmed. It is safe to retry with backoff. Common causes: server is starting up, overloaded, or a network partition is in progress.

INTERNAL (13) — a bug or unexpected condition in the server code. It is not safe to retry automatically without investigation. The same request will likely produce the same error.
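When no service config is in place, retrying UNAVAILABLE by hand looks like jittered exponential backoff. A hedged sketch: `RetryableError` stands in for a grpc.RpcError whose code() is UNAVAILABLE, since a real client would branch on e.code():

```python
import random
import time

class RetryableError(Exception):
    """Stand-in for grpc.RpcError with code() == UNAVAILABLE."""

def call_with_backoff(call, max_attempts=4, base=0.05, cap=1.0):
    """Retry `call` on retryable failures with jittered exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted: surface the error
            # full jitter: sleep a random duration in [0, min(cap, base * 2^attempt))
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

Jitter matters here: without it, many clients that failed together retry together and re-overload the server.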

gRPC client libraries automatically retry UNAVAILABLE when configured with a service config. Do not retry INTERNAL.
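The service config mentioned above is JSON passed as a channel option. A sketch, assuming a service named UserService (an empty "name" list would match all methods); note that only UNAVAILABLE is listed as retryable:

```python
import json

# Service config enabling retries for UNAVAILABLE only
service_config = json.dumps({
    "methodConfig": [{
        "name": [{"service": "UserService"}],
        "retryPolicy": {
            "maxAttempts": 4,
            "initialBackoff": "0.1s",
            "maxBackoff": "1s",
            "backoffMultiplier": 2,
            "retryableStatusCodes": ["UNAVAILABLE"],
        },
    }]
})

# Passed to the channel at creation time, e.g.:
# channel = grpc.insecure_channel(
#     "localhost:50051",
#     options=[("grpc.service_config", service_config),
#              ("grpc.enable_retries", 1)],
# )
```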

Deadline Exceeded Debugging

DEADLINE_EXCEEDED means the deadline set by the caller expired before the RPC completed. The deadline propagates through the call chain — if a client sets a 500ms deadline and calls Service A which calls Service B, Service B also has at most 500ms (minus A's processing time).

# Set a deadline per call
response = stub.GetUser(request, timeout=0.5)  # 500ms

Debugging checklist:

  • Log the deadline remaining at each service hop
  • Add distributed tracing (OpenTelemetry) to identify which service consumed the most time
  • Check database query times — a slow DB query is the most common cause
  • Distinguish grpc.StatusCode.DEADLINE_EXCEEDED from grpc.StatusCode.CANCELLED: CANCELLED means the client gave up before the deadline expired
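The propagation rule can be made concrete with plain arithmetic. `deadline_remaining` below is a hypothetical helper, not a grpc API (inside a handler, servers would call the real context.time_remaining() instead):

```python
import time

def deadline_remaining(deadline: float) -> float:
    """Seconds left before an absolute deadline (negative once expired)."""
    return deadline - time.monotonic()

# Client sets a 500 ms deadline...
deadline = time.monotonic() + 0.5
# ...Service A spends ~200 ms before calling Service B...
time.sleep(0.2)
# ...so Service B inherits roughly 300 ms, not a fresh 500 ms
budget = deadline_remaining(deadline)
```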

Channel and Connection Issues

gRPC uses HTTP/2, which multiplexes many RPCs over a single TCP connection. Connection issues affect all in-flight RPCs simultaneously.

# Watch channel connectivity (the sync API exposes state via a subscription)
channel = grpc.insecure_channel('localhost:50051')

def on_state_change(state):
    print(state)  # a grpc.ChannelConnectivity member

channel.subscribe(on_state_change, try_to_connect=True)
# States: IDLE, CONNECTING, READY, TRANSIENT_FAILURE, SHUTDOWN
# (grpc.aio channels also offer channel.get_state(try_to_connect=True) directly)

TRANSIENT_FAILURE — connection attempt failed, will retry. This is normal during startup; problematic if it persists.
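A persistent TRANSIENT_FAILURE is often a TCP connection that died silently, for example behind a NAT or load balancer that dropped idle flows. Keepalive channel arguments make the client probe the connection so such failures surface quickly. The argument names below are real gRPC channel args; the values are illustrative assumptions to tune per deployment:

```python
# Keepalive settings, passed via the `options` parameter of
# grpc.insecure_channel / grpc.secure_channel
KEEPALIVE_OPTIONS = [
    ("grpc.keepalive_time_ms", 30000),           # ping the server every 30 s
    ("grpc.keepalive_timeout_ms", 10000),        # dead if no ack within 10 s
    ("grpc.keepalive_permit_without_calls", 1),  # ping even with no active RPCs
]
```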

grpcurl for Testing

grpcurl is the curl equivalent for gRPC:

# List services (requires server reflection)
grpcurl -plaintext localhost:50051 list

# Describe a service
grpcurl -plaintext localhost:50051 describe UserService

# Make a unary call
grpcurl -plaintext -d '{"id": "42"}' \
  localhost:50051 UserService/GetUser

# With TLS and metadata
grpcurl -H 'Authorization: Bearer TOKEN' \
  -d '{"id": "42"}' api.example.com:443 UserService/GetUser

Distributed Tracing for gRPC

gRPC integrates natively with OpenTelemetry. Add the gRPC instrumentation interceptor to capture every RPC as a trace span:

from opentelemetry.instrumentation.grpc import GrpcInstrumentorClient

GrpcInstrumentorClient().instrument()
# All subsequent stub calls are automatically traced

In your tracing UI (Jaeger, Zipkin, Grafana Tempo), filter by rpc.grpc.status_code != 0 to find failed RPCs quickly.
