gRPC Error Handling Best Practices

How to use gRPC status codes, rich error details, deadline propagation, and retry policies to build resilient gRPC services.

gRPC Status vs HTTP Status

gRPC uses its own status codes, not HTTP status codes. gRPC runs over HTTP/2, but the outcome of an RPC is reported as a google.rpc.Code value in the grpc-status trailer, with an optional human-readable grpc-message. The HTTP/2 status (normally 200 OK) only indicates that the transport layer succeeded; the actual RPC result is the gRPC status.

This is a common source of confusion when building gRPC-HTTP transcoding layers or REST-to-gRPC gateways.
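To make the distinction concrete, here is a minimal sketch of how a gateway might read the outcome of a finished call. The function name `rpc_result` and the plain dict of trailers are assumptions made for illustration; a real client library does this internally, and this sketch simplifies every transport failure to UNKNOWN.

```python
def rpc_result(http_status: int, trailers: dict[str, str]) -> tuple[int, str]:
    """Return (grpc_code, message) for a finished call.

    Hypothetical helper: trailers is a plain dict of trailer names to values.
    """
    if http_status != 200:
        # The transport itself failed, so no gRPC status was delivered.
        # Simplification: report UNKNOWN (2) for every such case.
        return 2, f"HTTP {http_status} from transport"
    if "grpc-status" not in trailers:
        # HTTP succeeded but the RPC outcome is missing.
        return 2, "missing grpc-status trailer"
    # HTTP 200 says nothing about success; the trailer carries the real result.
    return int(trailers["grpc-status"]), trailers.get("grpc-message", "")


# HTTP 200 with a failed RPC: code 5 is NOT_FOUND.
print(rpc_result(200, {"grpc-status": "5", "grpc-message": "user 42 not found"}))
```

The point of the sketch is the second branch onward: a 200 response is only the precondition for looking at `grpc-status`, never the answer itself.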

The 17 gRPC Status Codes

| Code | Name | When to Use |
|------|------|-------------|
| 0 | `OK` | Successful |
| 1 | `CANCELLED` | Client cancelled the request |
| 2 | `UNKNOWN` | Unclassified server-side error |
| 3 | `INVALID_ARGUMENT` | Client sent bad data (like 400) |
| 4 | `DEADLINE_EXCEEDED` | Timeout expired (like 504) |
| 5 | `NOT_FOUND` | Resource not found (like 404) |
| 6 | `ALREADY_EXISTS` | Conflict (like 409) |
| 7 | `PERMISSION_DENIED` | Authenticated but not allowed (like 403) |
| 8 | `RESOURCE_EXHAUSTED` | Rate limited or quota exhausted (like 429) |
| 9 | `FAILED_PRECONDITION` | System not in required state |
| 10 | `ABORTED` | Concurrency conflict (retry at a higher level) |
| 11 | `OUT_OF_RANGE` | Operation attempted past the valid range |
| 12 | `UNIMPLEMENTED` | Method not supported (like 501) |
| 13 | `INTERNAL` | Server bug (like 500) |
| 14 | `UNAVAILABLE` | Server temporarily unavailable (like 503) |
| 15 | `DATA_LOSS` | Unrecoverable data corruption |
| 16 | `UNAUTHENTICATED` | No valid credentials (like 401) |
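For a REST-to-gRPC gateway, the HTTP analogues above can be collected into a lookup table. The sketch below is one possible mapping, not a definitive one: the entries marked "like NNN" come from the table, while the remaining ones (for example CANCELLED to 499 and FAILED_PRECONDITION to 400) are assumptions that follow the HTTP mapping suggested in the comments of google.rpc.Code. The helper name `http_status_for` is hypothetical.

```python
# One possible gRPC-to-HTTP mapping for a transcoding gateway.
GRPC_TO_HTTP = {
    "OK": 200,
    "CANCELLED": 499,            # assumption: non-standard "client closed request"
    "UNKNOWN": 500,              # assumption
    "INVALID_ARGUMENT": 400,
    "DEADLINE_EXCEEDED": 504,
    "NOT_FOUND": 404,
    "ALREADY_EXISTS": 409,
    "PERMISSION_DENIED": 403,
    "RESOURCE_EXHAUSTED": 429,
    "FAILED_PRECONDITION": 400,  # assumption
    "ABORTED": 409,              # assumption
    "OUT_OF_RANGE": 400,         # assumption
    "UNIMPLEMENTED": 501,
    "INTERNAL": 500,
    "UNAVAILABLE": 503,
    "DATA_LOSS": 500,            # assumption
    "UNAUTHENTICATED": 401,
}


def http_status_for(code_name: str) -> int:
    """Translate a gRPC code name to an HTTP status, defaulting to 500."""
    return GRPC_TO_HTTP.get(code_name, 500)


print(http_status_for("NOT_FOUND"))
```

Note that the mapping is lossy: several gRPC codes collapse onto the same HTTP status, so a gateway that needs to preserve the original code should also return it in the response body.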

Return status codes in Python and Go:

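# Python (grpcio)
# pb2 and pb2_grpc are the modules protoc generates from the service's .proto file;
# db is a placeholder for the application's data-access layer.
import grpc

class UserService(pb2_grpc.UserServiceServicer):
    def GetUser(self, request, context):
        user = db.find(request.user_id)
        if not user:
            # abort() raises, so nothing after it runs; the client sees NOT_FOUND.
            context.abort(
                grpc.StatusCode.NOT_FOUND,
                f'User {request.user_id} not found',
            )
        return pb2.UserResponse(user=user)

// Go (google.golang.org/grpc/status)
import (
    "context"
    "errors"

    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
)

func (s *UserServer) GetUser(ctx context.Context, req *pb.GetUserRequest) (*pb.User, error) {
    user, err := s.db.Find(req.UserId)
    // ErrNotFound is a hypothetical sentinel error returned by the data layer.
    if errors.Is(err, ErrNotFound) {
        return nil, status.Errorf(codes.NotFound, "user %d not found", req.UserId)
    }
    if err != nil {
        // Any other failure is a server-side problem, not a missing resource.
        return nil, status.Errorf(codes.Internal, "failed to look up user %d", req.UserId)
    }
    return user, nil
}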

Rich Error Details (google.rpc.Status)

A plain gRPC status carries only a code and a message string. To send structured error information, build a google.rpc.Status message and attach typed detail messages to it:

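import grpc
from google.protobuf import any_pb2
from google.rpc import status_pb2, error_details_pb2
from grpc_status import rpc_status

def GetUser(self, request, context):
    if request.user_id <= 0:
        # Build a rich error with field violations
        detail = error_details_pb2.BadRequest()
        violation = detail.field_violations.add()
        violation.field = 'user_id'
        violation.description = 'user_id must be positive'

        # Status.details is a repeated google.protobuf.Any, so pack the detail first
        packed = any_pb2.Any()
        packed.Pack(detail)

        rich_status = status_pb2.Status(
            code=grpc.StatusCode.INVALID_ARGUMENT.value[0],
            message='Invalid request',
            details=[packed],
        )
        # abort_with_status() raises, ending the handler with the rich status
        context.abort_with_status(rpc_status.to_status(rich_status))
    # Normal handling of a valid request continues here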

Common detail types from google.rpc.error_details:

  • BadRequest — field-level validation errors
  • RetryInfo — tells client when to retry
  • QuotaFailure — which quota was exceeded
  • ErrorInfo — machine-readable error reason and domain
  • RequestInfo — request ID for support tracing
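On the consuming side, a client has to pick the detail types it understands out of the status. The following is a minimal sketch that assumes the error arrives as the proto3 JSON form of google.rpc.Status, as a REST gateway might return it. The helper names `field_violations` and `retry_delay_seconds` and the sample payload are made up for illustration.

```python
import json
from typing import Optional

BAD_REQUEST = "type.googleapis.com/google.rpc.BadRequest"
RETRY_INFO = "type.googleapis.com/google.rpc.RetryInfo"


def field_violations(status: dict) -> list[tuple[str, str]]:
    """Collect (field, description) pairs from every BadRequest detail."""
    found = []
    for detail in status.get("details", []):
        if detail.get("@type") == BAD_REQUEST:
            for violation in detail.get("fieldViolations", []):
                found.append((violation.get("field", ""), violation.get("description", "")))
    return found


def retry_delay_seconds(status: dict) -> Optional[float]:
    """Return the RetryInfo delay in seconds, or None if no RetryInfo is attached."""
    for detail in status.get("details", []):
        delay = detail.get("retryDelay", "")
        if detail.get("@type") == RETRY_INFO and delay.endswith("s"):
            return float(delay[:-1])  # JSON durations look like "1.5s"
    return None


# Illustrative payload only; not captured from a real service.
body = json.loads("""
{
  "code": 3,
  "message": "Invalid request",
  "details": [
    {"@type": "type.googleapis.com/google.rpc.BadRequest",
     "fieldViolations": [{"field": "user_id", "description": "user_id must be positive"}]},
    {"@type": "type.googleapis.com/google.rpc.RetryInfo", "retryDelay": "1.5s"}
  ]
}
""")

print(field_violations(body))
print(retry_delay_seconds(body))
```

Unknown detail types are skipped rather than treated as errors, which lets a server add new detail types without breaking older clients.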

Error Propagation in Service Chains

In a microservice chain, propagate gRPC errors faithfully rather than wrapping them in generic INTERNAL errors:

func (s *OrderService) PlaceOrder(ctx context.Context, req *pb.OrderRequest) (*pb.Order, error) {
    // Call upstream inventory service
    _, err := s.inventoryClient.Reserve(ctx, &inventorypb.ReserveRequest{
        ItemId: req.ItemId,
        Qty:    req.Quantity,
    })
    if err != nil {
        // Propagate upstream gRPC status directly — don't wrap in INTERNAL
        return nil, err
    }
    order := s.db.CreateOrder(req)
    return order, nil
}
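The same rule can be sketched in Python without any gRPC dependency. Everything here is a stand-in chosen for illustration: `RpcError` is a hypothetical exception carrying a code name, and `reserve` and `create_order` are injected callables rather than real stubs. The sketch passes a status error from the called service through unchanged and maps only non-gRPC failures to INTERNAL.

```python
class RpcError(Exception):
    """Hypothetical stand-in for a gRPC status error (code name plus message)."""

    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def place_order(reserve, create_order, item_id: str, qty: int):
    # A status error raised by the inventory call is not caught here,
    # so the caller sees the original code.
    reserve(item_id, qty)
    try:
        return create_order(item_id, qty)
    except RpcError:
        raise  # already a status error: keep its code
    except Exception as exc:
        # Only failures with no gRPC status of their own become INTERNAL.
        raise RpcError("INTERNAL", "could not create order") from exc


def out_of_stock(item_id, qty):
    raise RpcError("FAILED_PRECONDITION", f"not enough stock for {item_id}")


try:
    place_order(out_of_stock, lambda item, qty: {"item": item, "qty": qty}, "sku-1", 3)
except RpcError as err:
    propagated = err.code
    print(propagated)
```

With grpcio, a servicer would typically get the same effect by catching `grpc.RpcError` from the stub call and passing `err.code()` and `err.details()` to `context.abort()`, since an uncaught exception in a Python handler does not keep its original status.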

Deadline Propagation

In Go, a gRPC deadline travels with the context.Context. Always pass the incoming ctx to outgoing calls so that every hop shares the caller's remaining time budget instead of starting a fresh one:

func (s *Gateway) HandleRequest(ctx context.Context, req *pb.Request) (*pb.Response, error) {
    // ctx carries the original deadline — pass it through
    userRes, err := s.userService.Get(ctx, &userpb.GetRequest{Id: req.UserId})
    if err != nil {
        if status.Code(err) == codes.DeadlineExceeded {
            // Log and propagate — do not restart the deadline
            log.Warn("upstream deadline exceeded")
        }
        return nil, err
    }
    return &pb.Response{User: userRes.User}, nil
}
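In Python the deadline is not carried by passing a context object as in Go, so a handler usually has to forward the remaining budget explicitly. The helper below is a minimal sketch of that arithmetic. The name `downstream_timeout` and the `reserve` margin are assumptions, and the `time_remaining` argument stands for the number of seconds left on the incoming call (in grpcio, the value returned by `context.time_remaining()`).

```python
from typing import Optional


def downstream_timeout(time_remaining: Optional[float], reserve: float = 0.05) -> Optional[float]:
    """Compute the timeout to pass to an outgoing call.

    time_remaining: seconds left on the incoming call, or None if it has no deadline.
    reserve: hypothetical margin kept back for this service's own work.
    """
    if time_remaining is None:
        return None  # the caller set no deadline, so none is forwarded
    budget = time_remaining - reserve
    if budget <= 0:
        # Fail fast instead of starting a call that cannot finish in time.
        raise TimeoutError("deadline budget exhausted before outgoing call")
    return budget


print(downstream_timeout(1.0, reserve=0.25))
```

A handler would then pass the result as the `timeout=` argument of the stub call, so the called service never gets more time than the original caller allowed.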

Retry Policies in gRPC

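Configure retry policies in the service config JSON, which is applied on the client side. The service name must be the fully qualified proto service name, including the package if the .proto file declares one: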

{
  "methodConfig": [{
    "name": [{"service": "UserService"}],
    "retryPolicy": {
      "maxAttempts": 3,
      "initialBackoff": "0.5s",
      "maxBackoff": "5s",
      "backoffMultiplier": 2.0,
      "retryableStatusCodes": ["UNAVAILABLE", "RESOURCE_EXHAUSTED"]
    }
  }]
}

Retry only on transient codes such as UNAVAILABLE and RESOURCE_EXHAUSTED. Never retry INVALID_ARGUMENT, NOT_FOUND, or PERMISSION_DENIED: these are deterministic failures that a retry will not fix.
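To see what the retry policy above implies in practice, the sketch below computes the backoff ceiling before each retry and checks whether a given failure qualifies for another attempt. The helper names are hypothetical, and the numbers are upper bounds only: a real gRPC client applies random jitter, so the actual waits will differ.

```python
POLICY = {
    "maxAttempts": 3,
    "initialBackoff": "0.5s",
    "maxBackoff": "5s",
    "backoffMultiplier": 2.0,
    "retryableStatusCodes": ["UNAVAILABLE", "RESOURCE_EXHAUSTED"],
}


def parse_seconds(duration: str) -> float:
    """Convert a service-config duration such as "0.5s" to seconds."""
    return float(duration.rstrip("s"))


def backoff_ceilings(policy: dict) -> list[float]:
    """Upper bound of the wait before each retry (maxAttempts includes the first try)."""
    initial = parse_seconds(policy["initialBackoff"])
    maximum = parse_seconds(policy["maxBackoff"])
    multiplier = policy["backoffMultiplier"]
    retries = policy["maxAttempts"] - 1
    return [min(initial * multiplier ** n, maximum) for n in range(retries)]


def should_retry(policy: dict, code_name: str, attempts_made: int) -> bool:
    """Retry only listed codes, and only while attempts remain."""
    return attempts_made < policy["maxAttempts"] and code_name in policy["retryableStatusCodes"]


print(backoff_ceilings(POLICY))
print(should_retry(POLICY, "UNAVAILABLE", attempts_made=1))
print(should_retry(POLICY, "NOT_FOUND", attempts_made=1))
```

With three attempts there are only two retries, so the 5s `maxBackoff` never comes into play for this policy; it would only start capping the wait with a higher `maxAttempts` or multiplier.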
