gRPC Streaming vs REST Polling: Real-Time Data Delivery

When to use gRPC server streaming or bidirectional streaming vs REST polling or webhooks for delivering real-time updates in microservice architectures.

The Real-Time Spectrum

There is no single 'real-time' solution — there is a spectrum of approaches, each with different latency, complexity, and infrastructure requirements:

Approach                Latency           Client complexity      Server complexity
Short polling           High (interval)   Low                    Low
Long polling            Medium            Medium                 Medium
SSE                     Low               Low                    Low
WebSocket               Low               Medium                 Medium
gRPC server streaming   Low               Low (generated code)   Medium
gRPC bidirectional      Very low          Medium                 High

Short and Long Polling

Short polling — the client sends a request every N seconds. Simple, but wasteful: most responses contain no new data. At 1-second poll intervals with 100K clients, you serve 100K requests/second of mostly empty responses.

Long polling — the server holds the request open until new data arrives, then responds. The client immediately reconnects. Latency matches the event itself (sub-second), but each 'event' requires a full HTTP round trip:

Client                         Server
  |--- GET /events?after=42 --->|  (server holds open)
  |                             |  ... data arrives ...
  |<--- 200 {event: 43} --------|  (immediately respond)
  |--- GET /events?after=43 --->|  (client immediately reconnects)
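The loop above can be sketched as a minimal long-polling client. The `/events` endpoint, its `after` cursor parameter, and the JSON response shape (a list of events with integer `id` fields) are assumptions for illustration:

```python
import json
import urllib.request

def poll_once(base_url: str, after: int, timeout: float = 30.0) -> list[dict]:
    """Issue one long-poll request; the server holds it open until data arrives."""
    url = f'{base_url}/events?after={after}'
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read())

def next_cursor(events: list[dict], current: int) -> int:
    """Advance the cursor to the highest event id seen so far."""
    return max([e['id'] for e in events], default=current)

def run(base_url: str) -> None:
    cursor = 0
    while True:  # reconnect immediately after each response
        for event in (events := poll_once(base_url, after=cursor)):
            print(event)
        cursor = next_cursor(events, cursor)
```

The client timeout must exceed the server's hold-open window, or the client will abort requests the server is still intentionally holding.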

gRPC Server Streaming

gRPC server streaming is defined in a .proto file with the stream keyword on the response:

syntax = "proto3";

service MarketData {
  // Unary: client sends one request, server sends one response
  rpc GetQuote (QuoteRequest) returns (Quote);

  // Server streaming: client sends one request, server sends many
  rpc SubscribeQuotes (SubscribeRequest) returns (stream Quote);

  // Bidirectional: both sides stream
  rpc Trade (stream TradeOrder) returns (stream TradeConfirmation);
}

message SubscribeRequest { repeated string symbols = 1; }
message Quote {
  string symbol = 1;
  double bid = 2;
  double ask = 3;
  int64 timestamp_ms = 4;
}

The server-side implementation in Python (grpcio):

import time

import grpc
import market_data_pb2
import market_data_pb2_grpc
from collections.abc import Iterator

class MarketDataServicer(market_data_pb2_grpc.MarketDataServicer):
    def SubscribeQuotes(
        self,
        request: market_data_pb2.SubscribeRequest,
        context: grpc.ServicerContext,
    ) -> Iterator[market_data_pb2.Quote]:
        for symbol in request.symbols:
            subscribe_to_feed(symbol)

        try:
            while context.is_active():
                for symbol in request.symbols:
                    quote = get_latest_quote(symbol)
                    yield market_data_pb2.Quote(
                        symbol=symbol,
                        bid=quote.bid,
                        ask=quote.ask,
                        timestamp_ms=quote.timestamp_ms,
                    )
                time.sleep(0.1)  # 10Hz updates
        except grpc.RpcError:
            pass  # Client disconnected

Flow Control and Deadlines

gRPC's HTTP/2 foundation provides built-in flow control. If the client cannot consume messages fast enough, the server-side yield will block — preventing unbounded memory growth from buffered messages.
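This pull-based backpressure can be seen in miniature with a plain Python generator (an analogy, not gRPC itself): the producer only advances when the consumer requests the next value, so a slow consumer naturally throttles the producer instead of forcing it to buffer.

```python
def produce_quotes():
    """Producer: each yield suspends until the consumer requests the next value."""
    n = 0
    while True:
        n += 1
        yield n  # suspended here while the consumer is busy

gen = produce_quotes()
first_three = [next(gen) for _ in range(3)]  # producer ran exactly three times
```

In the gRPC servicer above, the HTTP/2 flow-control window plays the role of the consumer's `next()` call: once the window is exhausted, the servicer's `yield` blocks.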

Deadlines propagate through the call chain. If a client sets a 5-second deadline on a streaming call, cancellation propagates to all downstream RPCs the server makes, enabling clean resource cleanup:

# Client with a 30-second deadline on the stream
with grpc.insecure_channel('localhost:50051') as channel:
    stub = market_data_pb2_grpc.MarketDataStub(channel)
    request = market_data_pb2.SubscribeRequest(symbols=['BTC', 'ETH'])
    try:
        # Stream until the 30-second deadline fires
        for quote in stub.SubscribeQuotes(request, timeout=30):
            print(f'{quote.symbol}: {quote.bid}/{quote.ask}')
    except grpc.RpcError as err:
        if err.code() != grpc.StatusCode.DEADLINE_EXCEEDED:
            raise  # deadline expiry is expected here; anything else is a real error

gRPC Bidirectional Streaming

Bidirectional streaming enables both sides to send messages independently over the same HTTP/2 stream. This is ideal for:

Collaborative editing — clients stream cursor positions and operations; server streams merged state back.

Interactive AI inference — client streams audio chunks; server streams back transcription fragments as they are recognized.

Real-time sync — device telemetry streams up while configuration changes stream down on the same connection.

# Bidirectional streaming client
def generate_orders():
    for order in order_queue:
        yield market_data_pb2.TradeOrder(
            symbol=order.symbol,
            quantity=order.quantity,
            side=order.side,
        )

with grpc.insecure_channel('trading:50051') as channel:
    stub = market_data_pb2_grpc.MarketDataStub(channel)
    for confirmation in stub.Trade(generate_orders()):
        print(f'Order {confirmation.order_id}: {confirmation.status}')

REST Alternatives

Webhooks — Push Without Polling

For service-to-service communication, webhooks invert the connection: instead of the consumer polling the producer, the producer HTTP POSTs to the consumer when events occur. This eliminates polling entirely:

Producer                      Consumer
  |--- POST /webhooks/payment -->|  (event fires)
  |<--- 200 OK ------------------|  (acknowledge)

Webhooks work well for low-frequency, high-importance events (payment confirmed, order shipped). They are a poor fit for high-frequency data (sub-second market data), where every event costs a full HTTP request/response cycle and often a fresh connection.
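On the consumer side, a webhook endpoint is just an HTTP handler that acknowledges quickly. A minimal sketch using the standard library (the payload shape and the `type` field are assumptions for illustration):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def handle_event(payload: dict) -> int:
    """Process one webhook delivery; return the HTTP status to acknowledge with."""
    if 'type' not in payload:
        return 400  # malformed delivery: reject so the producer can alert
    # ... enqueue for async processing; keep this path fast so the
    # producer's delivery worker is not blocked ...
    return 200

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get('Content-Length', 0))
        payload = json.loads(self.rfile.read(length) or b'{}')
        self.send_response(handle_event(payload))
        self.end_headers()

# To serve: HTTPServer(('', 8080), WebhookHandler).serve_forever()
```

Real deployments typically also verify a signature header from the producer before trusting the payload.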

Conditional GET with ETag/304

For resources that change infrequently, conditional GET eliminates response bodies on cache hits:

GET /api/config HTTP/1.1
If-None-Match: "v42"

HTTP/1.1 304 Not Modified  (no body — config unchanged)

This is not truly real-time, but it reduces bandwidth from polling to near-zero when data is stable.

SSE for Browser Clients

When your consumer is a browser and data only flows server → client, SSE beats gRPC: browsers have native EventSource support, no code generation is required, and the connection survives through HTTP/2 proxies transparently.

gRPC-Web (the browser-compatible gRPC variant) requires a proxy (Envoy or grpc-web server) to translate between browser HTTP/1.1 and gRPC's HTTP/2 framing. This adds operational complexity that SSE avoids.

Architecture Decision Guide

Use gRPC streaming when:

  • Both client and server are non-browser services (microservices, native apps)
  • You need strongly-typed streaming with protobuf schemas
  • Latency must be sub-100ms and data volume is high
  • You need deadline propagation across a call chain
  • Bidirectional streaming is required

Use SSE when:

  • The consumer is a browser
  • Data flows only server → client
  • You want zero client-side code generation
  • Built-in reconnection with Last-Event-ID matters

Use REST polling when:

  • Events are infrequent (once per minute or less)
  • Infrastructure does not support persistent connections (serverless)
  • Simplicity outweighs latency requirements

Use webhooks when:

  • You are integrating with third-party services that push events
  • Events are asynchronous and the consumer controls its own endpoint
  • You need delivery guarantees with retry logic on the producer side
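Producer-side delivery guarantees usually mean retrying failed deliveries with capped exponential backoff before handing the event to a dead-letter queue. A sketch under those assumptions (the attempt count, base delay, and cap are illustrative):

```python
import time
import urllib.error
import urllib.request

def backoff_delays(attempts: int, base: float = 1.0, cap: float = 60.0) -> list[float]:
    """Exponential backoff schedule: 1s, 2s, 4s, ... capped at 60s."""
    return [min(base * (2 ** i), cap) for i in range(attempts)]

def deliver_with_retries(url: str, body: bytes, attempts: int = 5) -> bool:
    """POST one event; retry on failure, give up after the last attempt."""
    for delay in backoff_delays(attempts):
        try:
            req = urllib.request.Request(
                url, data=body, headers={'Content-Type': 'application/json'})
            with urllib.request.urlopen(req, timeout=10) as resp:
                if 200 <= resp.status < 300:
                    return True  # consumer acknowledged
        except urllib.error.URLError:
            pass  # network error or non-2xx: fall through and retry
        time.sleep(delay)
    return False  # exhausted retries: route to a dead-letter queue
```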
