
Streaming Responses with Server-Sent Events

Learn how to handle streaming responses from the V3 LLM endpoint using Server-Sent Events (SSE).

Overview

Streaming is mandatory for all V3 LLM responses. Every chat completion is delivered via Server-Sent Events (SSE), providing real-time token-by-token output. Benefits:
  • Immediate response feedback
  • Better user experience with progressive display
  • Lower perceived latency

Server-Sent Events Format

SSE responses follow this pattern:
data: {JSON_CHUNK}

data: {JSON_CHUNK}

data: [DONE]
Each event line starts with data: followed by either a JSON chunk or the [DONE] marker that ends the stream; blank lines separate events.
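
For example, a short completion might arrive as the following sequence (values are illustrative, using the chunk format described below):

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-4","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-4","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-4","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]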

Parsing Streaming Responses

Python with requests
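
A minimal sketch using the requests library; the endpoint URL and API key are placeholders for your deployment's values:

import json
import os

import requests

url = "https://api.example.com/v3/chat/completions"  # placeholder endpoint
headers = {
    "Authorization": f"Bearer {os.environ['API_KEY']}",  # placeholder credential
    "Content-Type": "application/json",
}
payload = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": True,
}

# stream=True keeps the HTTP connection open so lines arrive as the server emits them
with requests.post(url, headers=headers, json=payload, stream=True) as response:
    response.raise_for_status()
    for line in response.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue  # skip the blank separator lines between events
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            print(delta["content"], end="", flush=True)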

JavaScript with Fetch API
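
A sketch using fetch and a streaming body reader; the URL is a placeholder, and onToken stands in for whatever your app does with each token:

async function streamCompletion(prompt, apiKey, onToken) {
  const response = await fetch("https://api.example.com/v3/chat/completions", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4",
      messages: [{ role: "user", content: prompt }],
      stream: true,
    }),
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // Events are separated by blank lines; keep any partial event in the buffer
    const events = buffer.split("\n\n");
    buffer = events.pop();

    for (const event of events) {
      if (!event.startsWith("data: ")) continue;
      const data = event.slice("data: ".length);
      if (data === "[DONE]") return;
      const content = JSON.parse(data).choices[0].delta.content;
      if (content) onToken(content);
    }
  }
}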

Python with OpenAI SDK
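
If the V3 endpoint is OpenAI-compatible, the official openai Python SDK (v1+) can consume the stream directly; the base URL and key below are placeholders:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v3",  # placeholder endpoint
    api_key="YOUR_API_KEY",                 # placeholder credential
)

stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,  # streaming is mandatory on V3
)

# The SDK handles SSE parsing; each item is an already-decoded chunk object
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)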

Chunk Structure

Each JSON chunk follows OpenAI’s format:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "created": 1677652288,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "Hello"
      },
      "finish_reason": null
    }
  ]
}

Key Fields

Field                          Description
id                             Unique completion ID
created                        Unix timestamp
model                          Model used
choices[].delta.role           Role (only in first chunk)
choices[].delta.content        Token content
choices[].finish_reason        Stop reason in final chunk (stop, length, etc.)
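
Putting these fields together, the deltas from a stream can be accumulated into a complete message. A sketch, assuming chunks is an iterable of JSON objects parsed from the data: lines:

message = {"role": None, "content": ""}

for chunk in chunks:
    choice = chunk["choices"][0]
    delta = choice["delta"]
    if "role" in delta:
        message["role"] = delta["role"]          # role appears only in the first chunk
    if delta.get("content"):
        message["content"] += delta["content"]   # concatenate token content
    if choice["finish_reason"] is not None:
        print("finished:", choice["finish_reason"])  # set only in the final chunk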

Handling Different Finish Reasons
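
The final chunk's finish_reason tells you why generation ended. A sketch of branching on it; values other than stop and length (such as content_filter) are backend-dependent and shown here as assumptions:

def handle_finish_reason(finish_reason):
    if finish_reason == "stop":
        return  # the model completed its answer normally
    if finish_reason == "length":
        print("\n[truncated: the max_tokens limit was reached]")
    elif finish_reason == "content_filter":
        print("\n[stopped by a content filter]")  # assumed value; backend-dependent
    else:
        print(f"\n[stream ended with finish_reason={finish_reason}]")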

Error Handling

Handle connection errors and timeouts:
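
A sketch using requests, reusing url, headers, and payload from the earlier example; handle_sse_line is a hypothetical helper wrapping the line parsing shown above:

import requests

try:
    # (connect timeout, read timeout) in seconds; tune for your network
    with requests.post(url, headers=headers, json=payload, stream=True, timeout=(5, 60)) as response:
        response.raise_for_status()
        for line in response.iter_lines(decode_unicode=True):
            handle_sse_line(line)  # hypothetical helper: parse data: lines as shown earlier
except requests.exceptions.Timeout:
    print("Request timed out; retry with backoff")
except requests.exceptions.ConnectionError:
    print("Connection dropped mid-stream; partial output may still be usable")
except requests.exceptions.HTTPError as err:
    print(f"Server returned HTTP {err.response.status_code}")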

Buffering for UI Display

Buffer tokens for smoother UI updates:
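
One approach, sketched in JavaScript: accumulate tokens as they arrive and flush them to the UI on a timer, so the display updates a few times per second instead of once per token. appendToDisplay is a hypothetical render function:

let pending = "";

// Call this for each delta.content token from the stream
function onToken(token) {
  pending += token;
}

// Flush the buffered text on an interval rather than per token
const flushTimer = setInterval(() => {
  if (pending) {
    appendToDisplay(pending); // hypothetical UI update
    pending = "";
  }
}, 100);

// When the stream reports [DONE], stop the timer and flush any remainder
function onStreamEnd() {
  clearInterval(flushTimer);
  if (pending) appendToDisplay(pending);
  pending = "";
}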

React/Frontend Integration

Example React hook for streaming:
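
A sketch of a hook that streams a completion into component state; the endpoint path and request body are placeholders matching the earlier fetch example:

import { useCallback, useState } from "react";

export function useStreamingCompletion() {
  const [text, setText] = useState("");
  const [isStreaming, setIsStreaming] = useState(false);

  const start = useCallback(async (prompt) => {
    setText("");
    setIsStreaming(true);

    const response = await fetch("/api/v3/chat/completions", {  // placeholder path
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "gpt-4",
        messages: [{ role: "user", content: prompt }],
        stream: true,
      }),
    });

    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let buffer = "";

    while (true) {
      const { value, done } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });
      const events = buffer.split("\n\n");
      buffer = events.pop();

      for (const event of events) {
        if (!event.startsWith("data: ")) continue;
        const data = event.slice("data: ".length);
        if (data === "[DONE]") continue;
        const content = JSON.parse(data).choices[0].delta.content;
        if (content) setText((prev) => prev + content);
      }
    }

    setIsStreaming(false);
  }, []);

  return { text, isStreaming, start };
}

Components can render text directly as it grows and use isStreaming to show a progress indicator while tokens are still arriving.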

Next Steps