Streaming Responses with Server-Sent Events
Learn how to handle streaming responses from the V3 LLM endpoint using Server-Sent Events (SSE).
Overview
Streaming is mandatory on V3: all chat completions are delivered via Server-Sent Events (SSE), providing real-time, token-by-token output. Benefits:
- Immediate response feedback
- Better user experience with progressive display
- Lower perceived latency
Server-Sent Events Format
SSE responses follow this pattern: each event is a `data:` line carrying a JSON chunk, and the stream ends with the `data: [DONE]` marker.
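An abbreviated stream might look like this (IDs and payload values are illustrative):

```text
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```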
Parsing Streaming Responses
Python with requests
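A minimal sketch using `requests`, assuming an OpenAI-compatible `/v1/chat/completions` route with bearer-token auth; the URL, API key, and model name below are placeholders for your V3 deployment:

```python
import json
import requests

API_URL = "https://api.example.com/v1/chat/completions"  # Placeholder endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}        # Placeholder auth

payload = {
    "model": "your-model",  # Placeholder model name
    "messages": [{"role": "user", "content": "Write a haiku about rivers."}],
    "stream": True,
}

with requests.post(API_URL, json=payload, headers=HEADERS, stream=True, timeout=60) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        # SSE events arrive as "data: ..." lines separated by blank lines.
        if not line or not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # End-of-stream marker
        chunk = json.loads(data)
        token = chunk["choices"][0]["delta"].get("content")
        if token:
            print(token, end="", flush=True)
```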
JavaScript with Fetch API
Python with OpenAI SDK
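If the endpoint is OpenAI-compatible, the official Python SDK (v1+) can consume the stream directly; `base_url`, the API key, and the model name below are placeholders:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",  # Placeholder: your V3 base URL
    api_key="YOUR_API_KEY",
)

stream = client.chat.completions.create(
    model="your-model",  # Placeholder model name
    messages=[{"role": "user", "content": "Write a haiku about rivers."}],
    stream=True,  # Streaming is mandatory on V3
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
```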
Chunk Structure
Each JSON chunk follows OpenAI’s format.
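An illustrative chunk (values shortened):

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "created": 1700000000,
  "model": "your-model",
  "choices": [
    {
      "index": 0,
      "delta": { "content": "Hello" },
      "finish_reason": null
    }
  ]
}
```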
Key Fields
| Field | Description |
|---|---|
| `id` | Unique completion ID |
| `created` | Unix timestamp |
| `model` | Model used |
| `choices[].delta.role` | Role (only in first chunk) |
| `choices[].delta.content` | Token content |
| `choices[].finish_reason` | Stop reason in final chunk (`stop`, `length`, etc.) |
Handling Different Finish Reasons
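A sketch of branching on the reported reason, assuming chunks are parsed into dictionaries as in the `requests` example above:

```python
def check_finish_reason(chunk: dict) -> None:
    """Inspect the finish_reason of a parsed chunk (OpenAI-style format)."""
    reason = chunk["choices"][0].get("finish_reason")
    if reason is None:
        return  # Mid-stream chunk: generation is still in progress
    if reason == "stop":
        pass  # Model completed its answer normally
    elif reason == "length":
        print("\n[output truncated: max_tokens limit reached]")
    else:
        print(f"\n[stream ended early: finish_reason={reason!r}]")
```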
Error Handling
Handle connection errors and timeouts.
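A sketch that wraps the `requests` loop from above with timeout and connection handling; the endpoint details remain placeholders:

```python
import json
import requests

def stream_tokens(url: str, headers: dict, payload: dict):
    """Yield content tokens, surfacing timeouts and dropped connections as RuntimeError."""
    try:
        # (connect timeout, read timeout) in seconds
        with requests.post(url, json=payload, headers=headers, stream=True, timeout=(10, 120)) as resp:
            resp.raise_for_status()
            for line in resp.iter_lines(decode_unicode=True):
                if not line or not line.startswith("data: "):
                    continue
                data = line[len("data: "):]
                if data == "[DONE]":
                    return
                token = json.loads(data)["choices"][0]["delta"].get("content")
                if token:
                    yield token
    except requests.exceptions.Timeout as err:
        raise RuntimeError("Stream timed out; retry with backoff.") from err
    except requests.exceptions.ConnectionError as err:
        raise RuntimeError("Connection dropped mid-stream; tokens received so far are still usable.") from err
    except requests.exceptions.HTTPError as err:
        raise RuntimeError(f"Request failed with status {err.response.status_code}.") from err
```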
Buffering for UI Display
Buffer tokens for smoother UI updates rather than re-rendering on every token.
React/Frontend Integration
A React hook can wrap the streaming logic above for frontend components.
Next Steps
- Chat Completions - Basic chat setup
- File Attachments - Send images to LLMs
- Getting Started - V3 basics