Learn how to handle streaming responses from the V3 LLM endpoint using Server-Sent Events (SSE).

Overview

When streaming is enabled in V3, LLM responses are delivered via Server-Sent Events (SSE), providing real-time token-by-token output. Benefits:
  • Immediate response feedback
  • Better user experience with progressive display
  • Lower perceived latency

Server-Sent Events Format

SSE responses follow this pattern:
data: {JSON_CHUNK}

data: {JSON_CHUNK}

data: [DONE]
Each event line begins with data: followed by either a JSON chunk or the [DONE] terminator; consecutive events are separated by a blank line.
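The wire format above can be parsed with a few lines of Python. The following is a minimal sketch (the helper name iter_sse_chunks is ours, not part of any SDK) that turns an iterable of SSE lines into parsed chunk dicts, stopping at the [DONE] marker:

```python
import json

def iter_sse_chunks(lines):
    """Yield parsed JSON chunks from an iterable of SSE lines, stopping at [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank separator lines and non-data fields
        data = line[len("data: "):]
        if data == "[DONE]":
            return  # the end-of-stream marker carries no JSON payload
        yield json.loads(data)

# Example with canned SSE lines:
raw = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    '',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    '',
    'data: [DONE]',
]
text = "".join(c["choices"][0]["delta"]["content"] for c in iter_sse_chunks(raw))
print(text)  # Hello
```

The same split-on-"data: ", check-for-[DONE], then-parse-JSON pattern appears in both client examples below.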

Python with OpenAI SDK

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_EDEN_AI_API_KEY",
    base_url="https://api.edenai.run/v3/llm"
)

stream = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Tell me a short story"}],
    stream=True
)

full_content = ""

for chunk in stream:
    # Guard against chunks with an empty choices list before indexing
    if chunk.choices and chunk.choices[0].delta.content:
        content = chunk.choices[0].delta.content
        full_content += content
        print(content, end='', flush=True)

print(f"\n\nComplete response: {full_content}")
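If you prefer not to depend on the OpenAI SDK, the same endpoint can be consumed with plain requests. This is a sketch under the assumption that the /v3/llm/chat/completions path shown in the JavaScript example applies here as well; the helper names extract_content and stream_story are ours:

```python
import json
import requests

def extract_content(line):
    """Return the delta content from one SSE line, or None if there is none."""
    if not line.startswith("data: "):
        return None
    data = line[len("data: "):]
    if data == "[DONE]":
        return None
    chunk = json.loads(data)
    choices = chunk.get("choices") or []
    if not choices:
        return None
    return choices[0].get("delta", {}).get("content")

def stream_story(api_key):
    """POST with stream=True and assemble the streamed tokens into one string."""
    resp = requests.post(
        "https://api.edenai.run/v3/llm/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": "anthropic/claude-sonnet-4-5",
            "messages": [{"role": "user", "content": "Tell me a short story"}],
            "stream": True,
        },
        stream=True,  # keep the connection open and read the body incrementally
        timeout=60,
    )
    resp.raise_for_status()
    pieces = []
    for line in resp.iter_lines(decode_unicode=True):
        content = extract_content(line) if line else None
        if content:
            print(content, end="", flush=True)
            pieces.append(content)
    return "".join(pieces)
```

Note that stream=True must be passed both in the request body (to ask the API for SSE) and to requests.post (so the response body is read incrementally rather than buffered).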

JavaScript with Fetch API

const url = 'https://api.edenai.run/v3/llm/chat/completions';
const headers = {
  'Authorization': 'Bearer YOUR_API_KEY',
  'Content-Type': 'application/json'
};

const payload = {
  model: 'openai/gpt-4',
  messages: [{role: 'user', content: 'Tell me a short story'}],
  stream: true
};

const response = await fetch(url, {
  method: 'POST',
  headers: headers,
  body: JSON.stringify(payload)
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const {done, value} = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, {stream: true});
  const lines = buffer.split('\n');
  buffer = lines.pop(); // Keep incomplete line in buffer

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = line.slice(6);

      if (data === '[DONE]') {
        console.log('\nStream finished');
        continue; // [DONE] has no JSON payload; the outer read() loop ends when the stream closes
      }

      try {
        const chunk = JSON.parse(data);
        const content = chunk.choices[0]?.delta?.content;
        if (content) {
          process.stdout.write(content);
        }
      } catch (e) {
        // Ignore parse errors
      }
    }
  }
}

Chunk Structure

Each JSON chunk follows OpenAI’s format:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "created": 1677652288,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "Hello"
      },
      "finish_reason": null
    }
  ]
}
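To reconstruct the complete message from a sequence of chunks like the one above, concatenate each delta.content and take the role from the first chunk that carries one. A minimal sketch (the helper name assemble_message is ours):

```python
def assemble_message(chunks):
    """Fold a sequence of chunk dicts into a single {role, content} message."""
    role = None
    parts = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            role = role or delta.get("role")  # the role arrives only in the first chunk
            if delta.get("content"):
                parts.append(delta["content"])
    return {"role": role or "assistant", "content": "".join(parts)}

chunks = [
    {"choices": [{"delta": {"role": "assistant", "content": ""}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "Hello"}, "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}]},
]
print(assemble_message(chunks))  # {'role': 'assistant', 'content': 'Hello'}
```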

Key Fields

| Field | Description |
| --- | --- |
| id | Unique completion ID |
| created | Unix timestamp |
| model | Model used |
| choices[].delta.role | Role (only in the first chunk) |
| choices[].delta.content | Token content |
| choices[].finish_reason | Stop reason (only in the final chunk) |

Finish Reasons

| Value | Meaning |
| --- | --- |
| stop | Completed normally |
| length | Stopped at the max_tokens limit |
| content_filter | Stopped by content policy |
| tool_calls | Stopped to invoke a tool |
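In practice you inspect finish_reason on the final chunk and branch on it. A small sketch (the status strings and suggested handling are ours, not prescribed by the API):

```python
def check_finish(finish_reason):
    """Map a finish_reason value to a short status note for logging or UX."""
    if finish_reason == "stop":
        return "completed"
    if finish_reason == "length":
        return "truncated: raise max_tokens or continue the conversation"
    if finish_reason == "content_filter":
        return "blocked by content policy"
    if finish_reason == "tool_calls":
        return "model requested a tool call; execute it and send the result back"
    return "stream still in progress"  # finish_reason is null until the final chunk

print(check_finish("length"))  # truncated: raise max_tokens or continue the conversation
```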

Next Steps