Working with Media Files

Send images, documents, and other media files to LLMs for analysis and understanding.

Overview

Eden AI V3 LLM endpoints support multimodal inputs, allowing you to send:
  • Images - For visual understanding and analysis
  • Documents - PDFs and text files for processing
  • Mixed content - Combine text prompts with media
Multimodal capabilities enable use cases like:
  • Analyzing screenshots and diagrams
  • Extracting data from images and documents
  • Visual question answering
  • Chart and graph interpretation
  • Receipt and invoice processing

Supported Input Types

V3 supports multiple ways to send media to LLMs:
Input Type       | Format                 | Best For                     | Example
HTTP(S) URL      | Direct link            | Publicly accessible files    | https://example.com/image.jpg
Base64 Data URL  | Inline encoded data    | Small files, secure data     | data:image/jpeg;base64,...
File Upload      | UUID from /v3/upload   | Reusable files, large files  | 550e8400-e29b-...
Base64 File Data | Raw base64 or data URL | PDFs, documents              | data:application/pdf;base64,...
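Each of these maps to a content part inside a message. The shapes below are a quick reference (the UUID and base64 payloads are placeholders taken from the table above):

# Image by URL
{"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}

# Image as an inline base64 data URL
{"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}

# Previously uploaded file, referenced by the UUID returned from /v3/upload
{"type": "file", "file": {"file_id": "550e8400-e29b-..."}}

# Document as inline base64 data
{"type": "file", "file": {"file_data": "data:application/pdf;base64,..."}}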

Image Inputs

Using Image URLs

The simplest method for publicly accessible images:
import requests

url = "https://api.edenai.run/v3/llm/chat/completions"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

payload = {
    "model": "openai/gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What's in this image?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/photo.jpg"
                    }
                }
            ]
        }
    ],
    "stream": True
}

response = requests.post(url, headers=headers, json=payload, stream=True)

for line in response.iter_lines():
    if line:
        print(line.decode('utf-8'))

Using Base64 Image Data

For inline images or when URLs aren’t available:
import base64
import requests

# Read and encode image
with open("image.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode('utf-8')

# Create data URL
data_url = f"data:image/jpeg;base64,{image_data}"

payload = {
    "model": "anthropic/claude-3-5-sonnet-20241022",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in detail."},
                {
                    "type": "image_url",
                    "image_url": {"url": data_url}
                }
            ]
        }
    ],
    "stream": True
}

response = requests.post(url, headers=headers, json=payload, stream=True)
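If the image type varies, the MIME prefix for the data URL can be derived with Python's standard-library mimetypes module instead of hard-coding image/jpeg. A minimal helper (the to_data_url name is just for illustration):

import base64
import mimetypes

def to_data_url(path):
    """Build a data URL, guessing the MIME type from the file extension."""
    mime, _ = mimetypes.guess_type(path)  # e.g. "image/png" for .png files
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime or 'application/octet-stream'};base64,{encoded}"

data_url = to_data_url("screenshot.png")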

Using Uploaded Files

For reusable images or better performance:
import requests

# Step 1: Upload the image
upload_url = "https://api.edenai.run/v3/upload"
upload_headers = {"Authorization": "Bearer YOUR_API_KEY"}

files = {"file": open("screenshot.png", "rb")}
upload_response = requests.post(upload_url, headers=upload_headers, files=files)
file_id = upload_response.json()["file_id"]

print(f"Uploaded file ID: {file_id}")

# Step 2: Use the file in LLM request
llm_url = "https://api.edenai.run/v3/llm/chat/completions"
llm_headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

payload = {
    "model": "google/gemini-1.5-pro",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Analyze this screenshot and list all UI elements."},
                {
                    "type": "file",
                    "file": {"file_id": file_id}
                }
            ]
        }
    ],
    "stream": True
}

response = requests.post(llm_url, headers=llm_headers, json=payload, stream=True)

for line in response.iter_lines():
    if line:
        print(line.decode('utf-8'))

Document Inputs

PDF and Document Files

Send PDFs and documents for analysis:
import requests

# Upload PDF document
upload_url = "https://api.edenai.run/v3/upload"
upload_headers = {"Authorization": "Bearer YOUR_API_KEY"}

files = {"file": open("report.pdf", "rb")}
data = {"purpose": "llm-analysis"}

upload_response = requests.post(
    upload_url,
    headers=upload_headers,
    files=files,
    data=data
)
file_id = upload_response.json()["file_id"]

# Analyze the PDF with LLM
llm_url = "https://api.edenai.run/v3/llm/chat/completions"
llm_headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

payload = {
    "model": "anthropic/claude-3-5-sonnet-20241022",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize this document and extract key findings."},
                {
                    "type": "file",
                    "file": {"file_id": file_id}
                }
            ]
        }
    ],
    "stream": True
}

response = requests.post(llm_url, headers=llm_headers, json=payload, stream=True)

for line in response.iter_lines():
    if line:
        print(line.decode('utf-8'))

Base64 Document Data

For inline document processing:
import base64
import requests

# Read and encode PDF
with open("invoice.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode('utf-8')

# Create data URL for PDF
data_url = f"data:application/pdf;base64,{pdf_data}"

payload = {
    "model": "openai/gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract all line items and totals from this invoice."},
                {
                    "type": "file",
                    "file": {"file_data": data_url}
                }
            ]
        }
    ],
    "stream": True
}

response = requests.post(url, headers=headers, json=payload, stream=True)

Mixed Content Messages

Multiple Images

Send multiple images in a single message:
import requests
payload = {
    "model": "openai/gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Compare these two images and describe the differences."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/before.jpg"}
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/after.jpg"}
                }
            ]
        }
    ],
    "stream": True
}

response = requests.post(url, headers=headers, json=payload, stream=True)

Text + Images + Documents

Combine different media types:
import requests
payload = {
    "model": "anthropic/claude-3-5-sonnet-20241022",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Review the chart and supporting documentation. Provide analysis."
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/chart.png"}
                },
                {
                    "type": "file",
                    "file": {"file_id": "550e8400-e29b-41d4-a716-446655440000"}
                }
            ]
        }
    ],
    "stream": True
}

response = requests.post(url, headers=headers, json=payload, stream=True)

Practical Examples

Analyze a Screenshot

import requests

url = "https://api.edenai.run/v3/llm/chat/completions"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

payload = {
    "model": "openai/gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "This is a screenshot of an error message. What's wrong and how do I fix it?"
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/error-screenshot.png"}
                }
            ]
        }
    ],
    "stream": True,
    "max_tokens": 500
}

response = requests.post(url, headers=headers, json=payload, stream=True)

for line in response.iter_lines():
    if line:
        line_str = line.decode('utf-8')
        if line_str.startswith('data: '):
            data = line_str[6:]
            if data != '[DONE]':
                print(data)

Extract Receipt Data

import requests
import base64
import json

# Read receipt image
with open("receipt.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode('utf-8')

url = "https://api.edenai.run/v3/llm/chat/completions"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

payload = {
    "model": "anthropic/claude-3-5-sonnet-20241022",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Extract the following from this receipt: merchant name, date, total amount, items purchased. Format as JSON."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{image_data}"
                    }
                }
            ]
        }
    ],
    "stream": True,
    "temperature": 0.2  # Lower temperature for structured extraction
}

response = requests.post(url, headers=headers, json=payload, stream=True)

full_response = ""
for line in response.iter_lines():
    if line:
        line_str = line.decode('utf-8')
        if line_str.startswith('data: '):
            data = line_str[6:]
            if data != '[DONE]':
                # Parse the chunk and accumulate the streamed content
                chunk = json.loads(data)
                if chunk.get('choices', [{}])[0].get('delta', {}).get('content'):
                    full_response += chunk['choices'][0]['delta']['content']

print("Extracted data:", full_response)

Summarize PDF Document

import requests

# Upload PDF
upload_url = "https://api.edenai.run/v3/upload"
upload_headers = {"Authorization": "Bearer YOUR_API_KEY"}

files = {"file": open("research-paper.pdf", "rb")}
upload_response = requests.post(upload_url, headers=upload_headers, files=files)
file_id = upload_response.json()["file_id"]

# Request summary
llm_url = "https://api.edenai.run/v3/llm/chat/completions"
llm_headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

payload = {
    "model": "google/gemini-1.5-pro",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Provide a comprehensive summary of this research paper, including methodology, key findings, and conclusions."
                },
                {
                    "type": "file",
                    "file": {"file_id": file_id}
                }
            ]
        }
    ],
    "stream": True,
    "max_tokens": 1000
}

response = requests.post(llm_url, headers=llm_headers, json=payload, stream=True)

for line in response.iter_lines():
    if line:
        print(line.decode('utf-8'))

Chart Analysis

import requests

url = "https://api.edenai.run/v3/llm/chat/completions"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

payload = {
    "model": "openai/gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Analyze this chart and provide: 1) Main trends, 2) Notable outliers, 3) Key insights, 4) Recommendations"
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/sales-chart.png"}
                }
            ]
        }
    ],
    "stream": True,
    "temperature": 0.3
}

response = requests.post(url, headers=headers, json=payload, stream=True)

for line in response.iter_lines():
    if line:
        print(line.decode('utf-8'))

Provider Support Matrix

Different providers have varying multimodal capabilities:
Provider  | Models                           | Max Image Size | Max File Size
OpenAI    | gpt-4o, gpt-4-turbo              | 20 MB          | 512 MB
Anthropic | claude-3-opus, claude-3-5-sonnet | 5 MB           | 10 MB
Google    | gemini-1.5-pro, gemini-1.5-flash | 20 MB          | 2 GB
Mistral   | pixtral-12b                      | 10 MB          | -
Support for image URLs, base64 images, and PDF/document input varies by model.
See Vision Capabilities for a detailed provider comparison.
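Because these limits differ, it can help to validate files locally before sending a request. A minimal pre-flight check, assuming the image size limits from the table above (verify current values in the provider comparison):

import os

# Approximate per-provider image size limits in MB (from the table above)
MAX_IMAGE_MB = {"openai": 20, "anthropic": 5, "google": 20, "mistral": 10}

def check_image_size(path, model):
    """Raise if a local image exceeds the provider's documented image size limit."""
    provider = model.split("/")[0]  # e.g. "anthropic/claude-3-5-sonnet-..." -> "anthropic"
    limit_mb = MAX_IMAGE_MB.get(provider)
    if limit_mb is None:
        return  # unknown provider: skip the local check
    size_mb = os.path.getsize(path) / (1024 * 1024)
    if size_mb > limit_mb:
        raise ValueError(f"{path} is {size_mb:.1f} MB, above the {limit_mb} MB limit for {provider}")

check_image_size("receipt.jpg", "anthropic/claude-3-5-sonnet-20241022")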

Best Practices

Choosing Input Method

Use HTTP(S) URLs when:
  • Images are publicly accessible
  • You want to minimize request payload size
  • Files are already hosted
Use uploaded files (UUID) when:
  • Processing the same file multiple times
  • Files are large (reduces repeated upload overhead)
  • Better performance is needed
Use base64 when:
  • Files are small (under 5 MB)
  • URLs aren’t available
  • Security/privacy requires inline data
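These rules can be folded into a small helper. The sketch below simply encodes the guidance above (the 5 MB threshold and function name are illustrative):

import os

def choose_input_method(path=None, public_url=None, reuse=False):
    """Suggest an input method following the guidelines above."""
    if public_url:
        return "url"       # already hosted and publicly accessible
    if reuse or os.path.getsize(path) > 5 * 1024 * 1024:
        return "upload"    # large or frequently reused files: upload once, pass the file_id
    return "base64"        # small, private, one-off files: inline as a data URL

print(choose_input_method(path="receipt.jpg"))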

Optimizing Performance

Image optimization:
  • Resize large images before uploading (see the sketch below)
  • Use appropriate compression
  • Consider using URLs for public images
Document optimization:
  • Extract relevant pages from large PDFs
  • Use text extraction for text-heavy documents
  • Consider OCR preprocessing for scanned documents
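For example, a large image can be downscaled and recompressed with Pillow before encoding or uploading (a sketch; assumes the Pillow package is installed):

from PIL import Image

# Shrink to fit within 1024x1024 (aspect ratio preserved) and recompress as JPEG
with Image.open("photo.jpg") as img:
    img.thumbnail((1024, 1024))
    img.convert("RGB").save("photo-small.jpg", "JPEG", quality=85)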

Prompting Strategies

Be specific:
# Vague
"What's in this image?"

# Specific
"List all visible UI components in this screenshot, including buttons, text fields, and their labels."
Provide context:
{
    "type": "text",
    "text": "This is a medical chart showing patient vitals over 24 hours. Identify any concerning trends."
}
Use structured output:
{
    "type": "text",
    "text": "Extract data as JSON with fields: date, vendor, total, items[]."
}

Error Handling

Common Issues

File too large:
{
  "error": {
    "code": "file_too_large",
    "message": "File size exceeds provider limit of 20 MB"
  }
}
Unsupported format:
{
  "error": {
    "code": "unsupported_format",
    "message": "Image format .bmp is not supported"
  }
}
Invalid base64:
{
  "error": {
    "code": "invalid_base64",
    "message": "Invalid base64 data in data URL"
  }
}

Handling Errors

import requests

try:
    response = requests.post(url, headers=headers, json=payload, stream=True)
    response.raise_for_status()

    for line in response.iter_lines():
        if line:
            print(line.decode('utf-8'))

except requests.exceptions.HTTPError as e:
    if e.response.status_code == 413:
        print("File too large. Try compressing or resizing.")
    elif e.response.status_code == 422:
        print("Invalid request:", e.response.json())
    else:
        print(f"HTTP error: {e}")

Next Steps