Skip to main content

Working with Media Files

Send images, documents, and other media files to LLMs for analysis and understanding.

Overview

Eden AI V3 LLM endpoints support multimodal inputs, allowing you to send:
  • Images - For visual understanding and analysis
  • Documents - PDFs and text files for processing
  • Mixed content - Combine text prompts with media
Multimodal capabilities enable use cases like:
  • Analyzing screenshots and diagrams
  • Extracting data from images and documents
  • Visual question answering
  • Chart and graph interpretation
  • Receipt and invoice processing

Supported Input Types

V3 supports multiple ways to send media to LLMs:
Input TypeFormatBest ForExample
HTTP(S) URLDirect linkPublicly accessible fileshttps://example.com/image.jpg
Base64 Data URLInline encoded dataSmall files, secure datadata:image/jpeg;base64,...
File UploadUUID from /v3/uploadReusable files, large files550e8400-e29b-...
Base64 File DataRaw base64 or data URLPDFs, documentsdata:application/pdf;base64,...

Image Inputs

Using Image URLs

The simplest method for publicly accessible images:

Using Base64 Image Data

For inline images or when URLs aren’t available:

Using Uploaded Files

For reusable images or better performance:

Document Inputs

PDF and Document Files

Send PDFs and documents for analysis:

Base64 Document Data

For inline document processing:

Mixed Content Messages

Multiple Images

Send multiple images in a single message:

Text + Images + Documents

Combine different media types:

Practical Examples

Analyze a Screenshot

Extract Receipt Data

Summarize PDF Document

Chart Analysis

Provider Support Matrix

Different providers have varying multimodal capabilities:
ProviderModelsImage URLsBase64 ImagesPDF/DocsMax Image SizeMax File Size
OpenAIgpt-4o, gpt-4-turbo20 MB512 MB
Anthropicclaude-3-opus, claude-3-5-sonnet5 MB10 MB
Googlegemini-1.5-pro, gemini-1.5-flash20 MB2 GB
Mistralpixtral-12b-10 MB-
See Vision Capabilities for detailed provider comparison.

Best Practices

Choosing Input Method

Use HTTP(S) URLs when:
  • Images are publicly accessible
  • You want to minimize request payload size
  • Files are already hosted
Use uploaded files (UUID) when:
  • Processing the same file multiple times
  • Files are large (reduces repeated upload overhead)
  • Better performance is needed
Use base64 when:
  • Files are small (5 MB)
  • URLs aren’t available
  • Security/privacy requires inline data

Optimizing Performance

Image optimization:
  • Resize large images before uploading
  • Use appropriate compression
  • Consider using URLs for public images
Document optimization:
  • Extract relevant pages from large PDFs
  • Use text extraction for text-heavy documents
  • Consider OCR preprocessing for scanned documents

Prompting Strategies

Be specific:
# Vague
"What's in this image?"

# Specific
"List all visible UI components in this screenshot, including buttons, text fields, and their labels."
Provide context:
{
    "type": "text",
    "text": "This is a medical chart showing patient vitals over 24 hours. Identify any concerning trends."
}
Use structured output:
{
    "type": "text",
    "text": "Extract data as JSON with fields: date, vendor, total, items[]."
}

Error Handling

Common Issues

File too large:
{
  "error": {
    "code": "file_too_large",
    "message": "File size exceeds provider limit of 20 MB"
  }
}
Unsupported format:
{
  "error": {
    "code": "unsupported_format",
    "message": "Image format .bmp is not supported"
  }
}
Invalid base64:
{
  "error": {
    "code": "invalid_base64",
    "message": "Invalid base64 data in data URL"
  }
}

Handling Errors

Next Steps