LLM Smart Routing Patterns

Learn practical patterns for implementing smart routing with LLMs in production applications using Eden AI’s dynamic model selection.

Overview

This guide provides LLM-specific patterns and examples for smart routing. For comprehensive router documentation, see the Smart Routing section.

What you’ll learn:
  • LLM-specific routing patterns
  • Customizing model candidates for LLM use cases
  • Combining smart routing with function calling and streaming
  • Practical code examples for common scenarios
  • Cost optimization strategies for LLM workloads

Basic Implementation Patterns

Pattern 1: Default Smart Routing

Let the system choose from all available models:
import json
import requests

def chat_with_smart_routing(message: str) -> str:
    """Simple chat with automatic model selection."""
    url = "https://api.edenai.run/v3/llm/chat/completions"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }

    payload = {
        "model": "@edenai",  # Automatic routing
        "messages": [{"role": "user", "content": message}],
        "stream": True
    }

    response = requests.post(url, headers=headers, json=payload, stream=True)

    full_response = ""
    for line in response.iter_lines():
        if line:
            line_str = line.decode('utf-8')
            if line_str.startswith('data: ') and line_str != 'data: [DONE]':
                # Parse the SSE chunk that follows the "data: " prefix
                data = json.loads(line_str[6:])
                # delta.content can be null on role-only chunks, so default to ""
                content = data.get('choices', [{}])[0].get('delta', {}).get('content') or ''
                full_response += content

    return full_response

# Usage
response = chat_with_smart_routing("Explain machine learning")
print(response)

Pattern 2: Custom Candidate Pool

Define specific models for your use case:
import json
import requests

def chat_with_custom_candidates(
    message: str,
    use_case: str = "general"
) -> str:
    """Chat with use-case-specific model candidates."""

    # Define candidate pools for different use cases
    CANDIDATE_POOLS = {
        "code": [
            "openai/gpt-4o",
            "anthropic/claude-sonnet-4-5",
        ],
        "creative": [
            "anthropic/claude-opus-4-5",
            "openai/gpt-4o",
            "google/gemini-2.5-pro",
        ],
        "fast": [
            "openai/gpt-4o-mini",
            "google/gemini-2.0-flash",
            "anthropic/claude-haiku-4-5",
        ],
        "general": [
            "openai/gpt-4o",
            "anthropic/claude-sonnet-4-5",
            "google/gemini-2.0-flash",
        ]
    }

    url = "https://api.edenai.run/v3/llm/chat/completions"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }

    payload = {
        "model": "@edenai",
        "router_candidates": CANDIDATE_POOLS.get(use_case, CANDIDATE_POOLS["general"]),
        "messages": [{"role": "user", "content": message}],
        "stream": True
    }

    response = requests.post(url, headers=headers, json=payload, stream=True)

    # Stream processing...
    full_response = ""
    for line in response.iter_lines():
        if line:
            line_str = line.decode('utf-8')
            if line_str.startswith('data: ') and line_str != 'data: [DONE]':
                data = json.loads(line_str[6:])
                # Guard against null content on role-only chunks
                content = data.get('choices', [{}])[0].get('delta', {}).get('content') or ''
                full_response += content

    return full_response

# Usage examples
code_response = chat_with_custom_candidates(
    "Write a Python function to merge two sorted lists",
    use_case="code"
)

creative_response = chat_with_custom_candidates(
    "Write a short story about a robot",
    use_case="creative"
)

fast_response = chat_with_custom_candidates(
    "What's the capital of France?",
    use_case="fast"
)

Pattern 3: OpenAI SDK Integration

Use smart routing with the official OpenAI SDK:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.edenai.run/v3/llm"
)

def chat_with_openai_sdk(message: str, candidates: list[str] | None = None):
    """Use smart routing with OpenAI SDK."""

    extra_params = {}
    if candidates:
        extra_params["router_candidates"] = candidates

    stream = client.chat.completions.create(
        model="@edenai",
        messages=[
            {"role": "user", "content": message}
        ],
        stream=True,
        extra_body=extra_params  # Pass router_candidates here
    )

    full_response = ""
    selected_model = None

    for chunk in stream:
        # Track which model was selected
        if not selected_model and hasattr(chunk, 'model'):
            selected_model = chunk.model
            print(f"Router selected: {selected_model}")

        if chunk.choices and chunk.choices[0].delta.content:
            content = chunk.choices[0].delta.content
            full_response += content
            print(content, end="", flush=True)

    print()  # New line
    return full_response, selected_model

# Usage
response, model = chat_with_openai_sdk(
    "Explain neural networks",
    candidates=["openai/gpt-4o", "anthropic/claude-sonnet-4-5"]
)
print(f"\nModel used: {model}")

Advanced Patterns

Pattern 4: Smart Routing with Function Calling

Combine smart routing with function/tool calling:
import requests
import json

def chat_with_tools(message: str, tools: list):
    """Smart routing with function calling."""

    url = "https://api.edenai.run/v3/llm/chat/completions"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }

    payload = {
        "model": "@edenai",
        # Choose models good at function calling
        "router_candidates": [
            "openai/gpt-4o",
            "anthropic/claude-sonnet-4-5",
            "google/gemini-2.0-flash"
        ],
        "messages": [{"role": "user", "content": message}],
        "tools": tools,  # Router considers tool compatibility
        "stream": True
    }

    response = requests.post(url, headers=headers, json=payload, stream=True)

    tool_calls = []
    full_response = ""

    for line in response.iter_lines():
        if line:
            line_str = line.decode('utf-8')
            if line_str.startswith('data: ') and line_str != 'data: [DONE]':
                data = json.loads(line_str[6:])
                delta = data.get('choices', [{}])[0].get('delta', {})

                # Streamed tool calls arrive as partial deltas; merge them by index
                if 'tool_calls' in delta:
                    for tc in delta['tool_calls']:
                        idx = tc.get('index', 0)
                        # Grow the list until the indexed slot exists
                        while len(tool_calls) <= idx:
                            tool_calls.append(
                                {"id": "", "type": "function",
                                 "function": {"name": "", "arguments": ""}}
                            )
                        if tc.get('id'):
                            tool_calls[idx]['id'] = tc['id']
                        fn = tc.get('function') or {}
                        if fn.get('name'):
                            tool_calls[idx]['function']['name'] += fn['name']
                        if fn.get('arguments'):
                            tool_calls[idx]['function']['arguments'] += fn['arguments']

                # Collect text content
                if 'content' in delta and delta['content']:
                    full_response += delta['content']

    return {
        "response": full_response,
        "tool_calls": tool_calls
    }

# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

# Usage
result = chat_with_tools(
    "What's the weather like in Paris?",
    tools=tools
)
print(f"Response: {result['response']}")
print(f"Tool calls: {result['tool_calls']}")

Pattern 5: Cost-Optimized Routing with Budget Constraints

Optimize costs by limiting to budget-friendly models:
import json
import requests

class CostOptimizedRouter:
    """Smart routing with cost optimization."""

    # Model tiers by cost
    BUDGET_MODELS = [
        "openai/gpt-4o-mini",
        "google/gemini-2.0-flash",
        "anthropic/claude-haiku-4-5",
    ]

    BALANCED_MODELS = [
        "openai/gpt-4o",
        "anthropic/claude-sonnet-4-5",
        "google/gemini-2.0-flash",
    ]

    PREMIUM_MODELS = [
        "anthropic/claude-opus-4-5",
        "openai/gpt-4o",
        "google/gemini-2.5-pro",
    ]

    def __init__(self, api_key: str, cost_tier: str = "balanced"):
        self.api_key = api_key
        self.cost_tier = cost_tier

    def get_candidates(self) -> list[str]:
        """Get candidates based on cost tier."""
        if self.cost_tier == "budget":
            return self.BUDGET_MODELS
        elif self.cost_tier == "premium":
            return self.PREMIUM_MODELS
        else:
            return self.BALANCED_MODELS

    def chat(self, message: str) -> tuple[str, float]:
        """Chat and return response with estimated cost."""

        url = "https://api.edenai.run/v3/llm/chat/completions"
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json"
        }

        payload = {
            "model": "@edenai",
            "router_candidates": self.get_candidates(),
            "messages": [{"role": "user", "content": message}],
            "stream": True
        }

        response = requests.post(url, headers=headers, json=payload, stream=True)

        full_response = ""
        selected_model = None

        for line in response.iter_lines():
            if line:
                line_str = line.decode('utf-8')
                if line_str.startswith('data: ') and line_str != 'data: [DONE]':
                    data = json.loads(line_str[6:])

                    # Track selected model
                    if not selected_model and 'model' in data:
                        selected_model = data['model']
                        print(f"Router selected: {selected_model} ({self.cost_tier} tier)")

                    # Guard against null content on role-only chunks
                    content = data.get('choices', [{}])[0].get('delta', {}).get('content') or ''
                    full_response += content

        # You could track actual cost from response metadata
        estimated_cost = 0.001  # Placeholder

        return full_response, estimated_cost

# Usage examples
budget_router = CostOptimizedRouter(API_KEY, cost_tier="budget")
response, cost = budget_router.chat("Summarize this article")
print(f"Cost: ${cost:.4f}")

premium_router = CostOptimizedRouter(API_KEY, cost_tier="premium")
response, cost = premium_router.chat("Write a comprehensive analysis")
print(f"Cost: ${cost:.4f}")

Pattern 6: Multi-Turn Conversations with Context

Maintain conversation context with smart routing:
import requests
import json

class SmartRoutingChatSession:
    """Maintain conversation with smart routing."""

    def __init__(self, api_key: str, candidates: list[str] | None = None):
        self.api_key = api_key
        self.candidates = candidates
        self.messages = []
        self.selected_models = []  # Track model selection per turn

    def send_message(self, content: str) -> str:
        """Send a message and get response."""

        # Add user message to history
        self.messages.append({"role": "user", "content": content})

        url = "https://api.edenai.run/v3/llm/chat/completions"
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json"
        }

        payload = {
            "model": "@edenai",
            "messages": self.messages,  # Include full conversation
            "stream": True
        }

        if self.candidates:
            payload["router_candidates"] = self.candidates

        response = requests.post(url, headers=headers, json=payload, stream=True)

        assistant_response = ""
        selected_model = None

        for line in response.iter_lines():
            if line:
                line_str = line.decode('utf-8')
                if line_str.startswith('data: ') and line_str != 'data: [DONE]':
                    data = json.loads(line_str[6:])

                    if not selected_model and 'model' in data:
                        selected_model = data['model']

                    # Guard against null content on role-only chunks
                    content = data.get('choices', [{}])[0].get('delta', {}).get('content') or ''
                    assistant_response += content

        # Add assistant response to history
        self.messages.append({"role": "assistant", "content": assistant_response})
        self.selected_models.append(selected_model)

        return assistant_response

    def get_conversation_summary(self) -> dict:
        """Get conversation statistics."""
        return {
            "turns": len(self.messages) // 2,
            "models_used": self.selected_models,
            "total_messages": len(self.messages)
        }

# Usage
session = SmartRoutingChatSession(
    API_KEY,
    candidates=["openai/gpt-4o", "anthropic/claude-sonnet-4-5"]
)

# Multi-turn conversation
response1 = session.send_message("What is Python?")
print(f"Assistant: {response1}\n")

response2 = session.send_message("Can you give me a code example?")
print(f"Assistant: {response2}\n")

response3 = session.send_message("Explain that example in detail")
print(f"Assistant: {response3}\n")

# Summary
summary = session.get_conversation_summary()
print(f"\nConversation summary: {summary}")

Monitoring and Debugging

Tracking Routing Decisions

Monitor which models are selected:
import requests
import json
from collections import defaultdict
from datetime import datetime

class RoutingMonitor:
    """Track and analyze routing decisions."""

    def __init__(self, api_key: str):
        self.api_key = api_key
        self.routing_history = []

    def chat_with_tracking(
        self,
        message: str,
        candidates: list[str] | None = None
    ) -> dict:
        """Chat and track routing decision."""

        start_time = datetime.now()

        url = "https://api.edenai.run/v3/llm/chat/completions"
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json"
        }

        payload = {
            "model": "@edenai",
            "messages": [{"role": "user", "content": message}],
            "stream": True
        }

        if candidates:
            payload["router_candidates"] = candidates

        response = requests.post(url, headers=headers, json=payload, stream=True)

        full_response = ""
        selected_model = None
        first_chunk_time = None

        for line in response.iter_lines():
            if line:
                if not first_chunk_time:
                    first_chunk_time = datetime.now()

                line_str = line.decode('utf-8')
                if line_str.startswith('data: ') and line_str != 'data: [DONE]':
                    data = json.loads(line_str[6:])

                    if not selected_model and 'model' in data:
                        selected_model = data['model']

                    # Guard against null content on role-only chunks
                    content = data.get('choices', [{}])[0].get('delta', {}).get('content') or ''
                    full_response += content

        end_time = datetime.now()

        # Record routing decision
        routing_info = {
            "timestamp": start_time.isoformat(),
            "message": message[:50],  # Truncate for logging
            "selected_model": selected_model,
            "candidates": candidates or "default",
            "routing_latency_ms": (first_chunk_time - start_time).total_seconds() * 1000 if first_chunk_time else None,
            "total_latency_ms": (end_time - start_time).total_seconds() * 1000,
            "response_length": len(full_response)
        }

        self.routing_history.append(routing_info)

        return {
            "response": full_response,
            "routing_info": routing_info
        }

    def get_statistics(self) -> dict:
        """Get routing statistics."""
        if not self.routing_history:
            return {"error": "No routing history"}

        model_counts = defaultdict(int)
        total_routing_latency = 0
        total_latency = 0

        for entry in self.routing_history:
            model_counts[entry["selected_model"]] += 1
            if entry["routing_latency_ms"]:
                total_routing_latency += entry["routing_latency_ms"]
            total_latency += entry["total_latency_ms"]

        return {
            "total_requests": len(self.routing_history),
            "model_distribution": dict(model_counts),
            "avg_routing_latency_ms": total_routing_latency / len(self.routing_history),
            "avg_total_latency_ms": total_latency / len(self.routing_history),
            "most_selected_model": max(model_counts, key=model_counts.get)
        }

# Usage
monitor = RoutingMonitor(API_KEY)

# Make several requests
for query in [
    "What is machine learning?",
    "Write a Python function to sort",
    "Explain quantum physics",
    "Tell me a joke"
]:
    result = monitor.chat_with_tracking(
        query,
        candidates=["openai/gpt-4o", "anthropic/claude-sonnet-4-5", "google/gemini-2.0-flash"]
    )
    print(f"Q: {query}")
    print(f"Model: {result['routing_info']['selected_model']}")
    print(f"Routing latency: {result['routing_info']['routing_latency_ms']:.0f}ms\n")

# Get statistics
stats = monitor.get_statistics()
print("\n=== Routing Statistics ===")
print(f"Total requests: {stats['total_requests']}")
print(f"Model distribution: {stats['model_distribution']}")
print(f"Average routing latency: {stats['avg_routing_latency_ms']:.0f}ms")
print(f"Most selected: {stats['most_selected_model']}")

Error Handling

Robust Error Handling with Fallbacks

import json
import requests

def chat_with_fallback(
    message: str,
    primary_candidates: list[str],
    fallback_model: str = "openai/gpt-4o"
) -> dict:
    """Chat with smart routing and fallback to fixed model on error."""

    url = "https://api.edenai.run/v3/llm/chat/completions"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }

    # Try smart routing first
    try:
        payload = {
            "model": "@edenai",
            "router_candidates": primary_candidates,
            "messages": [{"role": "user", "content": message}],
            "stream": True
        }

        response = requests.post(url, headers=headers, json=payload, stream=True, timeout=30)
        response.raise_for_status()

        full_response = ""
        for line in response.iter_lines():
            if line:
                line_str = line.decode('utf-8')
                if line_str.startswith('data: ') and line_str != 'data: [DONE]':
                    data = json.loads(line_str[6:])
                    content = data.get('choices', [{}])[0].get('delta', {}).get('content') or ''
                    full_response += content

        return {
            "response": full_response,
            "method": "smart_routing",
            "success": True
        }

    except Exception as e:
        print(f"Smart routing failed: {e}")
        print(f"Falling back to {fallback_model}")

        # Fallback to fixed model
        try:
            payload = {
                "model": fallback_model,
                "messages": [{"role": "user", "content": message}],
                "stream": True
            }

            response = requests.post(url, headers=headers, json=payload, stream=True, timeout=30)
            response.raise_for_status()

            full_response = ""
            for line in response.iter_lines():
                if line:
                    line_str = line.decode('utf-8')
                    if line_str.startswith('data: ') and line_str != 'data: [DONE]':
                        data = json.loads(line_str[6:])
                        content = data.get('choices', [{}])[0].get('delta', {}).get('content') or ''
                        full_response += content

            return {
                "response": full_response,
                "method": "fallback",
                "fallback_model": fallback_model,
                "success": True,
                "original_error": str(e)
            }

        except Exception as fallback_error:
            return {
                "response": None,
                "method": "failed",
                "success": False,
                "error": str(fallback_error)
            }

# Usage
result = chat_with_fallback(
    "Explain neural networks",
    primary_candidates=["openai/gpt-4o", "anthropic/claude-sonnet-4-5"]
)

if result["success"]:
    print(f"Response (via {result['method']}): {result['response']}")
else:
    print(f"Failed: {result['error']}")

Best Practices

1. Choose Appropriate Candidates

Do:
  • Limit to 3-5 models per use case
  • Choose models with similar capabilities
  • Include at least one fast/cheap model for cost efficiency (see the helper sketched after this list)
  • Test candidate pools with your specific workload
Don’t:
  • Include 20+ candidates (slows routing decision)
  • Mix specialized models (e.g., code + creative)
  • Use models you haven’t tested
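
To make the first two guidelines checkable in code, a small validation helper can warn when a pool drifts outside the recommended shape. This is a minimal illustrative sketch: the helper name and the set of models treated as fast/cheap are assumptions, not part of the Eden AI API:
FAST_CHEAP_MODELS = {
    "openai/gpt-4o-mini",
    "google/gemini-2.0-flash",
    "anthropic/claude-haiku-4-5",
}

def validate_candidate_pool(candidates: list[str]) -> list[str]:
    """Warn when a pool violates the guidelines above (illustrative helper)."""
    if not 3 <= len(candidates) <= 5:
        print(f"Warning: pool has {len(candidates)} models; 3-5 is recommended")
    if not FAST_CHEAP_MODELS & set(candidates):
        print("Warning: pool has no fast/cheap model for cost efficiency")
    return candidates

# Example: this pool passes both checks
pool = validate_candidate_pool([
    "openai/gpt-4o",
    "anthropic/claude-sonnet-4-5",
    "google/gemini-2.0-flash",
])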

2. Monitor Performance

Do:
  • Track routing latency in production
  • Monitor model distribution
  • Alert on routing failures
  • A/B test smart routing vs. fixed models (see the sketch after this list)
Don’t:
  • Deploy without monitoring
  • Ignore routing patterns
  • Assume routing is always optimal
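
A minimal sketch of such an A/B test, assuming the endpoint also accepts non-streaming requests (stream: False), as OpenAI-compatible APIs typically do; the 50/50 assignment, control model, and latency-only comparison are illustrative:
import random
import time

import requests

def ab_test_request(message: str, fixed_model: str = "openai/gpt-4o") -> dict:
    """Randomly assign a request to smart routing or a fixed model and
    record latency, so the two arms can be compared over many requests."""
    arm = random.choice(["smart_routing", "fixed"])
    payload = {
        "model": "@edenai" if arm == "smart_routing" else fixed_model,
        "messages": [{"role": "user", "content": message}],
        "stream": False,  # non-streaming keeps the latency measurement simple
    }
    start = time.monotonic()
    response = requests.post(
        "https://api.edenai.run/v3/llm/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=payload,
        timeout=30,
    )
    latency_ms = (time.monotonic() - start) * 1000
    return {"arm": arm, "status": response.status_code, "latency_ms": latency_ms}

# Collect results over real traffic, then compare latency (and quality) per arm
results = [ab_test_request("What is machine learning?") for _ in range(10)]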

3. Cost Optimization

Do:
  • Define cost tiers (budget/balanced/premium)
  • Route simple queries to cheaper models (see the sketch after this list)
  • Track actual spend per use case
  • Review routing decisions regularly
Don’t:
  • Use premium-only candidates for simple tasks
  • Ignore cost metrics
  • Assume routing always chooses cheapest
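
One way to route simple queries to cheaper models is a lightweight classifier in front of the tiered router from Pattern 5. The length/keyword heuristic below is purely illustrative; in practice you would tune it on your own traffic:
def pick_cost_tier(message: str) -> str:
    """Crude complexity heuristic (illustrative): short questions go to the
    budget tier, analysis-style prompts to premium, everything else balanced."""
    if len(message) < 80 and message.rstrip().endswith("?"):
        return "budget"
    if any(word in message.lower() for word in ("analyze", "comprehensive", "essay")):
        return "premium"
    return "balanced"

def chat_cost_aware(message: str) -> str:
    """Route each request through the tier the heuristic picks,
    reusing CostOptimizedRouter from Pattern 5."""
    router = CostOptimizedRouter(API_KEY, cost_tier=pick_cost_tier(message))
    response, _ = router.chat(message)
    return response

print(chat_cost_aware("What's the capital of France?"))   # budget tier
print(chat_cost_aware("Write a comprehensive analysis"))  # premium tier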

4. Error Handling

Do:
  • Implement fallback to fixed models
  • Set appropriate timeouts
  • Log routing failures
  • Handle network errors gracefully
Don’t:
  • Rely solely on smart routing without fallback
  • Use infinite timeouts
  • Ignore routing errors

Performance Considerations

Latency

  • Routing overhead: 100-500ms
  • First token: Includes routing time
  • Subsequent tokens: No overhead
When to avoid:
  • Real-time chat with <500ms requirements
  • High-frequency API calls (>100/sec)
  • Strict SLA requirements

Caching

  • Routing decisions: Not cached (context-dependent)
  • Model list: Cached (1 hour TTL)
  • API responses: Not cached by router

Common Patterns Summary

Use Case          Recommended Candidates                           Notes
General chat      gpt-4o, claude-sonnet-4-5, gemini-2.0-flash      Balanced quality/cost
Code generation   gpt-4o, claude-sonnet-4-5                        Strong coding models
Creative writing  claude-opus-4-5, gpt-4o, gemini-2.5-pro          Premium models
Simple Q&A        gpt-4o-mini, gemini-2.0-flash, claude-haiku-4-5  Fast and cheap
Function calling  gpt-4o, claude-sonnet-4-5, gemini-2.0-flash      Tool-compatible

Troubleshooting

Issue: Routing always selects the same model

Possible causes:
  • Candidates list too restrictive
  • Request pattern favors one model
  • Other models unavailable
Solutions:
  • Expand candidate pool
  • Check model availability
  • Review request characteristics

Issue: High routing latency (>1s)

Possible causes:
  • Network issues
  • Large candidate pool
  • Router API congestion
Solutions:
  • Reduce candidates to 3-5 models
  • Check network connectivity
  • Consider fixed models for latency-critical apps

Issue: Unexpected costs

Possible causes:
  • Router selecting premium models
  • High volume of requests
  • Long responses
Solutions:
  • Use budget-tier candidates
  • Limit max_tokens (see the sketch below)
  • Monitor model distribution
  • Implement cost alerts
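
A minimal sketch combining these mitigations, assuming the endpoint honors the standard OpenAI-style max_tokens parameter and accepts non-streaming requests; the budget threshold and per-request cost placeholder are illustrative:
import requests

MONTHLY_BUDGET_USD = 50.0  # illustrative alert threshold
spend_usd = 0.0            # in production, track this from billing/usage data

def chat_with_cost_controls(message: str) -> str:
    """Use budget-tier candidates, cap output length, and alert on overspend."""
    global spend_usd
    payload = {
        "model": "@edenai",
        "router_candidates": [  # budget tier (see Pattern 5)
            "openai/gpt-4o-mini",
            "google/gemini-2.0-flash",
            "anthropic/claude-haiku-4-5",
        ],
        "messages": [{"role": "user", "content": message}],
        "max_tokens": 256,  # cap response length to bound per-request cost
        "stream": False,
    }
    response = requests.post(
        "https://api.edenai.run/v3/llm/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=payload,
        timeout=30,
    )
    data = response.json()
    spend_usd += 0.001  # placeholder; read real cost from usage metadata if available
    if spend_usd > MONTHLY_BUDGET_USD:
        print("ALERT: monthly LLM budget exceeded")
    return data["choices"][0]["message"]["content"]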