
LLM Smart Routing Patterns

Learn practical patterns for implementing smart routing with LLMs in production applications using Eden AI’s dynamic model selection.

Overview

This guide provides LLM-specific patterns and examples for smart routing. For comprehensive router documentation, see the Smart Routing section.

What you'll learn:
  • LLM-specific routing patterns
  • Customizing model candidates for LLM use cases
  • Combining smart routing with function calling and streaming
  • Practical code examples for common scenarios
  • Cost optimization strategies for LLM workloads

Basic Implementation Patterns

Pattern 1: Default Smart Routing

Let the system choose from all available models:
import requests

API_KEY = "YOUR_API_KEY"  # Your Eden AI API key, used throughout this guide

def chat_with_smart_routing(message: str) -> str:
    """Simple chat with automatic model selection."""
    url = "https://api.edenai.run/v3/llm/chat/completions"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }

    payload = {
        "model": "@edenai",  # Automatic routing
        "messages": [{"role": "user", "content": message}]
    }

    response = requests.post(url, headers=headers, json=payload)
    data = response.json()

    return data['choices'][0]['message']['content']

# Usage
response = chat_with_smart_routing("Explain machine learning")
print(response)

Pattern 2: Custom Candidate Pool

Define specific models for your use case:
import requests

def chat_with_custom_candidates(
    message: str,
    use_case: str = "general"
) -> str:
    """Chat with use-case-specific model candidates."""

    # Define candidate pools for different use cases
    CANDIDATE_POOLS = {
        "code": [
            "openai/gpt-4o",
            "anthropic/claude-sonnet-4-5",
        ],
        "creative": [
            "anthropic/claude-opus-4-5",
            "openai/gpt-4o",
            "google/gemini-2.5-pro",
        ],
        "fast": [
            "openai/gpt-4o-mini",
            "google/gemini-2.0-flash",
            "openai/gpt-4",
        ],
        "general": [
            "openai/gpt-4o",
            "anthropic/claude-sonnet-4-5",
            "google/gemini-2.0-flash",
        ]
    }

    url = "https://api.edenai.run/v3/llm/chat/completions"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }

    payload = {
        "model": "@edenai",
        "router_candidates": CANDIDATE_POOLS.get(use_case, CANDIDATE_POOLS["general"]),
        "messages": [{"role": "user", "content": message}]
    }

    response = requests.post(url, headers=headers, json=payload)
    data = response.json()

    return data['choices'][0]['message']['content']

# Usage examples
code_response = chat_with_custom_candidates(
    "Write a Python function to merge two sorted lists",
    use_case="code"
)

creative_response = chat_with_custom_candidates(
    "Write a short story about a robot",
    use_case="creative"
)

fast_response = chat_with_custom_candidates(
    "What's the capital of France?",
    use_case="fast"
)

Pattern 3: OpenAI SDK Integration

Use smart routing with the official OpenAI SDK:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.edenai.run/v3/llm"
)

def chat_with_openai_sdk(message: str, candidates: list[str] | None = None):
    """Use smart routing with OpenAI SDK."""

    extra_params = {}
    if candidates:
        extra_params["router_candidates"] = candidates

    response = client.chat.completions.create(
        model="@edenai",
        messages=[
            {"role": "user", "content": message}
        ],
        extra_body=extra_params  # Pass router_candidates here
    )

    selected_model = response.model
    print(f"Router selected: {selected_model}")

    full_response = response.choices[0].message.content
    print(full_response)

    return full_response, selected_model

# Usage
response, model = chat_with_openai_sdk(
    "Explain neural networks",
    candidates=["openai/gpt-4o", "anthropic/claude-sonnet-4-5"]
)
print(f"\nModel used: {model}")

Advanced Patterns

Pattern 4: Smart Routing with Function Calling

Combine smart routing with function/tool calling:
import requests
import json

def chat_with_tools(message: str, tools: list):
    """Smart routing with function calling."""

    url = "https://api.edenai.run/v3/llm/chat/completions"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }

    payload = {
        "model": "@edenai",
        # Choose models good at function calling
        "router_candidates": [
            "openai/gpt-4o",
            "anthropic/claude-sonnet-4-5",
            "google/gemini-2.0-flash"
        ],
        "messages": [{"role": "user", "content": message}],
        "tools": tools  # Router considers tool compatibility
    }

    response = requests.post(url, headers=headers, json=payload)
    data = response.json()

    message_data = data.get('choices', [{}])[0].get('message', {})
    tool_calls = message_data.get('tool_calls', [])
    full_response = message_data.get('content', '')

    return {
        "response": full_response,
        "tool_calls": tool_calls
    }

# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

# Usage
result = chat_with_tools(
    "What's the weather like in Paris?",
    tools=tools
)
print(f"Response: {result['response']}")
print(f"Tool calls: {result['tool_calls']}")

Pattern 5: Cost-Optimized Routing with Budget Constraints

Optimize costs by limiting to budget-friendly models:
import requests

class CostOptimizedRouter:
    """Smart routing with cost optimization."""

    # Model tiers by cost
    BUDGET_MODELS = [
        "openai/gpt-4o-mini",
        "google/gemini-2.0-flash",
        "openai/gpt-4",
    ]

    BALANCED_MODELS = [
        "openai/gpt-4o",
        "anthropic/claude-sonnet-4-5",
        "google/gemini-2.0-flash",
    ]

    PREMIUM_MODELS = [
        "anthropic/claude-opus-4-5",
        "openai/gpt-4o",
        "google/gemini-2.5-pro",
    ]

    def __init__(self, api_key: str, cost_tier: str = "balanced"):
        self.api_key = api_key
        self.cost_tier = cost_tier

    def get_candidates(self) -> list[str]:
        """Get candidates based on cost tier."""
        if self.cost_tier == "budget":
            return self.BUDGET_MODELS
        elif self.cost_tier == "premium":
            return self.PREMIUM_MODELS
        else:
            return self.BALANCED_MODELS

    def chat(self, message: str) -> tuple[str, float]:
        """Chat and return response with estimated cost."""

        url = "https://api.edenai.run/v3/llm/chat/completions"
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json"
        }

        payload = {
            "model": "@edenai",
            "router_candidates": self.get_candidates(),
            "messages": [{"role": "user", "content": message}]
        }

        response = requests.post(url, headers=headers, json=payload)
        data = response.json()

        selected_model = data.get('model')
        print(f"Router selected: {selected_model} ({self.cost_tier} tier)")

        full_response = data.get('choices', [{}])[0].get('message', {}).get('content', '')

        # You could track actual cost from response metadata
        estimated_cost = 0.001  # Placeholder

        return full_response, estimated_cost

# Usage examples
budget_router = CostOptimizedRouter(API_KEY, cost_tier="budget")
response, cost = budget_router.chat("Summarize this article")
print(f"Cost: ${cost:.4f}")

premium_router = CostOptimizedRouter(API_KEY, cost_tier="premium")
response, cost = premium_router.chat("Write a comprehensive analysis")
print(f"Cost: ${cost:.4f}")

Pattern 6: Multi-Turn Conversations with Context

Maintain conversation context with smart routing:
import requests

class SmartRoutingChatSession:
    """Maintain conversation with smart routing."""

    def __init__(self, api_key: str, candidates: list[str] | None = None):
        self.api_key = api_key
        self.candidates = candidates
        self.messages = []
        self.selected_models = []  # Track model selection per turn

    def send_message(self, content: str) -> str:
        """Send a message and get response."""

        # Add user message to history
        self.messages.append({"role": "user", "content": content})

        url = "https://api.edenai.run/v3/llm/chat/completions"
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json"
        }

        payload = {
            "model": "@edenai",
            "messages": self.messages  # Include full conversation
        }

        if self.candidates:
            payload["router_candidates"] = self.candidates

        response = requests.post(url, headers=headers, json=payload)
        data = response.json()

        selected_model = data.get('model')
        assistant_response = data.get('choices', [{}])[0].get('message', {}).get('content', '')

        # Add assistant response to history
        self.messages.append({"role": "assistant", "content": assistant_response})
        self.selected_models.append(selected_model)

        return assistant_response

    def get_conversation_summary(self) -> dict:
        """Get conversation statistics."""
        return {
            "turns": len(self.messages) // 2,
            "models_used": self.selected_models,
            "total_messages": len(self.messages)
        }

# Usage
session = SmartRoutingChatSession(
    API_KEY,
    candidates=["openai/gpt-4o", "anthropic/claude-sonnet-4-5"]
)

# Multi-turn conversation
response1 = session.send_message("What is Python?")
print(f"Assistant: {response1}\n")

response2 = session.send_message("Can you give me a code example?")
print(f"Assistant: {response2}\n")

response3 = session.send_message("Explain that example in detail")
print(f"Assistant: {response3}\n")

# Summary
summary = session.get_conversation_summary()
print(f"\nConversation summary: {summary}")

Monitoring and Debugging

Tracking Routing Decisions

Monitor which models are selected:
import requests
from collections import defaultdict
from datetime import datetime

class RoutingMonitor:
    """Track and analyze routing decisions."""

    def __init__(self, api_key: str):
        self.api_key = api_key
        self.routing_history = []

    def chat_with_tracking(
        self,
        message: str,
        candidates: list[str] | None = None
    ) -> dict:
        """Chat and track routing decision."""

        start_time = datetime.now()

        url = "https://api.edenai.run/v3/llm/chat/completions"
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json"
        }

        payload = {
            "model": "@edenai",
            "messages": [{"role": "user", "content": message}]
        }

        if candidates:
            payload["router_candidates"] = candidates

        response = requests.post(url, headers=headers, json=payload)
        # Non-streaming request: this timestamp reflects routing plus full
        # generation, not time-to-first-token; use streaming to isolate routing
        response_received_time = datetime.now()
        data = response.json()

        selected_model = data.get('model')
        full_response = data.get('choices', [{}])[0].get('message', {}).get('content', '')

        end_time = datetime.now()

        # Record routing decision
        routing_info = {
            "timestamp": start_time.isoformat(),
            "message": message[:50],  # Truncate for logging
            "selected_model": selected_model,
            "candidates": candidates or "default",
            "routing_latency_ms": (first_chunk_time - start_time).total_seconds() * 1000 if first_chunk_time else None,
            "total_latency_ms": (end_time - start_time).total_seconds() * 1000,
            "response_length": len(full_response)
        }

        self.routing_history.append(routing_info)

        return {
            "response": full_response,
            "routing_info": routing_info
        }

    def get_statistics(self) -> dict:
        """Get routing statistics."""
        if not self.routing_history:
            return {"error": "No routing history"}

        model_counts = defaultdict(int)
        total_routing_latency = 0
        total_latency = 0

        for entry in self.routing_history:
            model_counts[entry["selected_model"]] += 1
            if entry["routing_latency_ms"]:
                total_routing_latency += entry["routing_latency_ms"]
            total_latency += entry["total_latency_ms"]

        return {
            "total_requests": len(self.routing_history),
            "model_distribution": dict(model_counts),
            "avg_routing_latency_ms": total_routing_latency / len(self.routing_history),
            "avg_total_latency_ms": total_latency / len(self.routing_history),
            "most_selected_model": max(model_counts, key=model_counts.get)
        }

# Usage
monitor = RoutingMonitor(API_KEY)

# Make several requests
for query in [
    "What is machine learning?",
    "Write a Python function to sort",
    "Explain quantum physics",
    "Tell me a joke"
]:
    result = monitor.chat_with_tracking(
        query,
        candidates=["openai/gpt-4o", "anthropic/claude-sonnet-4-5", "google/gemini-2.0-flash"]
    )
    print(f"Q: {query}")
    print(f"Model: {result['routing_info']['selected_model']}")
    print(f"Routing latency: {result['routing_info']['routing_latency_ms']:.0f}ms\n")

# Get statistics
stats = monitor.get_statistics()
print("\n=== Routing Statistics ===")
print(f"Total requests: {stats['total_requests']}")
print(f"Model distribution: {stats['model_distribution']}")
print(f"Average routing latency: {stats['avg_routing_latency_ms']:.0f}ms")
print(f"Most selected: {stats['most_selected_model']}")

Error Handling

Robust Error Handling with Fallbacks

import requests

def chat_with_fallback(
    message: str,
    primary_candidates: list[str],
    fallback_model: str = "openai/gpt-4o"
) -> dict:
    """Chat with smart routing and fallback to fixed model on error."""

    url = "https://api.edenai.run/v3/llm/chat/completions"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }

    # Try smart routing first
    try:
        payload = {
            "model": "@edenai",
            "router_candidates": primary_candidates,
            "messages": [{"role": "user", "content": message}]
        }

        response = requests.post(url, headers=headers, json=payload, timeout=30)
        response.raise_for_status()

        data = response.json()
        full_response = data.get('choices', [{}])[0].get('message', {}).get('content', '')

        return {
            "response": full_response,
            "method": "smart_routing",
            "success": True
        }

    except Exception as e:
        print(f"Smart routing failed: {e}")
        print(f"Falling back to {fallback_model}")

        # Fallback to fixed model
        try:
            payload = {
                "model": fallback_model,
                "messages": [{"role": "user", "content": message}]
            }

            response = requests.post(url, headers=headers, json=payload, timeout=30)
            response.raise_for_status()

            data = response.json()
            full_response = data.get('choices', [{}])[0].get('message', {}).get('content', '')

            return {
                "response": full_response,
                "method": "fallback",
                "fallback_model": fallback_model,
                "success": True,
                "original_error": str(e)
            }

        except Exception as fallback_error:
            return {
                "response": None,
                "method": "failed",
                "success": False,
                "error": str(fallback_error)
            }

# Usage
result = chat_with_fallback(
    "Explain neural networks",
    primary_candidates=["openai/gpt-4o", "anthropic/claude-sonnet-4-5"]
)

if result["success"]:
    print(f"Response (via {result['method']}): {result['response']}")
else:
    print(f"Failed: {result['error']}")

Best Practices

1. Choose Appropriate Candidates

Do:
  • Limit to 3-5 models per use case
  • Choose models with similar capabilities
  • Include at least one fast/cheap model for cost efficiency
  • Test candidate pools with your specific workload
Don’t:
  • Include 20+ candidates (slows routing decision)
  • Mix specialized models (e.g., code + creative)
  • Use models you haven’t tested

2. Monitor Performance

Do:
  • Track routing latency in production
  • Monitor model distribution
  • Alert on routing failures
  • A/B test smart routing vs. fixed models
Don’t:
  • Deploy without monitoring
  • Ignore routing patterns
  • Assume routing is always optimal

3. Cost Optimization

Do:
  • Define cost tiers (budget/balanced/premium)
  • Route simple queries to cheaper models (see the sketch after this list)
  • Track actual spend per use case
  • Review routing decisions regularly
Don’t:
  • Use premium-only candidates for simple tasks
  • Ignore cost metrics
  • Assume routing always chooses cheapest
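A minimal sketch of routing simple queries to cheaper models, using a crude length-and-keyword heuristic on top of Pattern 5's CostOptimizedRouter; the thresholds and marker words are illustrative assumptions:
def pick_cost_tier(message: str) -> str:
    """Crude complexity heuristic (thresholds are illustrative)."""
    markers = ("analyze", "compare", "comprehensive", "step by step")
    if any(m in message.lower() for m in markers):
        return "premium"
    if len(message) < 80:
        return "budget"
    return "balanced"

# Usage with Pattern 5's CostOptimizedRouter
query = "What's the capital of France?"
router = CostOptimizedRouter(API_KEY, cost_tier=pick_cost_tier(query))
response, cost = router.chat(query)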

4. Error Handling

Do:
  • Implement fallback to fixed models
  • Set appropriate timeouts
  • Log routing failures
  • Handle network errors gracefully
Don’t:
  • Rely solely on smart routing without fallback
  • Use infinite timeouts
  • Ignore routing errors

Performance Considerations

Latency

  • Routing overhead: 100-500ms
  • First token: Includes routing time
  • Subsequent tokens: No overhead
When to avoid:
  • Real-time chat with <500ms requirements
  • High-frequency API calls (>100/sec)
  • Strict SLA requirements
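To verify the overhead on your own workload, the sketch below measures time-to-first-token with a streaming request, reusing the OpenAI SDK client from Pattern 3. The figure includes network latency, so treat it as an upper bound on routing overhead:
import time

def measure_ttft(message: str) -> float:
    """Measure time-to-first-token (ms) for a streamed smart-routing request."""
    start = time.monotonic()
    stream = client.chat.completions.create(
        model="@edenai",
        messages=[{"role": "user", "content": message}],
        stream=True
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            break  # first content token received
    return (time.monotonic() - start) * 1000

print(f"Time to first token: {measure_ttft('Ping'):.0f}ms")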

Caching

  • Routing decisions: Not cached (context-dependent)
  • Model list: Cached (1 hour TTL)
  • API responses: Not cached by router
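If you fetch the model list client-side, a small TTL cache mirrors the router's one-hour caching. The /v3/llm/models path below is an assumption based on the API's OpenAI-compatible shape; check the API reference before relying on it:
import time
import requests

_model_cache = {"models": None, "fetched_at": 0.0}

def get_models(ttl_seconds: int = 3600) -> list:
    """Fetch and cache the model list client-side (sketch; assumed endpoint)."""
    stale = time.time() - _model_cache["fetched_at"] > ttl_seconds
    if _model_cache["models"] is None or stale:
        response = requests.get(
            "https://api.edenai.run/v3/llm/models",  # assumed endpoint path
            headers={"Authorization": f"Bearer {API_KEY}"}
        )
        response.raise_for_status()
        _model_cache["models"] = response.json().get("data", [])
        _model_cache["fetched_at"] = time.time()
    return _model_cache["models"]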

Common Patterns Summary

Use Case         | Recommended Candidates                          | Notes
General chat     | gpt-4o, claude-sonnet-4-5, gemini-2.0-flash     | Balanced quality/cost
Code generation  | gpt-4o, claude-sonnet-4-5                       | Strong coding models
Creative writing | claude-opus-4-5, gpt-4o, gemini-2.5-pro         | Premium models
Simple Q&A       | gpt-4o-mini, gemini-2.0-flash, claude-haiku-4-5 | Fast and cheap
Function calling | gpt-4o, claude-sonnet-4-5, gemini-2.0-flash     | Tool-compatible

Troubleshooting

Issue: Routing always selects the same model

Possible causes:
  • Candidates list too restrictive
  • Request pattern favors one model
  • Other models unavailable
Solutions:
  • Expand candidate pool
  • Check model availability
  • Review request characteristics

Issue: High routing latency (>1s)

Possible causes:
  • Network issues
  • Large candidate pool
  • Router API congestion
Solutions:
  • Reduce candidates to 3-5 models
  • Check network connectivity
  • Consider fixed models for latency-critical apps

Issue: Unexpected costs

Possible causes:
  • Router selecting premium models
  • High volume of requests
  • Long responses
Solutions:
  • Use budget-tier candidates
  • Limit max_tokens
  • Monitor model distribution
  • Implement cost alerts
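A sketch combining two of these solutions: cap response length with max_tokens and alert when cumulative estimated spend crosses a threshold. The threshold and the per-request cost figure are placeholders:
_spend = {"total": 0.0}
SPEND_ALERT_USD = 10.0  # placeholder threshold

def capped_chat(message: str, max_tokens: int = 500) -> str:
    """Budget-tier chat with a response-length cap and a spend alert (sketch)."""
    payload = {
        "model": "@edenai",
        "router_candidates": CostOptimizedRouter.BUDGET_MODELS,  # from Pattern 5
        "messages": [{"role": "user", "content": message}],
        "max_tokens": max_tokens  # hard cap on response length
    }
    response = requests.post(
        "https://api.edenai.run/v3/llm/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}",
                 "Content-Type": "application/json"},
        json=payload
    )
    data = response.json()

    # Placeholder flat estimate; use usage-based pricing in production
    _spend["total"] += 0.001
    if _spend["total"] > SPEND_ALERT_USD:
        print(f"ALERT: estimated spend ${_spend['total']:.2f} exceeds threshold")

    return data['choices'][0]['message']['content']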