
LLM Smart Routing Patterns

Learn practical patterns for implementing smart routing with LLMs in production applications using Eden AI’s dynamic model selection.

Overview

This guide provides LLM-specific patterns and examples for smart routing. For comprehensive router documentation, see the Smart Routing section.

What you’ll learn:
  • LLM-specific routing patterns
  • Customizing model candidates for LLM use cases
  • Combining smart routing with function calling and streaming
  • Practical code examples for common scenarios
  • Cost optimization strategies for LLM workloads

Basic Implementation Patterns

Pattern 1: Default Smart Routing

Let the system choose from all available models:
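A minimal sketch in Python, using requests against Eden AI's LLM endpoint. The endpoint path and the "smart" model identifier are assumptions for illustration, not confirmed API details; check the Smart Routing reference for the exact names.

```python
import os
import requests

# Hypothetical endpoint path and routing identifier -- verify both
# against the Smart Routing reference before relying on them.
API_URL = "https://api.edenai.run/v2/llm/chat"
headers = {"Authorization": f"Bearer {os.environ['EDENAI_API_KEY']}"}

payload = {
    "model": "smart",  # assumed: delegate model choice to the router
    "messages": [{"role": "user", "content": "Summarize this support ticket."}],
}

resp = requests.post(API_URL, json=payload, headers=headers, timeout=30)
resp.raise_for_status()
print(resp.json())
```

With no candidate list, the router is free to pick from every available model.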

Pattern 2: Custom Candidate Pool

Define specific models for your use case:
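A sketch of the same call with a hand-picked pool. The candidates parameter name is hypothetical; only its intent (restricting the router's choices) comes from this guide.

```python
import os
import requests

API_URL = "https://api.edenai.run/v2/llm/chat"  # hypothetical path (see Pattern 1)
headers = {"Authorization": f"Bearer {os.environ['EDENAI_API_KEY']}"}

payload = {
    "model": "smart",  # assumed routing identifier
    # Hypothetical parameter: restrict routing to a small, tested pool.
    "candidates": ["gpt-4o", "claude-sonnet-4-5", "gemini-2.0-flash"],
    "messages": [{"role": "user", "content": "Review this function for bugs."}],
}

resp = requests.post(API_URL, json=payload, headers=headers, timeout=30)
resp.raise_for_status()
print(resp.json())
```

Keeping the pool to 3-5 similar models keeps the routing decision fast (see Best Practices below).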

Pattern 3: OpenAI SDK Integration

Use smart routing with the official OpenAI SDK:
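A sketch with the official openai Python package pointed at Eden AI. The base_url and the "smart" model name are assumptions; the SDK usage itself is standard.

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["EDENAI_API_KEY"],
    base_url="https://api.edenai.run/v2/llm",  # hypothetical compatibility endpoint
)

response = client.chat.completions.create(
    model="smart",  # assumed: let the router pick the model
    messages=[{"role": "user", "content": "Explain smart routing in one sentence."}],
)

print(response.choices[0].message.content)
print("Routed to:", response.model)  # selected model, if the endpoint reports it
```

Streaming works the same way: pass stream=True and iterate over the returned chunks. Remember that time to first token includes the routing decision (see Performance Considerations).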

Advanced Patterns

Pattern 4: Smart Routing with Function Calling

Combine smart routing with function/tool calling:
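A sketch combining routing with a tool definition in the standard chat-completions schema. The get_order_status tool is a made-up example, and the endpoint and model name remain assumptions as in Pattern 3.

```python
import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["EDENAI_API_KEY"],
    base_url="https://api.edenai.run/v2/llm",  # hypothetical (see Pattern 3)
)

# Made-up example tool in the standard chat-completions schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the status of an order by its ID.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="smart",  # assumed routing identifier
    messages=[{"role": "user", "content": "Where is order A-1042?"}],
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls:  # the routed model asked to call the tool
    call = message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
```

Restrict candidates to tool-compatible models (see the summary table) so every model the router can pick is able to execute the call.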

Pattern 5: Cost-Optimized Routing with Budget Constraints

Optimize costs by limiting to budget-friendly models:
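A sketch restricting routing to the budget tier and capping output length; the candidates parameter is hypothetical, as in Pattern 2.

```python
import os
import requests

API_URL = "https://api.edenai.run/v2/llm/chat"  # hypothetical path (see Pattern 1)
headers = {"Authorization": f"Bearer {os.environ['EDENAI_API_KEY']}"}

payload = {
    "model": "smart",  # assumed routing identifier
    # Budget tier only: fast, cheap models from the summary table.
    "candidates": ["gpt-4o-mini", "gemini-2.0-flash", "claude-haiku-4-5"],
    "messages": [{"role": "user", "content": "Is the store open on Sundays?"}],
    "max_tokens": 256,  # cap response length to bound per-request cost
}

resp = requests.post(API_URL, json=payload, headers=headers, timeout=30)
resp.raise_for_status()
print(resp.json())
```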

Pattern 6: Multi-Turn Conversations with Context

Maintain conversation context with smart routing:
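A sketch of a routed multi-turn loop: the full history goes to the router on every turn, so context survives even if consecutive turns land on different models. Endpoint and model name are assumptions as in Pattern 3.

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["EDENAI_API_KEY"],
    base_url="https://api.edenai.run/v2/llm",  # hypothetical (see Pattern 3)
)

history = [{"role": "system", "content": "You are a concise support agent."}]

def ask(user_message: str) -> str:
    """Append the user turn, route the whole history, keep the reply."""
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="smart",     # assumed routing identifier
        messages=history,  # router and model both see the full context
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

print(ask("My export job fails with error 42."))
print(ask("What error code did I just mention?"))
```

Because each turn is routed independently, consecutive turns may be served by different models; sending the full message history is what keeps the answers consistent.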

Monitoring and Debugging

Tracking Routing Decisions

Monitor which models are selected:
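A minimal sketch that tallies routed models, assuming the selected model is reported in the response's model field, as in OpenAI-style chat completions.

```python
from collections import Counter

model_counts = Counter()

def track(response) -> None:
    """Tally which model the router selected for this request."""
    # Assumption: the routed model is echoed back in `response.model`.
    model_counts[response.model] += 1

# After a batch of routed requests, inspect the distribution:
# print(model_counts.most_common())
```

If one model dominates unexpectedly, see the Troubleshooting section below.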

Error Handling

Robust Error Handling with Fallbacks
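A sketch of routing with a fixed-model fallback, using the openai SDK's standard exception type; the "smart" identifier and base_url remain assumptions as in Pattern 3.

```python
import os
from openai import OpenAI, APIError

client = OpenAI(
    api_key=os.environ["EDENAI_API_KEY"],
    base_url="https://api.edenai.run/v2/llm",  # hypothetical (see Pattern 3)
    timeout=30,  # finite timeout -- never wait forever on the router
)

FALLBACK_MODEL = "gpt-4o-mini"  # a fixed, known-good model

def chat_with_fallback(messages):
    """Try smart routing first; fall back to a fixed model on failure."""
    try:
        return client.chat.completions.create(model="smart", messages=messages)
    except APIError as exc:
        # Log the routing failure, then retry against the fixed model.
        print(f"smart routing failed ({exc}); falling back to {FALLBACK_MODEL}")
        return client.chat.completions.create(model=FALLBACK_MODEL, messages=messages)
```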

Best Practices

1. Choose Appropriate Candidates

Do:
  • Limit to 3-5 models per use case
  • Choose models with similar capabilities
  • Include at least one fast/cheap model for cost efficiency
  • Test candidate pools with your specific workload
Don’t:
  • Include 20+ candidates (slows routing decision)
  • Mix specialized models (e.g., code + creative)
  • Use models you haven’t tested

2. Monitor Performance

Do:
  • Track routing latency in production
  • Monitor model distribution
  • Alert on routing failures
  • A/B test smart routing vs. fixed models
Don’t:
  • Deploy without monitoring
  • Ignore routing patterns
  • Assume routing is always optimal

3. Cost Optimization

Do:
  • Define cost tiers (budget/balanced/premium); see the sketch after this list
  • Route simple queries to cheaper models
  • Track actual spend per use case
  • Review routing decisions regularly
Don’t:
  • Use premium-only candidates for simple tasks
  • Ignore cost metrics
  • Assume routing always chooses cheapest
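The tier idea can be a plain mapping from use case to candidate pool. The model names below follow the summary table at the end of this guide; the tier labels and use-case keys are illustrative.

```python
# Illustrative cost tiers mapped to candidate pools (model names taken
# from the Common Patterns Summary table in this guide).
COST_TIERS = {
    "budget":   ["gpt-4o-mini", "gemini-2.0-flash", "claude-haiku-4-5"],
    "balanced": ["gpt-4o", "claude-sonnet-4-5", "gemini-2.0-flash"],
    "premium":  ["claude-opus-4-5", "gpt-4o", "gemini-2.5-pro"],
}

def candidates_for(use_case: str) -> list[str]:
    """Pick a tier per use case; simple queries stay on cheap models."""
    tier = {"faq": "budget", "chat": "balanced", "creative": "premium"}
    return COST_TIERS[tier.get(use_case, "balanced")]
```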

4. Error Handling

Do:
  • Implement fallback to fixed models
  • Set appropriate timeouts
  • Log routing failures
  • Handle network errors gracefully
Don’t:
  • Rely solely on smart routing without fallback
  • Use infinite timeouts
  • Ignore routing errors

Performance Considerations

Latency

  • Routing overhead: roughly 100-500 ms per request
  • First token: time to first token includes the routing decision
  • Subsequent tokens: no added overhead once a model is selected
When to avoid:
  • Real-time chat with sub-500 ms latency requirements
  • High-frequency API calls (>100/sec)
  • Strict SLA requirements

Caching

  • Routing decisions: Not cached (context-dependent)
  • Model list: Cached (1 hour TTL)
  • API responses: Not cached by router

Common Patterns Summary

| Use Case | Recommended Candidates | Notes |
| --- | --- | --- |
| General chat | gpt-4o, claude-sonnet-4-5, gemini-2.0-flash | Balanced quality/cost |
| Code generation | gpt-4o, claude-sonnet-4-5 | Strong coding models |
| Creative writing | claude-opus-4-5, gpt-4o, gemini-2.5-pro | Premium models |
| Simple Q&A | gpt-4o-mini, gemini-2.0-flash, claude-haiku-4-5 | Fast and cheap |
| Function calling | gpt-4o, claude-sonnet-4-5, gemini-2.0-flash | Tool-compatible |


Troubleshooting

Issue: Routing always selects the same model

Possible causes:
  • Candidates list too restrictive
  • Request pattern favors one model
  • Other models unavailable
Solutions:
  • Expand candidate pool
  • Check model availability
  • Review request characteristics

Issue: High routing latency (>1s)

Possible causes:
  • Network issues
  • Large candidate pool
  • Router API congestion
Solutions:
  • Reduce candidates to 3-5 models
  • Check network connectivity
  • Consider fixed models for latency-critical apps

Issue: Unexpected costs

Possible causes:
  • Router selecting premium models
  • High volume of requests
  • Long responses
Solutions:
  • Use budget-tier candidates
  • Limit max_tokens
  • Monitor model distribution
  • Implement cost alerts