LLM Smart Routing Patterns
Learn practical patterns for implementing smart routing with LLMs in production applications using Eden AI’s dynamic model selection.Overview
This guide provides LLM-specific patterns and examples for smart routing. For comprehensive router documentation, see the Smart Routing section. What you’ll learn:- LLM-specific routing patterns
- Customizing model candidates for LLM use cases
- Combining smart routing with function calling and streaming
- Practical code examples for common scenarios
- Cost optimization strategies for LLM workloads
- Router Getting Started - Core routing concepts and basics
- Router Advanced Usage - Advanced patterns and optimization
- Router Monitoring - Production monitoring and analytics
Basic Implementation Patterns
Pattern 1: Default Smart Routing
Let the system choose from all available models:Pattern 2: Custom Candidate Pool
Define specific models for your use case:Pattern 3: OpenAI SDK Integration
Use smart routing with the official OpenAI SDK:Advanced Patterns
Pattern 4: Smart Routing with Function Calling
Combine smart routing with function/tool calling:Pattern 5: Cost-Optimized Routing with Budget Constraints
Optimize costs by limiting to budget-friendly models:Pattern 6: Multi-Turn Conversations with Context
Maintain conversation context with smart routing:Monitoring and Debugging
Tracking Routing Decisions
Monitor which models are selected:Error Handling
Robust Error Handling with Fallbacks
Best Practices
1. Choose Appropriate Candidates
✅ Do:- Limit to 3-5 models per use case
- Choose models with similar capabilities
- Include at least one fast/cheap model for cost efficiency
- Test candidate pools with your specific workload
- Include 20+ candidates (slows routing decision)
- Mix specialized models (e.g., code + creative)
- Use models you haven’t tested
2. Monitor Performance
✅ Do:- Track routing latency in production
- Monitor model distribution
- Alert on routing failures
- A/B test smart routing vs. fixed models
- Deploy without monitoring
- Ignore routing patterns
- Assume routing is always optimal
3. Cost Optimization
✅ Do:- Define cost tiers (budget/balanced/premium)
- Route simple queries to cheaper models
- Track actual spend per use case
- Review routing decisions regularly
- Use premium-only candidates for simple tasks
- Ignore cost metrics
- Assume routing always chooses cheapest
4. Error Handling
✅ Do:- Implement fallback to fixed models
- Set appropriate timeouts
- Log routing failures
- Handle network errors gracefully
- Rely solely on smart routing without fallback
- Use infinite timeouts
- Ignore routing errors
Performance Considerations
Latency
- Routing overhead: 100-500ms
- First token: Includes routing time
- Subsequent tokens: No overhead
- Real-time chat with <500ms requirements
- High-frequency API calls (>100/sec)
- Strict SLA requirements
Caching
- Routing decisions: Not cached (context-dependent)
- Model list: Cached (1 hour TTL)
- API responses: Not cached by router
Common Patterns Summary
| Use Case | Recommended Candidates | Notes |
|---|---|---|
| General chat | gpt-4o, claude-sonnet-4-5, gemini-2.0-flash | Balanced quality/cost |
| Code generation | gpt-4o, claude-sonnet-4-5 | Strong coding models |
| Creative writing | claude-opus-4-5, gpt-4o, gemini-2.5-pro | Premium models |
| Simple Q&A | gpt-4o-mini, gemini-2.0-flash, claude-haiku-4-5 | Fast and cheap |
| Function calling | gpt-4o, claude-sonnet-4-5, gemini-2.0-flash | Tool-compatible |
Next Steps
- Router Getting Started - Learn core routing concepts
- Router Advanced Usage - Master advanced routing patterns
- Router Monitoring - Track routing in production
- Optimize LLM Costs Tutorial - Complete cost optimization workflow
- Chat Completions Guide - Master the LLM endpoint
- Streaming Guide - Handle SSE responses
Troubleshooting
Issue: Routing always selects the same model
Possible causes:- Candidates list too restrictive
- Request pattern favors one model
- Other models unavailable
- Expand candidate pool
- Check model availability
- Review request characteristics
Issue: High routing latency (>1s)
Possible causes:- Network issues
- Large candidate pool
- Router API congestion
- Reduce candidates to 3-5 models
- Check network connectivity
- Consider fixed models for latency-critical apps
Issue: Unexpected costs
Possible causes:- Router selecting premium models
- High volume of requests
- Long responses
- Use budget-tier candidates
- Limit max_tokens
- Monitor model distribution
- Implement cost alerts