Skip to main content

Optimize LLM Costs with Smart Routing

Build a cost-effective chatbot application using Eden AI’s smart routing to automatically select the best model for each query while minimizing expenses.

What You’ll Build

By the end of this tutorial, you’ll have:
  • Smart routing-powered chatbot - Automatically selects optimal models
  • Multi-tier routing strategy - Budget/balanced/premium tiers for different use cases
  • Cost tracking system - Monitor spending per conversation and query type
  • A/B testing framework - Compare smart routing vs. fixed models
  • Performance metrics - Track latency, quality, and cost trade-offs

Prerequisites

  • Python 3.8 or higher
  • Eden AI API key (Get one here)
  • Basic understanding of LLMs and REST APIs
  • Optional: Database for persistent storage (SQLite/PostgreSQL)

Problem Statement

You’re building a customer support chatbot with diverse query types:
  • Simple FAQs - “What are your hours?” (low complexity)
  • Technical support - “How do I configure SSL?” (medium complexity)
  • Complex troubleshooting - “Server crashes with error X” (high complexity)
Challenges:
  1. Fixed models are inefficient - Using GPT-4o for all queries wastes money on simple FAQs
  2. Manual model selection is hard - Predicting which model fits each query is complex
  3. Quality vs. cost trade-off - Balancing response quality with budget constraints
Solution: Smart routing automatically selects the right model for each query, optimizing costs without sacrificing quality.

Architecture Overview

┌─────────────────────────────────────────────────────────┐
│               Customer Support Chatbot                   │
│                                                          │
│  ┌────────────┐   ┌──────────────┐   ┌──────────────┐  │
│  │  Query     │──▶│  Routing     │──▶│  Response    │  │
│  │  Analyzer  │   │  Engine      │   │  Generator   │  │
│  └────────────┘   └──────────────┘   └──────────────┘  │
│                           │                              │
│                           ▼                              │
│                  ┌─────────────────┐                     │
│                  │  Cost Tracker   │                     │
│                  └─────────────────┘                     │
└─────────────────────────────────────────────────────────┘


              ┌────────────────────────┐
              │  Eden AI Smart Router  │
              │  (@edenai routing)     │
              └────────────────────────┘

Step 1: Baseline Implementation (Fixed Model)

First, let’s build a simple chatbot using a single fixed model to establish a baseline: Expected Output:
User: What are your business hours?
Assistant: Our business hours are Monday through Friday, 9 AM to 5 PM EST...
Cost: $0.0015 | Latency: 1.2s

Total cost: $0.0123
Avg cost per query: $0.0041
Problem: Simple queries like “What are your business hours?” cost the same as complex troubleshooting questions, wasting money.

Step 2: Add Smart Routing

Now let’s migrate to smart routing with default model selection: Expected Output:
User: What are your business hours?
Model: google/gemini-2.0-flash
Cost: $0.0002 | Latency: 1.4s

User: How do I reset my password?
Model: openai/gpt-4o-mini
Cost: $0.0008 | Latency: 1.5s

User: My server is returning 500 errors...
Model: openai/gpt-4o
Cost: $0.0035 | Latency: 1.8s

Total cost: $0.0045 (vs. $0.0123 baseline = 63% savings!)
Model distribution: {'google/gemini-2.0-flash': 1, 'openai/gpt-4o-mini': 1, 'openai/gpt-4o': 1}
Result: 60%+ cost savings by routing simple queries to cheaper models!

Step 3: Implement Multi-Tier Routing Strategy

Create different routing strategies for various use cases:

Step 4: Build A/B Testing Framework

Compare smart routing vs. fixed models: Expected Results:
Variant A (Fixed: GPT-4o):
  Avg cost: $0.0042
  Avg quality: 0.85

Variant B (Smart Routing):
  Avg cost: $0.0016
  Avg quality: 0.84

Cost savings: 61.9%
Quality change: -1.2%

Recommendation: B (Smart Routing)

Step 5: Production Deployment Best Practices

Monitoring and Alerting

Key Takeaways

Cost Savings Summary

StrategyAvg Cost per QuerySavings vs. Baseline
Baseline (Fixed GPT-4o)$0.0041-
Smart Routing (Default)$0.001856%
Smart Routing (Budget Tier)$0.000880%
Smart Routing (Balanced)$0.001563%

Best Practices

Start simple - Begin with default smart routing, then optimize ✅ Monitor metrics - Track cost, latency, and quality ✅ Use tiered strategies - Different tiers for different use cases ✅ A/B test - Validate cost savings don’t hurt quality ✅ Set budgets - Alert before overspending ✅ Log routing decisions - Debug and optimize over time

When Smart Routing Shines

  • Diverse query types - Mix of simple and complex queries
  • Cost-sensitive applications - Budget constraints
  • High volume - Many requests per day
  • Unpredictable workloads - Query complexity varies

When to Use Fixed Models

  • Consistent requirements - All queries need same model
  • Latency-critical - Can’t afford 100-500ms routing overhead
  • Specific model features - Need particular model’s capabilities
  • Already optimized - You’ve manually tuned model selection

Next Steps

Additional Resources

Try it yourself and see 50%+ cost savings!