
Implementing Intelligent LLM Routing in 5 Minutes

Quick tutorial to set up intelligent routing between GPT-4, Claude, and Gemini based on cost, quality, and latency. Copy-paste code included.

Optra AI Team

Engineering

3 min read

Want to reduce your LLM costs by 30-40% with just a few lines of code? Intelligent routing is your answer. In this quick tutorial, we'll show you how to implement a production-ready routing system in less than 5 minutes.

What is Intelligent Routing?

Intelligent routing automatically selects the best LLM model for each request based on:

  • **Complexity**: Simple queries → cheap models, complex queries → premium models
  • **Cost**: Stay within budget while maintaining quality
  • **Latency**: Route to fastest provider when speed matters
  • **Quality**: Ensure minimum quality thresholds are met

5-Minute Implementation

Step 1: Install Dependencies

    npm install openai @anthropic-ai/sdk @google/generative-ai

Step 2: Create the Router

    // Optional knobs the router accepts (fields are illustrative)
    interface RoutingOptions {
      maxCost?: number
      maxLatency?: number
    }
    
    interface RoutingDecision {
      provider: 'openai' | 'anthropic' | 'google'
      model: string
      reasoning: string
      estimatedCost: number
    }
    
    class IntelligentRouter {
      route(prompt: string, options?: RoutingOptions): RoutingDecision {
        const complexity = this.analyzeComplexity(prompt)
    
        // Simple queries → cheap models
        if (complexity < 30) {
          return {
            provider: 'openai',
            model: 'gpt-3.5-turbo',
            reasoning: 'Low complexity, cost-optimized',
            estimatedCost: 0.0002
          }
        }
    
        // Medium complexity → balanced model
        if (complexity < 70) {
          return {
            provider: 'anthropic',
            model: 'claude-3-sonnet',
            reasoning: 'Medium complexity, balanced approach',
            estimatedCost: 0.0018
          }
        }
    
        // High complexity → premium model
        return {
          provider: 'openai',
          model: 'gpt-4-turbo',
          reasoning: 'High complexity, maximum quality',
          estimatedCost: 0.0200
        }
      }
    
      private analyzeComplexity(prompt: string): number {
        let score = 0
    
        // Length factor
        if (prompt.length > 500) score += 30
        else if (prompt.length > 200) score += 15
    
        // Technical keywords
        const technicalKeywords = ['code', 'implement', 'algorithm', 'debug']
        if (technicalKeywords.some(kw => prompt.toLowerCase().includes(kw))) {
          score += 40
        }
    
        // Question complexity
        if (prompt.includes('why') || prompt.includes('how')) score += 20
    
        return Math.min(score, 100)
      }
    }

Step 3: Use the Router

    const router = new IntelligentRouter()
    
    async function callLLMWithRouting(prompt: string) {
      const decision = router.route(prompt)
    
      console.log(`Routing to ${decision.model}: ${decision.reasoning}`)
    
      // Call the selected provider
      switch (decision.provider) {
        case 'openai':
          return callOpenAI(decision.model, prompt)
        case 'anthropic':
          return callAnthropic(decision.model, prompt)
        case 'google':
          return callGoogle(decision.model, prompt)
      }
    }
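The switch statement works, but it has to be touched every time a provider is added. An alternative is a dispatch table keyed by provider name. The handlers below are stubs standing in for the real SDK calls (`callOpenAI` and friends are not defined in this post), just to show the shape:

```typescript
type Provider = 'openai' | 'anthropic' | 'google'
type Handler = (model: string, prompt: string) => Promise<string>

// Stub handlers; in production these would wrap each vendor's SDK
const handlers: Record<Provider, Handler> = {
  openai: async (model, prompt) => `openai:${model} → ${prompt.length} chars`,
  anthropic: async (model, prompt) => `anthropic:${model} → ${prompt.length} chars`,
  google: async (model, prompt) => `google:${model} → ${prompt.length} chars`,
}

// Look up the handler for the routing decision and invoke it
async function dispatch(provider: Provider, model: string, prompt: string): Promise<string> {
  const handler = handlers[provider]
  if (!handler) throw new Error(`No handler registered for ${provider}`)
  return handler(model, prompt)
}
```

Registering a new provider then becomes one entry in `handlers` rather than a new `case`.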

Advanced Routing Strategies

Cost-Based Routing

    route(prompt: string, maxCost: number = 0.01): RoutingDecision {
      const candidates = [
        { provider: 'openai' as const, model: 'gpt-3.5-turbo', cost: 0.0002, quality: 0.85 },
        { provider: 'anthropic' as const, model: 'claude-3-sonnet', cost: 0.0018, quality: 0.92 },
        { provider: 'openai' as const, model: 'gpt-4-turbo', cost: 0.0200, quality: 0.95 }
      ]
    
      // Filter by budget; fall back to the cheapest model if nothing fits
      const affordable = candidates.filter(c => c.cost <= maxCost)
      const pool = affordable.length > 0 ? affordable : [candidates[0]]
    
      // Pick highest quality within budget
      const best = [...pool].sort((a, b) => b.quality - a.quality)[0]
      return {
        provider: best.provider,
        model: best.model,
        reasoning: `Best quality within $${maxCost} budget`,
        estimatedCost: best.cost
      }
    }
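The budget filter is easy to exercise standalone. The per-request prices below are the illustrative figures from the snippet above, not live pricing:

```typescript
interface Candidate { model: string; cost: number; quality: number }

const candidates: Candidate[] = [
  { model: 'gpt-3.5-turbo', cost: 0.0002, quality: 0.85 },
  { model: 'claude-3-sonnet', cost: 0.0018, quality: 0.92 },
  { model: 'gpt-4-turbo', cost: 0.0200, quality: 0.95 },
]

// Highest-quality model whose per-request cost fits the budget;
// falls back to the cheapest model when nothing fits
function pickByBudget(maxCost: number): string {
  const affordable = candidates.filter(c => c.cost <= maxCost)
  const pool = affordable.length > 0 ? affordable : [candidates[0]]
  return [...pool].sort((a, b) => b.quality - a.quality)[0].model
}

console.log(pickByBudget(0.001))  // tight budget
console.log(pickByBudget(0.05))   // generous budget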

Latency-Based Routing

    route(prompt: string, maxLatency: number = 500): RoutingDecision {
      // Candidates sorted by typical latency (ms), fastest first
      const candidates = [
        { provider: 'anthropic' as const, model: 'claude-3-haiku', latency: 250 },
        { provider: 'openai' as const, model: 'gpt-3.5-turbo', latency: 320 },
        { provider: 'openai' as const, model: 'gpt-4-turbo', latency: 420 }
      ]
    
      // First candidate under the limit; fall back to the fastest overall
      const pick = candidates.find(c => c.latency <= maxLatency) ?? candidates[0]
      return {
        provider: pick.provider,
        model: pick.model,
        reasoning: `Fastest option within ${maxLatency}ms`,
        estimatedCost: 0  // cost not tracked by this strategy
      }
    }
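A variant worth considering: instead of simply taking the fastest model under the limit, pick the most capable model that still meets the latency budget. The latency and quality figures here are illustrative placeholders, not measured values:

```typescript
interface LatencyCandidate { model: string; latency: number; quality: number }

// Sorted fastest-first; quality scores are placeholders for illustration
const byLatency: LatencyCandidate[] = [
  { model: 'claude-3-haiku', latency: 250, quality: 0.80 },
  { model: 'gpt-3.5-turbo', latency: 320, quality: 0.85 },
  { model: 'gpt-4-turbo', latency: 420, quality: 0.95 },
]

// Most capable model that still meets the latency budget;
// falls back to the fastest overall when nothing qualifies
function pickByLatency(maxLatency: number): string {
  const fast = byLatency.filter(c => c.latency <= maxLatency)
  if (fast.length === 0) return byLatency[0].model
  return [...fast].sort((a, b) => b.quality - a.quality)[0].model
}
```

This keeps latency as a hard constraint while spending any remaining headroom on quality rather than shaving milliseconds the user won't notice.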

Testing Your Router

    // Test with different query types
    const testQueries = [
      "What's the capital of France?",  // score 0 → GPT-3.5
      "Explain how machine learning algorithms work",  // score 60 → Claude Sonnet
      // Long prompt with "implement" and "how" → score 75 → GPT-4
      "Implement a production-ready authentication system with JWT refresh tokens, rate limiting, and account lockout. Explain how each security decision protects against common attacks, and include code for the middleware."
    ]
    
    for (const query of testQueries) {
      const decision = router.route(query)
      console.log(`"${query}" → ${decision.model}`)
    }
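It's also worth sanity-checking the scoring heuristic itself. The same rules from analyzeComplexity can be replicated standalone to confirm which tier a prompt actually lands in:

```typescript
// Standalone copy of the scoring rules from analyzeComplexity above
function complexity(prompt: string): number {
  let score = 0

  // Length factor
  if (prompt.length > 500) score += 30
  else if (prompt.length > 200) score += 15

  // Technical keywords (case-insensitive, counted once)
  const technicalKeywords = ['code', 'implement', 'algorithm', 'debug']
  if (technicalKeywords.some(kw => prompt.toLowerCase().includes(kw))) score += 40

  // Question complexity
  if (prompt.includes('why') || prompt.includes('how')) score += 20

  return Math.min(score, 100)
}

// Map a score to the router's three tiers
function tier(score: number): string {
  return score < 30 ? 'cheap' : score < 70 ? 'balanced' : 'premium'
}

console.log(tier(complexity("What's the capital of France?")))
console.log(tier(complexity("Explain how machine learning algorithms work")))
```

Tests like these catch the common failure mode where a query you assumed was "medium" scores below the threshold and silently routes to the cheapest model.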

Common Pitfalls

  • **Over-optimizing for cost**: Don't sacrifice quality for pennies
  • **Ignoring user tier**: Premium users deserve premium models
  • **Not monitoring quality**: Always A/B test your routing rules
  • **Hard-coding rules**: Make routing rules configurable

Next Steps

  • Monitor cost savings and quality metrics
  • A/B test different routing strategies
  • Add caching to reduce costs further
  • Implement fallback logic for provider outages

Start with this simple implementation and iterate based on your actual usage patterns. Most teams see 30-40% cost reduction immediately.
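The fallback point above deserves a sketch: try providers in preference order and move on when one throws. The providers below are stubs standing in for real SDK calls:

```typescript
type LLMCall = (prompt: string) => Promise<string>

// Try each provider in order; return the first success
async function withFallback(providers: LLMCall[], prompt: string): Promise<string> {
  let lastError: unknown
  for (const call of providers) {
    try {
      return await call(prompt)
    } catch (err) {
      lastError = err  // remember the failure and try the next provider
    }
  }
  throw new Error(`All providers failed: ${String(lastError)}`)
}

// Stub providers for illustration: the first always fails
const flaky: LLMCall = async () => { throw new Error('rate limited') }
const stable: LLMCall = async (prompt) => `answer to: ${prompt}`
```

In production you would likely add per-provider timeouts and skip providers that have recently failed, but the ordered-retry loop is the core of outage resilience.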

