Case Study: How Acme Corp Saved $50,000/Month on AI Costs

Detailed case study of how one company reduced LLM spending from $80K to $30K monthly through intelligent routing and caching. Includes timeline, challenges, and results.

ReForge Team

Engineering

7 min read

When Acme Corp's AI costs hit $80,000/month and showed no signs of slowing, they knew they had a problem. Six months later, they're spending $30,000/month with better quality and happier users.

Here's exactly how they did it.

Company Background

**Acme Corp** (name changed for privacy):

  • B2B SaaS platform for customer support
  • Series A funded, $15M ARR
  • 15 engineers, 3 focused on AI
  • AI-powered chat and ticket routing
  • Growing 25% MoM

The Problem

The Wake-Up Call

**October 2023**: The engineering lead opens the monthly bill:

  • **$82,000** in OpenAI API costs
  • Up from $45,000 the month before
  • Projected to hit $120,000+ by end of quarter

**The math didn't work**:

    Current: $80K/month AI costs
    Revenue per customer: $500/month
    Number of customers: 300
    Monthly revenue: $150K

    AI costs = 53% of revenue!

Root Cause Analysis

After one week of instrumentation and analysis:

    // Their usage breakdown
    const breakdown = {
      customerChat: {
        requests: 450000,  // 450K requests/month
        model: 'gpt-4',    // Using GPT-4 for everything
        avgCost: 0.12,     // $0.12 per conversation
        totalCost: 54000   // $54K/month
      },
      ticketRouting: {
        requests: 180000,  // 180K requests/month
        model: 'gpt-4',    // Even for simple classification!
        avgCost: 0.08,
        totalCost: 14400   // $14.4K/month
      },
      summaryGeneration: {
        requests: 90000,   // 90K requests/month
        model: 'gpt-4',
        avgCost: 0.15,
        totalCost: 13500   // $13.5K/month
      }
    }
    
    console.log(`Total: $${(54000 + 14400 + 13500).toFixed(2)}`)
    // Prints "Total: $81900.00" ($81,900/month)

**Problems identified**:

  • Using GPT-4 for everything (even simple tasks)
  • No caching whatsoever
  • Long conversation histories (10K+ tokens)
  • No rate limiting (users could spam)
  • No budget enforcement

The Solution

Phase 1: Quick Wins (Week 1)

**Implemented**:

  • Exact-match caching for FAQ queries
  • Conversation history trimming (max 4K tokens)
  • Rate limiting (100 requests/hour per user)
  • Max token limits on responses

    // Before
    const response = await openai.chat.completions.create({
      model: 'gpt-4',
      messages: conversationHistory  // Unlimited history!
    })
    
    // After
    const response = await openai.chat.completions.create({
      model: 'gpt-4',
      messages: trimHistory(conversationHistory, 4000),  // Limit history
      max_tokens: 500  // Limit response length
    })
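The post doesn't show `trimHistory`; a minimal sketch of the idea, assuming OpenAI-style message objects and a rough 4-characters-per-token estimate (a production version would use a real tokenizer such as tiktoken):

```typescript
type ChatMessage = { role: string; content: string }

// Hypothetical helper: keep the most recent messages that fit the token budget.
// Token counts are estimated at ~4 characters per token, which is only a
// rough heuristic for English text.
function trimHistory(messages: ChatMessage[], maxTokens: number): ChatMessage[] {
  const kept: ChatMessage[] = []
  let used = 0
  // Walk backwards so the newest messages survive
  for (let i = messages.length - 1; i >= 0; i--) {
    const estimate = Math.ceil(messages[i].content.length / 4)
    if (used + estimate > maxTokens) break
    used += estimate
    kept.unshift(messages[i])
  }
  return kept
}
```

Trimming from the oldest end keeps the current exchange intact; a common refinement is to always pin the system prompt and only trim user/assistant turns.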

**Results**: **$80K → $62K/month** (22% reduction)

Phase 2: Intelligent Routing (Weeks 2-3)

**Implemented**:

    class AcmeRouter {
      route(prompt: string, context: Context) {
        // Simple ticket routing → GPT-3.5
        if (context.feature === 'ticket_routing') {
          return {
            model: 'gpt-3.5-turbo',
            reasoning: 'Simple classification task',
            estimatedCost: 0.002
          }
        }
    
        // FAQ queries → GPT-3.5
        if (this.isFAQ(prompt)) {
          return {
            model: 'gpt-3.5-turbo',
            reasoning: 'Common question pattern',
            estimatedCost: 0.003
          }
        }
    
        // Complex support → GPT-4
        if (this.isComplexIssue(prompt)) {
          return {
            model: 'gpt-4-turbo',
            reasoning: 'Complex technical issue',
            estimatedCost: 0.12
          }
        }
    
        // Default: Claude 3 Sonnet (balanced)
        return {
          model: 'claude-3-sonnet',
          reasoning: 'Standard support query',
          estimatedCost: 0.04
        }
      }
    
      private isFAQ(prompt: string): boolean {
        const faqPatterns = [
          /how (do|can) i/i,
          /what is/i,
          /where (is|can)/i,
          /reset password/i,
          /pricing/i
        ]
        return faqPatterns.some(pattern => pattern.test(prompt))
      }
    
      private isComplexIssue(prompt: string): boolean {
        return prompt.length > 500 ||
               prompt.includes('bug') ||
               prompt.includes('error') ||
               prompt.includes('not working')
      }
    }
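The decision table above is easy to unit-test in isolation. A condensed, self-contained sketch of the same rules (the standalone function here is illustrative, not Acme's actual code):

```typescript
type Route = { model: string; estimatedCost: number }

// Condensed version of AcmeRouter's decision table, in priority order:
// ticket routing → FAQ patterns → complex issues → balanced default.
function route(prompt: string, feature: string): Route {
  if (feature === 'ticket_routing') {
    return { model: 'gpt-3.5-turbo', estimatedCost: 0.002 }
  }
  const faqPatterns = [/how (do|can) i/i, /what is/i, /where (is|can)/i, /reset password/i, /pricing/i]
  if (faqPatterns.some(p => p.test(prompt))) {
    return { model: 'gpt-3.5-turbo', estimatedCost: 0.003 }
  }
  if (prompt.length > 500 || /bug|error|not working/i.test(prompt)) {
    return { model: 'gpt-4-turbo', estimatedCost: 0.12 }
  }
  return { model: 'claude-3-sonnet', estimatedCost: 0.04 }
}

console.log(route('Please assign this', 'ticket_routing').model)        // gpt-3.5-turbo
console.log(route('My export keeps failing with an error', 'chat').model)  // gpt-4-turbo
```

Ordering matters: a classification request never reaches the FAQ check, and the pattern lists are deliberately short (Acme later trimmed 15 routing rules down to 4).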

**Traffic distribution after routing**:

  • 40% → GPT-3.5 (was 0%)
  • 30% → Claude 3 Sonnet (was 0%)
  • 30% → GPT-4 (was 100%)

**Results**: **$62K → $42K/month** (32% additional reduction)

Phase 3: Semantic Caching (Week 4)

**Implemented**:

    import { SemanticCache } from '@reforge/semantic-cache'
    
    const cache = new SemanticCache({
      similarityThreshold: 0.92,  // Strict threshold for accuracy
      ttl: 7200,  // 2 hours (support tickets change)
      embeddingModel: 'text-embedding-ada-002'
    })
    
    async function handleSupportQuery(query: string) {
      // Check semantic cache first
      const cached = await cache.get(query)
      if (cached) {
        console.log('Cache HIT - $0 cost')
        return cached
      }
    
      // Cache miss - call LLM
      const routing = await router.route(query, { feature: 'chat' })
      const response = await callLLM(routing.model, query)
    
      // Cache for future similar queries
      await cache.set(query, response, 7200)
    
      return response
    }
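The `@reforge/semantic-cache` internals aren't shown, but the core idea is comparing embedding vectors instead of exact strings. A minimal in-memory sketch with a pluggable embedding function (the class and its API are illustrative, not the library's actual interface):

```typescript
type Entry = { vector: number[]; value: string; expires: number }

// Cosine similarity between two embedding vectors
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    na += a[i] * a[i]
    nb += b[i] * b[i]
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb))
}

// Minimal in-memory semantic cache: a lookup hits when the closest stored
// vector's cosine similarity clears the threshold and the entry hasn't expired.
class MiniSemanticCache {
  private entries: Entry[] = []
  constructor(
    private embed: (text: string) => number[],
    private threshold = 0.92,
  ) {}

  set(query: string, value: string, ttlSeconds: number): void {
    this.entries.push({
      vector: this.embed(query),
      value,
      expires: Date.now() + ttlSeconds * 1000,
    })
  }

  get(query: string): string | null {
    const v = this.embed(query)
    let best: Entry | null = null
    let bestScore = -1
    for (const e of this.entries) {
      if (e.expires < Date.now()) continue  // skip stale entries
      const score = cosine(v, e.vector)
      if (score > bestScore) { best = e; bestScore = score }
    }
    return best && bestScore >= this.threshold ? best.value : null
  }
}
```

Raising the threshold trades hit rate for accuracy, which is exactly the 0.85 → 0.92 tuning Acme describes under Challenge 2.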

**Cache performance**:

  • Cache hit rate: 58%
  • Avg similarity score on hits: 0.94
  • Estimated savings from cache: $18K/month

**Results**: **$42K → $30K/month** (28% additional reduction)

The Challenges

Challenge 1: Quality Concerns

**Problem**: The team worried about quality degradation with cheaper models.

**Solution**: A/B testing with quality metrics

    // Ran A/B test for 2 weeks
    const results = {
      control: {  // GPT-4 only
        userSatisfaction: 0.87,
        resolutionRate: 0.82,
        avgCost: 0.12
      },
      treatment: {  // Intelligent routing
        userSatisfaction: 0.86,  // -1% (not statistically significant)
        resolutionRate: 0.83,  // +1%!
        avgCost: 0.046  // -62% cost
      }
    }
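Whether that -1% satisfaction difference is noise can be checked with a two-proportion z-test. A quick sketch (the sample sizes are illustrative, since the post doesn't report them):

```typescript
// Two-proportion z-test: is the satisfaction gap between control and
// treatment larger than sampling noise would explain?
function twoProportionZ(p1: number, n1: number, p2: number, n2: number): number {
  const pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
  return (p1 - p2) / se
}

// 0.87 vs 0.86 with ~2,000 sessions per arm (assumed, not from the post)
const z = twoProportionZ(0.87, 2000, 0.86, 2000)
console.log(Math.abs(z) < 1.96 ? 'not significant at 95%' : 'significant at 95%')
```

At these sample sizes a one-point gap sits well inside the noise band, consistent with the "not statistically significant" call above.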

**Outcome**: Quality maintained, team bought in.

Challenge 2: Cache Accuracy

**Problem**: The early cache had a 72% hit rate but a 12% error rate (wrong answers).

**Solution**: Increased the similarity threshold from 0.85 to 0.92

    // Before: High hit rate, poor accuracy
    const cache = new SemanticCache({ similarityThreshold: 0.85 })
    // Hit rate: 72%, Error rate: 12%
    
    // After: Lower hit rate, excellent accuracy
    const cache = new SemanticCache({ similarityThreshold: 0.92 })
    // Hit rate: 58%, Error rate: 2%

**Outcome**: Accepted a lower hit rate for better accuracy.

Challenge 3: Team Adoption

**Problem**: Engineers were resistant to adding complexity.

**Solution**: Made it optional, showed the data

    // Made routing opt-in initially
    async function handleQuery(query: string, options?: { useRouting?: boolean }) {
      if (options?.useRouting) {
    const routing = await router.route(query, { feature: 'chat' })
        return callLLM(routing.model, query)
      }
    
      // Fallback to GPT-4
      return callLLM('gpt-4', query)
    }
    
    // After seeing results, made it default

**Outcome**: 100% adoption after seeing savings.

The Results

Cost Reduction

| Metric | Before | After | Change |
|--------|--------|-------|--------|
| Monthly Cost | $80,000 | $30,000 | **-62%** |
| Cost per Request | $0.11 | $0.04 | **-64%** |
| GPT-4 Usage | 100% | 30% | -70 pts |
| Cache Hit Rate | 0% | 58% | +58 pts |

Quality Metrics

| Metric | Before | After | Change |
|--------|--------|-------|--------|
| User Satisfaction | 87% | 86% | -1 pt (not significant) |
| Resolution Rate | 82% | 83% | **+1 pt** |
| Avg Response Time | 2.1s | 1.8s | **-14%** (faster!) |
| Error Rate | 3.2% | 3.1% | No change |

Business Impact

**Annual savings**: **$600,000**

**ROI calculation**:

    const roi = {
      implementation: {
        engineeringTime: '4 weeks × 2 engineers',
        cost: 40000  // Loaded cost
      },
      platformFee: {
        monthly: 499,
        annual: 5988
      },
      savings: {
        monthly: 50000,
        annual: 600000
      },
      netAnnualSavings: 600000 - 40000 - 5988,  // $554,012
      roi: ((600000 - 40000 - 5988) / (40000 + 5988)) * 100  // 1,205% ROI
    }
    
    console.log(`ROI: ${roi.roi.toFixed(0)}%`)
    // Prints "ROI: 1205%"

Timeline

**Week 1**:

  • Instrumentation & measurement
  • Quick wins (caching, limits)
  • Savings: $18K/month

**Week 2**:

  • Build routing engine
  • A/B test routing logic
  • Savings: +$20K/month

**Week 3**:

  • Gradual rollout (20% → 50% → 100%)
  • Monitor quality metrics
  • Savings: confirmed

**Week 4**:

  • Implement semantic caching
  • Tune similarity threshold
  • Savings: +$12K/month

**Total**: **4 weeks, $50K/month savings**
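The week 3 gradual rollout (20% → 50% → 100%) is typically done with a deterministic per-user bucket, so ramping the percentage only ever adds users to the new path. A sketch (the hashing scheme is an assumption, not from the post):

```typescript
// Deterministic percentage rollout: hash the user id into a stable 0-99
// bucket. Because the bucket never changes, going 20% → 50% → 100% only
// adds users to the treatment group; nobody flips back and forth.
function inRollout(userId: string, percent: number): boolean {
  let hash = 0
  for (const ch of userId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0  // simple 32-bit rolling hash
  }
  return hash % 100 < percent
}
```

Quality metrics can then be compared per bucket before each ramp step.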

Lessons Learned

What Worked

  • **Start with measurement** - Can't optimize what you don't measure
  • **Quick wins build momentum** - 20% savings in week 1 got buy-in
  • **A/B test everything** - Data beats opinions
  • **Gradual rollout** - Caught issues early
  • **Make it transparent** - Dashboard showing savings = team support

What Didn't Work

  • **Too aggressive caching** - Started with a 0.85 threshold, had to raise it
  • **Complex routing rules** - Simplified from 15 rules to 4
  • **No monitoring initially** - Added alerts after the first month

Recommendations

**For similar companies**:

  • **Week 1**: Instrument and measure
  • **Week 2**: Implement quick wins
  • **Week 3**: Build routing logic
  • **Week 4**: Add caching

**Don't**:

  • Don't sacrifice quality for cost
  • Don't roll out 100% immediately
  • Don't skip A/B testing
  • Don't forget to monitor

What's Next

Acme is continuing to optimize:

**Q2 2024**:

  • Provider diversification (add Gemini, Cohere)
  • Advanced caching strategies
  • ML-based routing (trained on historical data)
  • Target: $25K/month

**Future**:

  • Multimodal support (audio transcription)
  • Customer-specific routing
  • Predictive cost modeling

Conclusion

**$80K → $30K/month in 4 weeks** is achievable with:

  • Intelligent routing
  • Semantic caching
  • Basic optimizations (history trimming, token limits)

**Key metrics**:

  • **62% cost reduction**
  • **Quality maintained** (even slightly improved)
  • **1,205% ROI**
  • **$600K annual savings**

If you're spending $10K+/month on LLMs, you're probably overpaying. Start measuring today.


*Interested in similar results? [Try ReForge LLM free →](https://reforgellm.com/signup)*

Tags: case-study, cost-reduction, real-world, success-story
