Tools: Why I Built an AI Model Router (And Why You Are Probably Overpaying 100x Right Now)

2026-02-14

Source: Dev.to

## The Ferrari Problem

It started with a $47 API bill.

Not for anything impressive. Not for training a model or processing a million documents. Just... a chatbot. A customer support bot for a side project that handled maybe 200 conversations a day.

$47. For a chatbot that mostly answered "what are your opening hours?" and "how do I reset my password?"

I stared at the Anthropic billing dashboard and had that feeling every developer knows: the one where you realize you've been doing something incredibly stupid for weeks and nobody told you.

Here's what I was doing: sending every single user query to Claude Opus. The best model. The most expensive model. For everything.

- "What are your opening hours?": $0.025
- "How do I reset my password?": $0.025
- "Can you architect a distributed event-driven payment system with CQRS and saga patterns?": $0.025

Same price. Same model. Every time.

That's like taking a Ferrari to buy milk. Sure, it gets you there. But a bicycle works just as well and costs nothing.

## The Spreadsheet Moment

Being an engineer, I did what engineers do. I exported my API logs and categorized every single query from the past month.

209,000 API calls. I went through a representative sample of 2,000 and classified them by actual complexity.

The results made me physically uncomfortable:

- 71% were simple tasks. Translations, summaries, Q&A, formatting. A $0.0002 Flash model handles these identically to Opus.
- 19% were medium complexity. Code generation, analysis, content writing. A $0.01 Pro model handles these well.
- 10% were genuinely complex. Multi-step reasoning, research, architecture. These actually needed a frontier model.

I did the math. I was overpaying by 87%.

Not 10%. Not 30%. Eighty-seven percent.

## The Obvious Fix (That Nobody Builds)

The solution seemed obvious: route different queries to different models based on complexity. Simple queries go to cheap models. Complex queries go to expensive ones.

So why wasn't everyone doing this?

I asked around. Talked to 30+ developers building with AI APIs. The answers were always the same:

"It's not worth the engineering time." Fair. Building a routing layer means maintaining a classifier, tracking benchmark data, handling failover logic, managing model configs across providers. That's a side project on top of your side project.

"What if the cheap model messes up?" Also fair. Nobody wants to explain to their boss why the AI gave a wrong answer because they tried to save $0.02.

"I'll optimize later." The classic. Later never comes. You're busy building features, not optimizing API costs. Meanwhile, the bill keeps growing.

## So I Built It

I spent three months building what I now call Komilion. The name comes from "chameleon", because it adapts to whatever you throw at it.

The architecture evolved through three iterations:

v1: Regex only. I hard-coded patterns. "Translate X to Y" → cheap model. "Summarize" → cheap model. This caught about 40% of simple queries but missed everything that didn't match my patterns. Fragile and annoying to maintain.

v2: LLM classifier. I added a cheap LLM (Gemini Flash) to classify queries the regex couldn't catch. This bumped classification accuracy to ~85% but added 200-400ms of latency. For some use cases, that mattered.

v3: Hybrid with fast-path. The current version. A regex fast-path catches ~60% of requests with zero added latency (<5ms). The LLM classifier handles the remaining ambiguous cases. Deterministic model selection uses published benchmarks (LMArena ELO scores, Artificial Analysis quality/speed/price indices) rather than trained ML models.

Why benchmark-based and not ML-trained? Because I'm one person. Training and maintaining a routing model is a full-time ML engineering job. Benchmarks update automatically when new models launch. Good enough beats perfect-but-unmaintainable.

## The Three Modes

Through testing with early users, three usage patterns emerged:

Neo Mode: "Just pick for me." You send a prompt to neo-mode/balanced and Komilion picks the best model for that specific request. Most users start here. Three sub-tiers: frugal (prioritize cost), balanced (cost/quality), premium (prioritize quality).

Pinned Mode: "I want this specific model." You lock a specific model for your application. When a newer version drops within the same provider family (e.g., Claude Sonnet 4.5 → 5.0), Komilion auto-upgrades. You get improvements without changing code.

Advisor Mode: "Tell me where I'm wasting money." A weekly email analyzing your usage patterns. "You spent $12 on simple queries last week; switching to frugal would save $10.80." One-click to test the recommendation. This is for teams that want to keep control but still optimize.

## The Honest Numbers

I'm going to be transparent about what Komilion does and doesn't do, because developer trust is everything.

What it does:

- Routes across 394 models from all major providers through one API key
- Analyzes each request and picks the right model for the job
- Saves 60-90% on simple tasks (which are 70% of most apps' traffic)
- OpenAI SDK compatible: change your base URL, not your code
- Shows exact cost in every API response (komilion.cost field)
- Handles provider failover automatically

What it doesn't do:

- Guarantee the absolute cheapest model for every request
- Replace your judgment on complex use cases
- Work magic if 100% of your queries are complex

## What I Learned Building This

1. Developers are more price-sensitive than they admit. Every dev I talked to said "cost doesn't matter, quality matters." Then I showed them their monthly bill breakdown and they went quiet. Cost matters. It just matters differently: nobody wants to sacrifice quality, but everyone wants to stop overpaying on the 70% of queries that don't need quality.

2. Integration friction kills adoption. My first version required a custom SDK. Zero adoption. The moment I made it OpenAI SDK compatible (literally change base_url), people actually tried it. Lesson: don't ask developers to learn your API. Speak their language.

3. Transparency beats perfection. I show the exact provider cost and routing decision in every API response. Counter-intuitive, but it builds trust. Developers who see exactly what they're paying are more likely to stay than developers who feel tricked.

4. The best marketing is a curl command. No landing page, no demo video, no sales call has ever converted a developer as effectively as:

```bash
curl https://www.komilion.com/api/chat/completions \
  -H "Authorization: Bearer sk-komilion-..." \
  -H "Content-Type: application/json" \
  -d '{"model":"neo-mode/balanced","messages":[{"role":"user","content":"What is the capital of France?"}]}'
```

Show them it works. Show them the cost. Let them do the math.

## Where We Are Now

Komilion is live. 394 models. Three routing modes. Full OpenAI SDK compatibility. Free credits to try.

- Now: Core routing, three modes, dashboard with usage stats
- Next: Streaming cost estimation, batch API support, team accounts
- Later: Custom routing rules, webhook notifications, dedicated enterprise endpoints

I'm building this solo, bootstrapped. Total launch cost: ~$150 (domain, hosting, initial API credits). No VC. No team. Just an engineer who got tired of overpaying.

## Try It

If any of this resonated:

- Sign up at komilion.com (free credits, no credit card)
- Get your API key
- Change one line of code
- Watch your costs drop

Or don't. Build your own router using the classifier code I published. The insight, that most AI queries are simple and don't need frontier models, is more important than any specific tool.

But if you'd rather just change a base URL and let someone else maintain the routing... I built that too.

Questions? Find me on Twitter @haboroshan or email [email protected].
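
For readers who would rather build their own router, here is a minimal sketch of the hybrid idea described under "So I Built It": a regex fast-path, a fallback classifier, and deterministic model selection from a static table. It is not Komilion's actual code. The model names, quality scores, regex patterns, and the `route` function are all hypothetical, the prices only echo the rough per-query figures quoted earlier, and the classifier is a keyword stub standing in for a call to a cheap LLM so the example runs offline.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Model:
    name: str              # hypothetical model name
    quality: float         # placeholder for a published benchmark score
    cost_per_query: float  # rough USD per query, illustrative only


# Hypothetical catalog: one list of candidate models per complexity tier.
CATALOG = {
    "simple": [Model("flash-small", 1200, 0.0002), Model("flash-large", 1250, 0.0005)],
    "medium": [Model("pro", 1350, 0.01)],
    "complex": [Model("frontier", 1450, 0.025)],
}

# Fast-path: patterns that can be classified without calling any model.
FAST_PATH = [
    (re.compile(r"^\s*translate\b", re.I), "simple"),
    (re.compile(r"^\s*summari[sz]e\b", re.I), "simple"),
    (re.compile(r"\b(opening hours|reset my password)\b", re.I), "simple"),
]


def fast_path(prompt: str) -> Optional[str]:
    """Return a tier if a hard-coded pattern matches, else None."""
    for pattern, tier in FAST_PATH:
        if pattern.search(prompt):
            return tier
    return None


def stub_classifier(prompt: str) -> str:
    """Stand-in for the cheap LLM classifier (keyword heuristic, assumption)."""
    hard_markers = ("architect", "distributed", "multi-step", "saga")
    if any(marker in prompt.lower() for marker in hard_markers):
        return "complex"
    return "medium"


def pick_model(tier: str, mode: str = "balanced") -> Model:
    """Deterministic selection within a tier based on the static table."""
    candidates = CATALOG[tier]
    if mode == "frugal":
        return min(candidates, key=lambda m: m.cost_per_query)
    if mode == "premium":
        return max(candidates, key=lambda m: m.quality)
    # balanced: highest quality per dollar
    return max(candidates, key=lambda m: m.quality / m.cost_per_query)


def route(
    prompt: str,
    mode: str = "balanced",
    classify: Callable[[str], str] = stub_classifier,
) -> Tuple[Model, str]:
    """Try the regex fast-path first; fall back to the classifier."""
    tier = fast_path(prompt)
    if tier is not None:
        return pick_model(tier, mode), "fast-path"
    return pick_model(classify(prompt), mode), "classifier"


if __name__ == "__main__":
    for text in ("What are your opening hours?",
                 "Can you architect a distributed payment system?"):
        model, via = route(text)
        print(f"{text!r} -> {model.name} via {via}")
```

In a real deployment the `classify` argument would wrap a request to a small model, and the catalog would be refreshed from whichever benchmark source you trust. Keeping selection as a plain table lookup is what makes the routing decision easy to log and explain in each response.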
