
If you're firing every request at Opus, you're hiring a freight truck to mail a letter. I ran 100 API calls per model tier on a Mac Mini cluster and the numbers made the decision obvious — once you see the latency and cost gap side by side, manual model selection feels almost irresponsible.
This post is for developers using Claude Code or the Claude API directly who want a systematic way to route tasks to the right model tier. The payoff: I cut monthly API spend by 38% without sacrificing output quality on any task type.
The Problem: One Model for Everything Is Expensive by Design
The default instinct is to reach for the most capable model. Opus is impressive, so it gets used everywhere — summaries, one-liners, refactors, architecture docs. The bill arrives at the end of the month and you wonder where it went.
Here's what the raw numbers look like from 100 API calls per tier on identical hardware:
| Model | Avg Response Time | Relative Cost |
|---|---|---|
| Haiku | 1.2s | 1× |
| Sonnet | 3.8s | ~5× |
| Opus | 8.4s | ~15× |
Running a simple README summary through Opus is the API equivalent of renting an excavator to move one shovelful of dirt. The machine can do it — it's just wildly overkill.
The first thing I tried was using claude-sonnet-4-5 for everything as a compromise. That beats Opus on cost, but it is still wasteful for tasks that genuinely only need Haiku, and occasionally underpowered for deep architecture work.
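To put rough numbers on that trade-off, here is a minimal sketch that computes a blended relative cost per call from the multipliers in the table above (1×, ~5×, ~15×). The `blended_cost` helper and the 60/30/10 workload split are assumptions for illustration, not measurements from my setup.

```shell
#!/bin/sh
# Blended relative cost per call for a given workload mix.
# Multipliers (1, 5, 15) come from the table above; the mix is hypothetical.
blended_cost() {
    # $1, $2, $3: percent of calls sent to Haiku, Sonnet, Opus
    awk -v h="$1" -v s="$2" -v o="$3" 'BEGIN {
        printf "%.2f\n", (h * 1 + s * 5 + o * 15) / 100
    }'
}

echo "all Opus:   $(blended_cost 0 0 100)x"
echo "all Sonnet: $(blended_cost 0 100 0)x"
echo "routed:     $(blended_cost 60 30 10)x"   # assumed 60/30/10 split
```

With that assumed split, the routed workload comes out at 3.60× per call against 5.00× for Sonnet-everywhere and 15.00× for Opus-everywhere. Your own mix will move the number, which is the point of measuring it.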
The Fix: Map Task Types to Model Tiers
The core insight is that task complexity is predictable. You almost always know before you make the call whether it's a light read, a medium refactor, or a deep reasoning job. That predictability is the lever.
Here's the mapping that works for me:
| Task Type | Model | Why |
|---|---|---|
| File summarization, grep output, translation | Haiku | No reasoning depth needed |
| Code refactoring, function design, mid-level bugs | Sonnet | Balance of speed + quality |
| Multi-step architecture, security audits, research | Opus | Context depth required |
The simplest implementation is a shell script that branches on a TASK_TYPE argument:
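#!/bin/bash
# route.sh: pick a model tier from the task type, then run the prompt
# Usage: ./route.sh <task_type> "<prompt>"
TASK_TYPE="$1"
PROMPT="$2"
if [ "$TASK_TYPE" = "summary" ]; then
    MODEL="claude-haiku-4-5"    # light: summaries, grep output, translation
elif [ "$TASK_TYPE" = "refactor" ]; then
    MODEL="claude-sonnet-4-5"   # medium: refactors, function design
else
    MODEL="claude-opus-4-5"     # anything else is treated as heavy
fi
# Log to stderr so the notice stays out of piped output
echo "Selected model: $MODEL" >&2
claude --model "$MODEL" --print "$PROMPT"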
Call it like this:
# Light task
./route.sh summary "Summarize README in three sentences"
# Medium task
./route.sh refactor "Separate functions in auth.py"
# Heavy task (any label other than "summary" or "refactor" falls through to Opus)
./route.sh architecture "Design a multi-agent orchestration system"
These three branches cover about 80% of real-world cases. Handling the remaining 20% is where it gets more interesting.
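As the label list grows, the same branching reads more cleanly as a `case` statement. The sketch below is one way to write it; the `pick_model` function name and the extra labels (`translate`, `bugfix`, `audit`, and so on) are illustrative additions drawn from the mapping table. Unlike route.sh, it sends unknown labels to Sonnet rather than Opus, which is a design choice, not a requirement.

```shell
#!/bin/sh
# pick_model TASK_TYPE
# Prints the model name for a task label. Labels beyond "summary" and
# "refactor" are hypothetical examples based on the mapping table.
pick_model() {
    case "$1" in
        summary|summarize|translate|grep) echo "claude-haiku-4-5" ;;
        refactor|bugfix|design)           echo "claude-sonnet-4-5" ;;
        architecture|audit|research)      echo "claude-opus-4-5" ;;
        *)                                echo "claude-sonnet-4-5" ;;  # unknown label: middle tier
    esac
}

# Example call (commented out so the sketch runs without the CLI installed):
# claude --model "$(pick_model refactor)" --print "Separate functions in auth.py"
```

Keeping the mapping in a function means the rest of the script only ever deals with a model name, so adding or renaming a label touches one line.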
Making It Automatic: Routing Rules in Config
Manual model selection every time defeats the purpose of automation. The better approach is to declare your routing rules once in CLAUDE.md or .claude/settings.json and let Claude Code read context and self-select.
Here's the config I'm running in my n8n 2.8.4 environment:
{
"defaultModel": "claude-sonnet-4-5",
"taskModelOverrides": {
"summarize": "claude-haiku-4-5",
"architecture": "claude-opus-4-5",
"refactor": "claude-sonnet-4-5"
}
}
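The defaultModel is Sonnet: that's your safe fallback for anything unclassified. The overrides kick in when Claude Code can infer the task type from context or when you tag the task explicitly. Treat the key names here as a routing convention rather than a guaranteed built-in: verify them against the settings reference for your Claude Code version, or have a wrapper script read the file and pass --model itself.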
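If you go the wrapper-script route, the override-then-fallback lookup can be sketched in plain shell. This assumes the config above is saved to a file your script reads (the path is up to you) and that it stays flat, with one "key": "value" pair per line. The `json_value` and `resolve_model` helpers are hypothetical names, and a real JSON parser such as jq would be sturdier than this sed pattern.

```shell
#!/bin/sh
# json_value FILE KEY
# Prints the first string value for KEY. Only handles flat JSON with one
# "key": "value" pair per line, and KEY must not contain regex characters.
json_value() {
    sed -n 's/^[[:space:]]*"'"$2"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# resolve_model FILE TASK
# Prints the override for TASK if one exists, otherwise the defaultModel value.
resolve_model() {
    model=$(json_value "$1" "$2")
    [ -n "$model" ] || model=$(json_value "$1" "defaultModel")
    printf '%s\n' "$model"
}
```

A call like `claude --model "$(resolve_model routing.json summarize)" --print "..."` would then pick Haiku for tagged summaries and fall back to Sonnet for any tag the file does not list.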
After deploying this config, monthly API cost dropped 38% compared to the Opus-heavy baseline. Response speed actually improved — Haiku is noticeably snappier on the tasks it now handles.
Variations and Gotchas
The classification edge case. The trickiest part is tasks that look simple but aren't. A "code review" on a 30-line utility function is Sonnet territory. A "code review" on a multi-service authentication flow with security implications is Opus territory. Don't classify by task name alone — factor in scope and depth.
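One way to factor scope into the decision is to pass size and a security flag alongside the task name. The sketch below does that for code reviews; the `review_model` function, its arguments, and the 200-line cutoff are all assumptions for illustration, so tune them against your own misroutes.

```shell
#!/bin/sh
# review_model LINES SECURITY
# LINES: size of the code under review.
# SECURITY: "yes" if the change touches auth, crypto, or permissions.
# The 200-line cutoff is an assumed threshold, not a measured one.
review_model() {
    lines=$1
    security=$2
    if [ "$security" = "yes" ] || [ "$lines" -gt 200 ]; then
        echo "claude-opus-4-5"     # deep or security-sensitive review
    else
        echo "claude-sonnet-4-5"   # small, self-contained review
    fi
}
```

Under these assumptions, `review_model 30 no` returns Sonnet for the 30-line utility, while `review_model 30 yes` escalates to Opus because the security flag alone is enough.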
Haiku has a practical context ceiling. If your prompt includes a large file or long conversation history, Haiku can miss nuance. I use a token count threshold: if the prompt plus context exceeds ~4K tokens and the task involves any judgment, I bump to Sonnet regardless of task type. The check below counts words as a cheap proxy for tokens; ~3,000 English words is roughly 4K tokens.
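# Rough token check, run after the initial routing has set MODEL
# Word count is only a proxy: ~3,000 English words is roughly 4K tokens
PROMPT="$1"
TOKEN_ESTIMATE=$(printf '%s' "$PROMPT" | wc -w)
# Only upgrade Haiku; never downgrade a task already routed to Opus
if [ "$MODEL" = "claude-haiku-4-5" ] && [ "$TOKEN_ESTIMATE" -gt 3000 ]; then
    MODEL="claude-sonnet-4-5"
    echo "Context too large for Haiku, upgrading to Sonnet"
fi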
Platform differences. On macOS with Claude Code CLI, the --model flag works cleanly in scripts. In Docker or CI environments, set the model via environment variable instead to avoid quoting issues:
# Docker / CI pattern
export ANTHROPIC_MODEL="claude-sonnet-4-5"
claude --print "Your task here"
Don't over-classify. I've seen setups with 8-10 task categories that route to 3 models. The cognitive overhead of maintaining that mapping cancels the savings. Stick to three buckets — light, medium, heavy — and only add granularity when a specific task type is clearly misfiring.
Sonnet as the default is safer than Haiku. If you're not sure, bias toward Sonnet. It costs roughly 5× what Haiku does in the table above, but the quality gap when Haiku underperforms is more painful than the bill.
Closing
Model selection is an engineering decision, not a preference. Haiku, Sonnet, and Opus are fighters in different weight classes: sending the heavyweight into every bout drains the budget, and it leaves the lighter models idle on work they could handle.
Lock in a routing config once. You'll notice the bill change by next month.
Next step worth exploring: dynamic routing based on live token count and prompt complexity scoring, rather than static task-type labels. That's where you'd squeeze another 10–15% out of an already-optimized setup.