Skip to main content

Model Routing — Match Model to Task

How agents in this template should pick a model for the work in front of them. The wrong choice burns latency or shallows the answer; the right choice does neither.

The model ladder (Claude)

ModelIDSpeedStrengthUse for
Haiku 4.5claude-haiku-4-5-20251001FastestLightPure file-find, "where is X defined", trivial lookups (1 grep, no synthesis)
Sonnet 4.6claude-sonnet-4-6~2–3× faster than OpusBalanced"Tell me about X / how does Y work" — research and synthesis across multiple sources
Opus 4.7claude-opus-4-7SlowestDeepest reasoningCoding, editing, planning, debugging, design decisions

Other CLIs (Codex, Gemini) have analogous tiers — same principle applies even if model names differ.

When to use each

Decide by what the answer is, not by topic.

  • "Summarize what's already there" → search → Sonnet subagent (default), or Haiku subagent if it's a single grep.
  • "Produce a change" (write/edit code, design, debug) → work → main Opus thread.
  • Unsure? Default to Sonnet. The cost of mismatching down (Haiku for a depth question) is a thin answer the user has to re-ask. The cost of mismatching up (Opus for a lookup) is 30–60s of wasted latency.

Why fan-out is the wrong default

Running searches inline on the main Opus thread costs:

  • ~10–20s per tool result while Opus generates the next "thinking" block
  • Multi-step search (3+ tool calls) → 30–60s of wall-clock for a sub-second lookup
  • All search results land in main context, polluting it for the actual work

Dispatching to a subagent (Agent tool with model: "sonnet") sidesteps both: the subagent runs searches at its own faster pace, summarizes, and returns one bounded result to the main thread.

The pattern

Lookup / research / "tell me about X"
→ Agent(subagent_type: "Explore", model: "sonnet", prompt: "<bounded research prompt>")
→ returns summary; main thread synthesizes for the user

Trivial file-find / "where is X defined"
→ Agent(subagent_type: "Explore", model: "haiku", prompt: "<single grep prompt>")
→ returns path/line; main thread cites it

Coding / editing / debugging / design
→ main Opus thread, no dispatch

When dispatching, bound the report: word limit, file:line citations required, no padding. A summary in the main context is the goal — not raw tool output.

Common mistakes

  • Opus for searches. Slow + no extra value. The model isn't smarter at "find this file"; it's just slower at writing the response.
  • Haiku for design questions. Fast but skips connections across files. Sonnet's synthesis is what makes "tell me about vehicles" usable.
  • Fanning out 3+ parallel searches inline. Parallel doesn't help main-thread speed — main thread blocks on the slowest one, then has to write a long summary at Opus speed.
  • Duplicating the subagent's work in the main thread. If you delegated search, trust the summary; don't re-run the searches.

Reading more

  • The agent-side rule lives in user memory: feedback_route_searches_to_sonnet.md (governs default behavior in every Claude Code session).
  • L1 rule reference is in CLAUDE.md / AGENTS.md / GEMINI.md.