Model Routing — Match Model to Task

How agents in this template should pick a model for the task in front of them. The wrong choice either wastes latency or produces a shallow answer; the right choice avoids both.

The model ladder (Claude)

Model	ID	Speed	Strength	Use for
Haiku 4.5	`claude-haiku-4-5-20251001`	Fastest	Light	Pure file-find, "where is X defined", trivial lookups (1 grep, no synthesis)
Sonnet 4.6	`claude-sonnet-4-6`	~2–3× faster than Opus	Balanced	"Tell me about X / how does Y work" — research and synthesis across multiple sources
Opus 4.7	`claude-opus-4-7`	Slowest	Deepest reasoning	Coding, editing, planning, debugging, design decisions

Other CLIs (Codex, Gemini) have analogous tiers; the same principle applies even if the model names differ.

When to use each

Decide by what the answer is, not by topic.

"Summarize what's already there" → search → Sonnet subagent (default), or Haiku subagent if it's a single grep.
"Produce a change" (write/edit code, design, debug) → work → main Opus thread.
Unsure? Default to Sonnet. The cost of mismatching down (Haiku for a depth question) is a thin answer the user has to re-ask. The cost of mismatching up (Opus for a lookup) is 30–60s of wasted latency.
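
The rule above can be sketched as a small routing function. This is a minimal sketch, assuming the caller has already classified the request by what the answer is (a summary of existing material, or a change). The `AnswerKind` enum and `route` function are hypothetical names for illustration, not part of any CLI.

```python
from enum import Enum
from typing import Optional


class AnswerKind(Enum):
    """Hypothetical classification of what the answer is (not its topic)."""
    SUMMARIZE = "summarize"  # search: summarize what's already there
    CHANGE = "change"        # work: write/edit code, design, debug


def route(kind: Optional[AnswerKind], single_grep: bool = False) -> str:
    """Return the model tier for a request, following the routing rule."""
    if kind is AnswerKind.CHANGE:
        return "opus"  # produce a change: stays on the main Opus thread
    if kind is AnswerKind.SUMMARIZE and single_grep:
        return "haiku"  # trivial lookup: one grep, no synthesis
    # Summaries and unclassified requests both default to Sonnet.
    return "sonnet"
```

Passing `None` for an unclassified request falls through to Sonnet, which mirrors the "Unsure? Default to Sonnet" rule.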

Why fan-out is the wrong default

Running searches inline on the main Opus thread costs:

~10–20s per tool result while Opus generates the next "thinking" block
Multi-step search (3+ tool calls) → 30–60s of wall-clock time for a lookup whose individual tool calls finish in under a second
All search results land in main context, polluting it for the actual work
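
The latency figures follow from simple multiplication: each tool result costs roughly 10–20s of main-thread generation, so the total grows linearly with the number of calls. A minimal sketch, using the per-result range stated above as assumed constants; `inline_search_latency` is a hypothetical helper name.

```python
from typing import Tuple

# Per-tool-result overhead on the main Opus thread, in seconds (the ~10–20s range above).
PER_RESULT_LOW_S = 10
PER_RESULT_HIGH_S = 20


def inline_search_latency(tool_calls: int) -> Tuple[int, int]:
    """Estimate (low, high) wall-clock seconds for an inline search with `tool_calls` steps."""
    if tool_calls < 0:
        raise ValueError("tool_calls must be non-negative")
    return tool_calls * PER_RESULT_LOW_S, tool_calls * PER_RESULT_HIGH_S


# A 3-call search lands in the 30–60s range quoted above.
print(inline_search_latency(3))  # (30, 60)
```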

Dispatching to a subagent (Agent tool with model: "sonnet") avoids both the latency and the context pollution: the subagent runs its searches at its own faster pace, summarizes them, and returns one bounded result to the main thread.

The pattern

Lookup / research / "tell me about X"
  → Agent(subagent_type: "Explore", model: "sonnet", prompt: "<bounded research prompt>")
  → returns summary; main thread synthesizes for the user

Trivial file-find / "where is X defined"
  → Agent(subagent_type: "Explore", model: "haiku", prompt: "<single grep prompt>")
  → returns path/line; main thread cites it

Coding / editing / debugging / design
  → main Opus thread, no dispatch
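
The three branches above can be expressed as one dispatch helper. This sketch only assembles the arguments shown in the pattern (`subagent_type`, `model`, `prompt`) as a plain dictionary; the category names and the `dispatch_args` helper are hypothetical, and the dictionary is not a claim about the Agent tool's exact call format. A return value of `None` means the work stays on the main Opus thread.

```python
from typing import Dict, Optional

# Hypothetical category names for work that must stay on the main Opus thread.
WORK_CATEGORIES = {"coding", "editing", "debugging", "design"}


def dispatch_args(category: str, prompt: str) -> Optional[Dict[str, str]]:
    """Return Agent arguments for a dispatched task, or None to keep it on the main thread."""
    if category in WORK_CATEGORIES:
        return None  # coding / editing / debugging / design: no dispatch
    # Trivial file-finds go to Haiku; research and anything unclassified go to Sonnet.
    model = "haiku" if category == "find" else "sonnet"
    return {"subagent_type": "Explore", "model": model, "prompt": prompt}
```

Unrecognized categories are dispatched to Sonnet rather than kept on Opus, consistent with the default in "When to use each".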

When dispatching, bound the report: set a word limit, require file:line citations, and forbid padding. The goal is a summary in the main context, not raw tool output.
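
One way to enforce these bounds is to append them to every dispatched prompt and check the summary before relying on it. A minimal sketch, assuming citations take the form `path:line` with a file extension (for example `src/app.py:42`) and that a whitespace split is a close enough word count; `bounded_prompt` and `report_ok` are hypothetical helpers.

```python
import re

# Matches a file:line citation such as "src/vehicles/model.py:42" (assumed citation format).
CITATION = re.compile(r"[\w./-]+\.\w+:\d+")


def bounded_prompt(question: str, word_limit: int = 200) -> str:
    """Append reporting constraints to a research prompt before dispatching it."""
    return (
        f"{question}\n\n"
        f"Report in at most {word_limit} words. "
        "Cite every claim as file:line. No padding, no raw tool output."
    )


def report_ok(report: str, word_limit: int = 200) -> bool:
    """Check that a report respects the word limit and cites at least one file:line."""
    within_limit = len(report.split()) <= word_limit
    has_citation = CITATION.search(report) is not None
    return within_limit and has_citation
```

A report that fails the check is better re-requested from the subagent than patched up by re-running its searches on the main thread.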

Common mistakes

Opus for searches. Slow, with no extra value. The model isn't smarter at "find this file"; it's just slower at writing the response.
Haiku for design questions. Fast, but it misses connections across files. Sonnet's synthesis is what makes "tell me about vehicles" usable.
Fanning out 3+ parallel searches inline. Parallelism doesn't speed up the main thread: it blocks on the slowest search, then has to write a long summary at Opus speed.
Duplicating the subagent's work in the main thread. If you delegated search, trust the summary; don't re-run the searches.

Reading more

The agent-side rule lives in user memory: feedback_route_searches_to_sonnet.md (governs default behavior in every Claude Code session).
The L1 rule reference is in CLAUDE.md, AGENTS.md, and GEMINI.md.
