“I’m gonna use the cheapest model available, but I’m gonna use the smartest model available to prompt that cheap model so that it performs as well as it can.” — Lou
Session context: 2026-06-11_Mastermind — Lou shared an idea he’d read that morning and unpacked it three times for the group because the inversion is easy to miss.
Core Idea
The instinct is to use a smart model when the work is hard and a cheap model when it’s easy. This flips that. You commit upfront to running the task on the cheap model — say Haiku at high effort — and then you hire the smart model for one job only: write the prompt that lets the cheap model perform like the expensive one.
The reasoning matters. The premise is that a model like Opus knows the capabilities and constraints of Haiku well enough to write instructions that compensate: telling the cheap model exactly what strategy to use, when to think step-by-step, what to watch for, and how much scaffolding it needs. You’re not asking Haiku to reason like Opus on its own. You’re letting Opus pre-compute the reasoning and bake it into the instructions, so Haiku just executes a very detailed plan. Pay once for the intelligence; reuse it on every cheap inference after.
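The split described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `Model`, `author_worker_prompt`, and `run_on_worker` are hypothetical names, and each model is represented as a plain function from prompt text to response text so that any client library can be wrapped behind it.

```python
from typing import Callable, List

# Assumption: a "model" is any function that takes a prompt and returns text.
# In practice this would wrap whatever API client you use.
Model = Callable[[str], str]


def author_worker_prompt(smart_model: Model, task: str, worker_name: str) -> str:
    """Call the smart model once to write a prompt tuned to the cheap worker."""
    meta_prompt = (
        f"I'm going to run this task on {worker_name}. Knowing that model's "
        "capabilities and limits, write a prompt that gets it to perform the "
        "task as well as you would. Include the strategy, when to think "
        "step by step, and anything it is likely to get wrong.\n\n"
        f"Task: {task}"
    )
    return smart_model(meta_prompt)


def run_on_worker(cheap_model: Model, worker_prompt: str, items: List[str]) -> List[str]:
    """Reuse the authored prompt for every item; only the cheap model is called."""
    return [cheap_model(f"{worker_prompt}\n\nInput:\n{item}") for item in items]
```

The point of the shape is the call count: one smart-model call up front, then one cheap-model call per item, with the authored prompt reused unchanged each time.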
This is distinct from dynamic model routing, where the system decides in real time which tier a step needs. Here the tier is fixed at the bottom on purpose, and the smart model’s role moves upstream into prompt construction. The gains reported in what Lou read ranged from 20% to 75% improvement in the cheap model’s performance, and the technique pairs naturally with DSPy-style automatic prompt optimization for a further, compounding lift.
The deeper principle: reasoning is a cost you can amortize. Most workflows pay the premium-model tax on every single call. If the reasoning is stable, you can spend it once — in the prompt — and then run the volume on the cheapest model that can follow a good plan.
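To make the amortization concrete, here is a rough break-even calculation. It assumes a one-time cost for having the smart model author the prompt and a flat per-call cost on each tier; the function name and every number are hypothetical placeholders, not real prices.

```python
def break_even_calls(authoring_cost: int, premium_per_call: int, cheap_per_call: int) -> int:
    """Smallest number of calls at which the cheap path is strictly cheaper.

    All costs share one integer unit (for example, hundredths of a cent),
    which avoids floating-point rounding at the boundary.
    """
    saving_per_call = premium_per_call - cheap_per_call
    if saving_per_call <= 0:
        raise ValueError("cheap model must cost less per call than the premium model")
    # The cheap path wins when authoring_cost + n * cheap < n * premium,
    # i.e. when n > authoring_cost / saving_per_call.
    return authoring_cost // saving_per_call + 1


# Hypothetical numbers: authoring costs 500 units, premium 50/call, cheap 5/call.
print(break_even_calls(authoring_cost=500, premium_per_call=50, cheap_per_call=5))  # 12
```

With those placeholder figures the one-time authoring cost is recovered by the twelfth call; every call after that is pure saving, provided the reasoning in the prompt stays valid.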
Practical Application
Pick a repetitive task you currently run on a premium model. Then hand the premium model this job instead:
“I’m going to run this task on [Haiku 4.5 / your cheapest viable model]. Knowing that model’s specific capabilities and limits, write a prompt that gets it to perform this task as well as you would — include the strategy, when to think, and anything it’s likely to get wrong without being told.”
Test the cheap model with that generated prompt against your old premium-model output. Where it holds up, you’ve cut the per-run cost of that task, potentially by an order of magnitude depending on the price gap between the two tiers. The technique is especially powerful for forked/spawned sub-steps where you already specify the child’s model: optimize the prompt for that tier and you stack the savings.
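One way to run that comparison is a small agreement check against saved premium-model outputs. This is a sketch under assumptions: `agreement_rate` and `normalize` are hypothetical helpers, the cheap model is passed in as a plain function, and exact match after whitespace and case normalization only suits short, structured outputs such as labels. Free-form text would need a rubric or human review instead.

```python
from typing import Callable, List, Tuple

# Assumption: a "model" is any function that takes a prompt and returns text.
Model = Callable[[str], str]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(text.lower().split())


def agreement_rate(
    cheap_model: Model,
    worker_prompt: str,
    cases: List[Tuple[str, str]],
) -> Tuple[float, List[str]]:
    """Compare cheap-model answers to premium reference outputs.

    `cases` holds (input, premium_output) pairs. Returns the fraction of
    matching answers and the inputs where the cheap model disagreed.
    """
    if not cases:
        raise ValueError("need at least one test case")
    misses = []
    for item, reference in cases:
        answer = cheap_model(f"{worker_prompt}\n\nInput:\n{item}")
        if normalize(answer) != normalize(reference):
            misses.append(item)
    return 1 - len(misses) / len(cases), misses
```

The list of misses is the useful part: it shows which inputs the generated prompt fails to cover, which is what you would feed back to the smart model when revising the prompt.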
Related Insights
- Insight - The Model Underneath Is the Multiplier, Not the Interface — the model still matters; this just relocates the smart model’s contribution into the prompt.
- Insight - Code Is for Computation, Inference Is for Judgment — same instinct (deploy expensive resources only where they earn their cost), applied to inference tiers.
- Insight - Model Altitude — Route Model and Effort by Workflow Step, Not by Whole Artifact — routing decides the tier per step; this technique then squeezes the chosen tier harder.
- Insight - The DSPy Cognitive Mirror — Teaching AI to Replicate Your Decision-Making — DSPy auto-optimizes prompts; combining it with cross-tier prompting was reported to produce “breathtaking” results.
Evolution Across Sessions
This is a counter-move to the assumption baked into Insight - The Model Underneath Is the Multiplier, Not the Interface (2025-08-07): that you reach for the strongest model when the work is hard. Here the work stays on the cheapest model, and the strongest model is repurposed as a one-time prompt author. This session establishes a baseline for “amortized reasoning” as a cost-control pattern; future sessions should test where Sonnet finally beats Haiku-with-a-great-prompt for a given task.