“The UI is Claude. The UI is basically a routing and evaluation layer, and it decides whether or not it needs inference, in which case it uses it, or if it needs to spawn a Python script to run the thing.” — Lou

Session context: 2026-04-23_Mastermind — two parallel stories from Scott Delinger (90 million research records reduced to 128K overnight) and Lou (Gears pipeline from 6–7 hours to 15 minutes) converged on the same architectural principle with the force of double evidence.

Core Idea

AI inference is for judgment. Python — or any deterministic code — is for computation. Treating them as interchangeable burns your AI budget on the wrong problems and produces slower, less reliable results for tasks that don’t need judgment at all.

Scott arrived with 90,900,000 entries needing reduction to a usable 128,000-line dataset. The old workflow would have been weeks in Excel. Instead, he described the data structures to Gemini — in plain English — and the resulting Python script ran overnight on a server in Ontario: 10 hours and 16 minutes, 500 megabytes of RAM at peak. The script sorted, filtered, and aggregated data going back to 1993. “This would not have been possible without the collaboration with Gemini on writing Python.” Gemini wrote the code. The server ran it. No inference tokens spent on the computation itself.
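
Scott's script itself wasn't shown in the session, but the shape of such a job is worth seeing. A minimal sketch, assuming a hypothetical CSV with year, category, and value columns: a single streaming pass that filters and aggregates, so memory scales with the number of output groups rather than the 90 million input rows. That is how a reduction like this can peak in the hundreds of megabytes instead of tens of gigabytes.

```python
import csv
from collections import defaultdict

# Hypothetical schema: each row has year, category, value columns.
# Streaming keeps memory proportional to the distinct (year, category)
# groups, not the 90M input rows.
totals = defaultdict(lambda: {"count": 0, "sum": 0.0})

with open("research_records.csv", newline="") as f:
    for row in csv.DictReader(f):
        year = int(row["year"])
        if year < 1993:          # filter: drop out-of-range records
            continue
        key = (year, row["category"])
        totals[key]["count"] += 1
        totals[key]["sum"] += float(row["value"])

with open("reduced.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["year", "category", "count", "mean_value"])
    for (year, category), agg in sorted(totals.items()):
        writer.writerow([year, category, agg["count"],
                         agg["sum"] / agg["count"]])
```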

Lou’s structural parallel was the Gears pipeline. Original architecture: content generation, website generation, and ontology processing all routed through Claude inference. Full ingest: 6 to 7 hours — when it completed at all. The fix: restructure so Claude handles only routing and evaluation decisions. Python handles the computational heavy lifting. Claude spawns the Python batch processor at the right moment. Same job: 15 minutes.
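
Lou described the architecture, not the code. A toy sketch of the split, with hypothetical task names, a hypothetical batch_processor.py, and a stubbed model client:

```python
import subprocess

DETERMINISTIC_TASKS = {"ingest", "transform", "index"}  # hypothetical names

def ask_model(prompt: str) -> str:
    """Stub for an inference call (Claude, Gemini, etc.)."""
    raise NotImplementedError("wire up your model client here")

def handle(task: str, payload_path: str) -> str:
    if task in DETERMINISTIC_TASKS:
        # Deterministic work: spawn the batch processor, spend zero tokens.
        result = subprocess.run(
            ["python", "batch_processor.py", task, payload_path],
            capture_output=True, text=True, check=True,
        )
        return result.stdout
    # Judgment call (routing, evaluation, ambiguity): use inference.
    return ask_model(f"Decide how to handle this {task}: {payload_path}")
```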

The generalizable principle:

  1. Inference for judgment calls — routing, evaluation, ambiguous decisions, edge case reasoning
  2. Code for anything deterministic — sorting, filtering, aggregation, transformation, processing at volume
  3. AI writes the code — you don’t need to know Python, you need to describe the task in plain English

Scripts can even be embedded inside skill definitions — ready-to-run utilities the skill spawns at the right moment. The skill uses AI to decide when to run the script; the script does the work without touching your API budget or subscription limits.
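
In Claude's skill format, for example, a skill is a folder whose SKILL.md instructions tell the model when to execute a bundled script directly. An illustrative layout, with hypothetical names:

```
record-reducer/               # hypothetical skill folder
├── SKILL.md                  # when to use the skill; points at the script
└── scripts/
    └── reduce.py             # deterministic utility, run directly, no tokens
```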

The decision filter is simple: Does this task have a deterministic correct answer? If yes, it belongs in code. If it requires judgment — weighing tradeoffs, handling ambiguity, evaluating quality — that’s where inference earns its price.

Practical Application

When you have a task involving volume, repetition, or deterministic transformation, use this pattern before spending AI credits:

  1. Describe the task in plain English to Claude/Gemini/ChatGPT — including your input structure, output structure, and any edge cases you know about.
  2. Ask it to write the Python (or shell script, or SQL query) that handles the task.
  3. Test the script on a small sample to verify correct output (see the sketch after this list).
  4. Run at full scale — locally, overnight on a server, or via a cloud notebook.
  5. Use AI only for interpreting results or handling edge cases that need judgment.

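Step 3 is the one people skip, and it is cheap insurance before a ten-hour run. A minimal sketch, assuming the AI-written script is a hypothetical reduce_records.py that takes input and output paths:

```python
import itertools
import subprocess

# Carve off a small sample (header plus ~10k rows) from the full input.
with open("research_records.csv") as src, open("sample.csv", "w") as dst:
    dst.writelines(itertools.islice(src, 10_001))

# Run the generated script on the sample only.
subprocess.run(
    ["python", "reduce_records.py", "sample.csv", "sample_out.csv"],
    check=True,
)

# Eyeball the output before committing to the full-scale run.
print(open("sample_out.csv").read())
```
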
You don’t need to understand the code. You need to be able to describe the transformation clearly enough that AI can write it. That’s a different skill — and a learnable one.

Evolution Across Sessions

This establishes the baseline for an efficiency architecture principle that has been implicit in prior sessions but never stated as a first-class rule. Insight - Process Architecture Transmits Judgment More Reliably Than Individual Prompts (2026-04-02) addressed keeping AI in the right prompting layer; Insight - CLI-First Micro-Apps — Why the Most Durable Personal Tools Skip the UI (2026-04-02) showed that scripts outlast interfaces. This insight adds the decision rule — the clean test for when to use inference vs. code — backed by two concrete examples with measured outcomes (Scott’s 90M records; Lou’s 6h→15min pipeline). Future sessions should track whether members adopt this pattern and what productivity multipliers they observe.