“Stop fixing the input. We spend a lot of time on the input side — bigger models, memory, better prompts. But the model is not deterministic. Even great prompts produce slop on some runs. What if instead of monitoring that, we have a process where we turn it into a number from 0 to 1?” — Lou

Session context: 2026-06-04_Mastermind — Lou opened the session with a live walkthrough of a quality gate system he’d been building, inspired by an article about fixing AI slop using Hermes. He wanted the same capability without a third-party agent.

Core Idea

The standard fix for AI slop is to improve the prompt: add examples, sharpen the instruction, try a bigger model. But this misses the structural problem — the model is non-deterministic. Even a perfect prompt produces weak outputs on some runs, and every model upgrade reshuffles the behavior you were relying on. Optimizing the input can never fully solve an output problem.

Lou’s reframe: build a trained evaluation agent that sits downstream of your writing skills and scores everything before it publishes. The gate doesn’t fix bad work — it identifies it and sends it back. The difference between this and an inline quality rubric (like Insight - The Quality Gate Pattern — Embed 9-10 Self-Evaluation at Every Pipeline Handoff) is structural: instead of a single rubric baked into one skill, this is a separate ambient folder containing an evaluator that any skill can call, trained on your gold standard.

The mechanics:

  1. Collect your gold standard. 20–50 pieces of your best published work: the content you’d want all future output to match. It can be your own work or someone else’s that you aspire to match.
  2. Extract the rubric. A command reads through the gold standard and derives scoring criteria — not just grammar and structure, but substance: perspective, hook quality, specificity, angle. The rubric captures what made these pieces worth publishing.
  3. Score, don’t rewrite. The gate returns a number (0–1) and itemizes what failed. It never edits the output. The calling skill uses the score to decide whether to rerun or flag for review.
  4. Get harder to fool over time. The gate accumulates edge cases. Each correction you feed back sharpens its ability to tell strong work from weak, so as long as corrections keep coming, the first run is the least accurate it will ever be.
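To make step 3 concrete, here is a minimal Python sketch of a calling skill using the gate. Everything named here is an assumption for illustration, not Lou's implementation: the GateResult shape, the write and evaluate callables (stand-ins for the writing skill and the evaluation agent), the 0.8 pass mark, and the three-attempt budget. The one property taken directly from the notes is that the gate returns a score and reasons, and never returns edited text.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class GateResult:
    """What the gate hands back: a score and reasons, never a rewritten draft."""
    score: float                                        # 0.0 to 1.0
    failures: list[str] = field(default_factory=list)   # itemized reasons for a low score


def run_with_gate(
    write: Callable[[list[str]], str],       # hypothetical writing skill; receives prior failure notes
    evaluate: Callable[[str], GateResult],   # hypothetical evaluator applying the rubric
    threshold: float = 0.8,                  # assumed pass mark, tune per content type
    max_attempts: int = 3,                   # assumed retry budget
) -> tuple[str, GateResult, str]:
    """Rerun the writing skill until the gate passes it or the budget runs out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    feedback: list[str] = []
    for _ in range(max_attempts):
        draft = write(feedback)
        result = evaluate(draft)
        if not 0.0 <= result.score <= 1.0:
            raise ValueError(f"gate returned an out-of-range score: {result.score}")
        if result.score >= threshold:
            return draft, result, "publish"
        # The gate only reports; the writing skill does the fixing on the next pass.
        feedback = list(result.failures)

    # Budget exhausted: hand the last draft and its failure list to a human.
    return draft, result, "flag_for_review"
```

Keeping the rewrite inside the writing skill, and only the judgment inside the gate, is what lets one evaluator serve many skills.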

The gate runs across platforms: a LinkedIn post rubric, a newsletter rubric, a thought-leader article rubric. Each content type can have its own 20–50 gold examples and its own derived criteria. When a skill invokes the gate, it passes the content type, and the evaluator applies the right rubric.
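One way to picture the content-type dispatch is a simple lookup table, sketched below in Python. The Rubric class, the RUBRICS table, the type labels, and the criteria strings are all hypothetical placeholders; in the real system the criteria would be derived from each content type's gold-standard set rather than typed in by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rubric:
    content_type: str
    criteria: tuple[str, ...]   # placeholder criteria; real ones come from the gold standard


# Hypothetical registry: one rubric per content type the gate supports.
RUBRICS: dict[str, Rubric] = {
    "linkedin_post": Rubric("linkedin_post", ("hook quality", "specificity", "perspective")),
    "newsletter": Rubric("newsletter", ("angle", "specificity", "perspective")),
    "thought_leader_article": Rubric("thought_leader_article", ("angle", "perspective", "hook quality")),
}


def rubric_for(content_type: str) -> Rubric:
    """Return the rubric matching the label the calling skill passed in."""
    try:
        return RUBRICS[content_type]
    except KeyError:
        known = ", ".join(sorted(RUBRICS))
        raise ValueError(f"No rubric for {content_type!r}; known types: {known}") from None
```

Raising on an unknown label, instead of falling back to a generic rubric, is a deliberate choice in this sketch: a generic fallback would quietly lower the quality floor the gate exists to enforce.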

Why this matters for scaling: The aspiration is to remove yourself from the production loop as much as possible, so you can focus on the conversations that generate ideas while automation handles extraction, writing, and publishing. But automation needs a quality floor. Without a gate, anything that looks plausible ships. With a gate, the floor rises with each use, because every correction is folded back into the rubric.

Practical Application

Build the eval loop as an ambient folder in your project:

  1. Create a gold-standard/ subfolder and populate it with 20–50 pieces of your best work by content type.
  2. Run a command (or prompt): “Read this collection and generate a scoring rubric that distinguishes what makes these pieces worth publishing from work that isn’t.”
  3. Create an evaluation agent that accepts any piece of content + a content-type label, runs it against the matching rubric, and returns a score + failure reasons.
  4. After each use, instruct the agent: “Learn from the corrections I made to the output and update the rubric.”
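Steps 1 and 2 could be wired together roughly as follows. This is a sketch under stated assumptions: it assumes one subfolder per content type under gold-standard/ and one .md file per piece, neither of which the session specified, and the function names are hypothetical. It only assembles the rubric-extraction prompt; sending it to a model is left to whatever tooling you already use.

```python
from pathlib import Path

# Prompt wording adapted from step 2 above.
RUBRIC_INSTRUCTION = (
    "Read this collection and generate a scoring rubric that distinguishes "
    "what makes these pieces worth publishing from work that isn't."
)


def load_gold_standard(root: Path, content_type: str) -> list[str]:
    """Read every piece for one content type.

    Assumed layout: <root>/gold-standard/<content_type>/*.md
    """
    folder = root / "gold-standard" / content_type
    return [p.read_text(encoding="utf-8") for p in sorted(folder.glob("*.md"))]


def build_rubric_prompt(pieces: list[str]) -> str:
    """Combine the instruction with the numbered gold-standard pieces."""
    if not pieces:
        raise ValueError("no gold-standard pieces found")
    numbered = "\n\n".join(
        f"--- Piece {i} ---\n{text.strip()}" for i, text in enumerate(pieces, start=1)
    )
    return f"{RUBRIC_INSTRUCTION}\n\n{numbered}"
```

A call such as build_rubric_prompt(load_gold_standard(Path("."), "newsletter")) would produce the text to hand to the model for the newsletter rubric.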

For coaching clients: frame this as “training your AI on what good means to you.” The rubric isn’t a generic best-practice checklist — it’s your taste, encoded. That’s what makes it durable as models change.

Evolution Across Sessions

Builds on Insight - The Quality Gate Pattern — Embed 9-10 Self-Evaluation at Every Pipeline Handoff (2026-04-09), which established the inline self-evaluation pattern. New development: this session externalizes the gate into a separate ambient agent trained on gold-standard examples — a cross-skill quality layer that any skill can call, distinct from per-skill rubrics. Also connects to Insight - The Self-Improving Skill Loop — Have the Skill Learn From Every Use (2026-05-28): both involve loops that compound with each use, but this one operates on the evaluator rather than the writing skill itself.