Topic
How to stop AI slop from escaping your pipeline by building a gold-standard-trained evaluation agent that scores outputs before they publish — and gets smarter every time it runs.
Target Reader
Knowledge entrepreneurs and coaches who are already producing AI-assisted content regularly but keep hitting the “I can’t quite trust this output” wall — and are still manually reviewing every piece because they haven’t built a structural quality floor.
The Fear / Frustration / Want / Aspiration
- Fear: Embarrassing output ships under your name and damages your reputation as a thought leader.
- Frustration: Even carefully crafted prompts produce slop on some runs, and every model update reshuffles the behavior you relied on.
- Want: A workflow that enforces standards automatically, not because you remembered to check.
- Aspiration: Fully automated publishing with zero quality compromise: focus only on the thinking, let the pipeline handle everything else.
Before State
Manually reviewing every AI output before publishing. Standards that shift with how tired you are, how much time you have, and what you last read. Publishing things that are “probably fine” while feeling a vague unease. The quality of your output is capped by the quality of your attention in the moment.
After State
Every output passes through a trained evaluator that knows your gold standard. Below-threshold work reruns automatically. What reaches you for final review is already at the level you’d be proud to publish. The quality floor rises with every session — the evaluator learns.
Narrative Arc
The obsession with better prompts is understandable — it’s the obvious lever. But it’s the wrong level. The model is non-deterministic; inputs can only constrain the distribution, not guarantee the output. The turn is realizing that output quality is an architecture problem, not a prompting problem. The resolution is an external evaluator trained on examples you’ve already declared good — and a loop that compounds it.
Core Argument
The path to consistent AI output quality is not better inputs — it’s a trained downstream gate that knows what good looks like in your voice, for your platforms, applied to every piece before it ships.
Key Evidence / Examples
- Direct quote from Lou: “Stop fixing the input. The model is not deterministic. Even great prompts produce slop on some runs.”
- Live demo: the article failed the gate on its first draft with a score below threshold, was rerun through the writing agent, and came back as a second draft
- Gate evaluates substance (perspective, hook, angle, specificity), not just grammar
- Related principle: Insight - The Self-Improving Skill Loop — Have the Skill Learn From Every Use — the same compounding logic applied to the evaluator
Proposed Structure (5–7 beats)
- The problem: why you’re still manually reviewing everything (non-determinism, prompt fragility, model churn)
- The reframe: output quality is an architecture problem, not a prompt problem
- The gold standard as the seed: collect 20–50 pieces of content you’d want to be known for
- The gate mechanics: extraction → rubric → score → block → rerun
- The compounding effect: every use sharpens the discriminator; the gate gets harder to fool
- Multiple rubrics for multiple platforms: one gold standard per content type
- What this unlocks: trust in automation — the floor rises, the pipeline runs, you stay in the conversations
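For whoever drafts the gate-mechanics beat, the score → block → rerun loop can be sketched roughly as follows. This is a minimal illustration, not the demo's actual implementation: the names `Rubric`, `run_gate`, `promote_to_gold`, the platform key `"newsletter"`, the 9.0 threshold, and the three-attempt cap are all assumptions made for the example, and the writer and evaluator are stubbed with plain functions where a real pipeline would call a model. The rubric criteria are assumed to have already been extracted from the gold-standard pieces.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Rubric:
    """One rubric per content type, seeded from that type's gold standard."""
    platform: str
    criteria: list[str]
    threshold: float  # minimum score a draft needs to pass the gate
    gold_examples: list[str] = field(default_factory=list)


@dataclass
class GateResult:
    draft: str
    score: float
    passed: bool
    attempts: int


def run_gate(
    brief: str,
    rubric: Rubric,
    write: Callable[[str, str], str],          # (brief, feedback) -> draft
    evaluate: Callable[[str, Rubric], float],  # (draft, rubric) -> score
    max_attempts: int = 3,
) -> GateResult:
    """Score each draft; block and rerun the writer while it is below threshold."""
    feedback = ""
    draft, score = "", 0.0
    for attempt in range(1, max_attempts + 1):
        draft = write(brief, feedback)
        score = evaluate(draft, rubric)
        if score >= rubric.threshold:
            return GateResult(draft, score, True, attempt)
        # Blocked: tell the writer which criteria to revise for, then rerun.
        feedback = (
            f"Scored {score:.1f}, threshold is {rubric.threshold:.1f}. "
            f"Revise for: {', '.join(rubric.criteria)}."
        )
    # Still below threshold after the last attempt: hand back for human review.
    return GateResult(draft, score, False, max_attempts)


def promote_to_gold(rubric: Rubric, approved_draft: str) -> None:
    """Compounding step: an approved piece joins the gold standard."""
    rubric.gold_examples.append(approved_draft)


# Stub writer and evaluator so the sketch runs without a model (hypothetical).
def stub_write(brief: str, feedback: str) -> str:
    return f"{brief}: specific second draft" if feedback else f"{brief}: first draft"


def stub_evaluate(draft: str, rubric: Rubric) -> float:
    return 9.5 if "specific" in draft else 6.0


# One rubric per platform; the key and threshold are illustrative.
RUBRICS = {
    "newsletter": Rubric(
        platform="newsletter",
        criteria=["perspective", "hook", "angle", "specificity"],
        threshold=9.0,
    ),
}

result = run_gate("quality gates", RUBRICS["newsletter"], stub_write, stub_evaluate)
if result.passed:
    promote_to_gold(RUBRICS["newsletter"], result.draft)
```

With these stubs the first draft is blocked and the rerun passes, which mirrors the live demo. Keeping the writer and evaluator as injected callables is one possible design choice: it lets the same loop serve several rubrics, and the article itself can stay framed as "training AI on what good means to you" rather than as a build project.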
Related Insights
- Insight - The Output Quality Gate — Train a Rubric, Not a Prompt
- Insight - The Quality Gate Pattern — Embed 9-10 Self-Evaluation at Every Pipeline Handoff
- Insight - The Self-Improving Skill Loop — Have the Skill Learn From Every Use
- Insight - Authentic AI Voice Is Built on Lived Experience, Not Style Prompts
Editorial Notes
This has maximum resonance with members who are already automating their content pipeline and have hit the quality-trust ceiling. Contrast strongly with the “just improve your prompt” advice that dominates social media AI content. The gold-standard collection exercise is the key call to action — simple, immediate, and the rest flows from it. Avoid positioning this as a technical build project; frame it as “training AI on what good means to you.”
Next Step
- Approved for drafting
- Needs revision
- Deprioritised