“At the end of each significant function before it’s complete and handed off, I want you to evaluate what a quality rubric would be to determine that it passes that bar of quality. Revise up to 3 times, or stop after 3 revisions, or when it reaches a score of 9 out of 10.” — Lou

Session context: 2026-04-09_Mastermind — Lou added quality gates mid-demo while building the Brand Writing Team skill, injecting the pattern as a real-time skill requirement rather than retrofitting it afterward.

Core Idea

Multi-stage AI pipelines have a quality propagation problem: a weak output from stage 1 gets handed to stage 2, which builds on it and hands a slightly worse output to stage 3, and so on. By the time you reach the final output, you may have a polished-looking result sitting on a shaky foundation. The final stage can’t fix what an earlier stage got wrong.

Lou’s Quality Gate Pattern addresses this at the architectural level, not the output level. The principle: build a self-evaluation rubric into every major pipeline handoff. Before a role or stage finishes and passes its output to the next, it must do four things (sketched in code after the list):

  1. Generate an evaluation rubric appropriate to the quality standard for that stage
  2. Score its own output against that rubric
  3. Revise the output if it doesn’t meet the threshold — up to 3 revision cycles
  4. Only hand off when it reaches a 9/10, or after 3 attempts (whichever comes first)
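A minimal sketch of that loop in Python, purely illustrative: in the actual skill the gate is expressed as prompt instructions, and `generate_rubric`, `score_against`, and `revise` below are hypothetical stand-ins for LLM calls.

```python
def quality_gate(stage_name, output, generate_rubric, score_against, revise,
                 threshold=9, max_revisions=3):
    """Self-evaluation gate a stage runs before handing off its output.

    The three callables are hypothetical stand-ins for LLM calls; in
    Lou's pattern the whole loop is written as prompt instructions.
    """
    rubric = generate_rubric(stage_name)    # 1. stage-appropriate rubric
    score = score_against(output, rubric)   # 2. self-score out of 10
    for _ in range(max_revisions):          # 3. up to 3 revision cycles
        if score >= threshold:
            break
        output = revise(output, rubric, score)
        score = score_against(output, rubric)
    # 4. hand off at 9/10 or better, or after 3 attempts, whichever comes first
    return output, score, score >= threshold
```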

The threshold is deliberately high: 9 out of 10. Not “good enough,” not “acceptable” — 9/10. This matters because each stage’s bar compounds with the next. A pipeline where every stage passes at 7/10 produces a final output that’s the product of several 7s. A pipeline where every stage is required to reach 9/10 arrives at the final stage with genuinely strong material to work with.
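A back-of-envelope illustration of why the bar compounds, under a simplifying assumption that is mine rather than the session’s: treat each stage’s score as a fraction of ideal quality that multiplies across handoffs.

```python
# Treat a 7/10 stage as retaining 70% of ideal quality, a 9/10 stage as 90%.
stages = 5
print(round(0.7 ** stages, 2))  # 0.17 -- five stages at 7/10 compound badly
print(round(0.9 ** stages, 2))  # 0.59 -- five stages at 9/10 hold up far better
```

The multiplicative model is a simplification, but it makes the gap between “good enough” and 9/10 concrete.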

Why self-generated rubrics (not fixed rubrics): Lou specified that the skill should generate the evaluation criteria for each stage, not have them pre-written. This is more robust: the model applies appropriate standards for what each stage is actually doing — researcher evaluates for accuracy and coverage, outliner for structure and flow, writer for voice and persuasion — rather than being forced through a generic checklist.
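For concreteness, here is what stage-appropriate criteria might look like, drawn from the roles named above. The table is illustrative only: in the pattern the model generates these criteria at runtime, and nothing like this is pre-written into the skill file.

```python
# Illustrative only: the model generates rubric criteria at runtime;
# hard-coding a table like this would defeat the point of the pattern.
LIKELY_RUBRIC_FOCUS = {
    "researcher": ["factual accuracy", "coverage of sources"],
    "outliner": ["logical structure", "flow between sections"],
    "writer": ["brand voice", "persuasive strength"],
}
```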

The 3-attempt ceiling: This prevents infinite loops and signals when something is structurally broken rather than iteratively fixable. If a stage can’t reach 9/10 in 3 attempts, the issue likely requires human input, not another revision cycle. The quality gate surfaces that signal rather than masking it with a mediocre output.
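Continuing the sketch from above, here is how an orchestrator might surface that signal instead of masking it. The stub LLM calls are hypothetical and deliberately stall below threshold so the ceiling fires:

```python
# Stub LLM calls for illustration; a real skill would prompt the model.
gen_rubric = lambda stage: f"rubric for {stage}"
score_fn = lambda out, rubric: 7    # deliberately stuck below 9
revise_fn = lambda out, rubric, score: out + " (revised)"

output, score, passed = quality_gate("outliner", "draft outline",
                                     gen_rubric, score_fn, revise_fn)
if not passed:
    # Three revisions couldn't reach 9/10: likely a structural problem,
    # so flag for human input instead of handing off mediocre work.
    raise RuntimeError(f"outliner stalled at {score}/10 after 3 revisions")
```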

Demonstrated in real time: During the session, Lou watched the Brand Writing Team skill run with quality gates active. The transcript notes: “Strategist scored a 9. Outliner scored 9. Skeptic scored 9.” Each stage self-evaluated, met threshold, and passed a clean handoff to the next. The article that emerged was substantively better than typical AI output — incorporating psychological research, cited sources, and an unusual angle — precisely because each stage handed its best work to the next.

Practical Application

The quality gate template (add to any skill file, at each stage):

Before handing off to the next stage, evaluate this output against a quality rubric appropriate for [this stage's function]. Score it out of 10. If below 9, revise. Repeat up to 3 times. Only proceed when score reaches 9/10 or 3 revision attempts are exhausted.

Where to place gates: At every major handoff — researcher to outliner, outliner to drafter, drafter to editor, editor to final delivery. You don’t need a gate at every micro-step, but every role transition should have one.
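A sketch of that placement, reusing the hypothetical quality_gate helper and stub LLM calls from the earlier sketches. The stage functions are placeholders for the pipeline roles:

```python
# Placeholder role functions; each would be its own prompt or sub-skill.
def research(brief): return brief + "\n[research notes]"
def outline(work): return work + "\n[outline]"
def draft(work): return work + "\n[draft]"
def edit(work): return work + "\n[edited draft]"

def run_with_gates(brief):
    work = brief
    for stage in (research, outline, draft, edit):
        work = stage(work)
        # One gate per role transition, not per micro-step.
        work, score, passed = quality_gate(stage.__name__, work,
                                           gen_rubric, score_fn, revise_fn)
        if not passed:
            raise RuntimeError(f"{stage.__name__} stalled at {score}/10")
    return work
```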

Coaching application: Clients building their first multi-stage skills typically omit quality gates because they seem like overhead. The quick way to demonstrate the value: show them the output difference between a 5-stage pipeline without gates and the same pipeline with gates at stages 2 and 4. The delta is visible and persuasive.

Evolution Across Sessions

This establishes the baseline for the Quality Gate Pattern as a named, reusable design element for multi-stage AI skills. Prior sessions have covered pipeline architecture (Insight - Skill Chaining — Build Modular AI Pipelines Instead of Monolithic Prompts) and quality via model diversity (Insight - Multi-Model Debate as a Quality Control System for High-Stakes Work). This insight adds a third quality mechanism: in-pipeline self-evaluation with a high bar and revision ceiling. Future sessions should test whether members adopt quality gates in their own skills and whether the 9/10 threshold is calibrated correctly across different domains.