Pipeline Quality Assurance
Over twenty quality control mechanisms, applied between every step.
This page is about quality control mechanisms (QCMs) built into our pipelines. For task-level QA practices when you are using AI on your own work, see the AI Quality Assurance guide.
Our QCM library
We pull from 20+ documented QCMs across five categories. The right mix depends on the pipeline, the stakes, and the cost tolerance.
Deterministic validation
Zero cost
Rule-based checks that run in milliseconds. No AI call. Catch the most common failures cheaply.
- Schema validator. Output has the right fields, types, and structure.
- Format validator. Output matches the expected shape, sections, and heading hierarchy.
- Range validator. Numeric fields fall within acceptable bounds.
- Uniqueness check. No duplicate IDs or collisions in enumerated fields.
- Substring verifier. Extracted quotes actually exist in the source material.
- Identifier scanner. Flags personally identifiable information before it leaves local infrastructure.
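As a minimal sketch of how a few of these deterministic checks compose, the Python below chains a schema, range, and substring validator; each is a pure function returning pass/fail plus a reason, with no AI call. The field names (`code`, `quote`, `confidence`) are illustrative, not from any real pipeline.

```python
# Each gate returns (passed, reason) and runs in microseconds.

def schema_check(item):
    # Schema validator: right fields, right types.
    required = {"code": str, "quote": str, "confidence": float}
    for field, ftype in required.items():
        if field not in item:
            return False, f"missing field: {field}"
        if not isinstance(item[field], ftype):
            return False, f"wrong type for {field}"
    return True, "ok"

def range_check(item):
    # Range validator: numeric field within acceptable bounds.
    if not 0.0 <= item["confidence"] <= 1.0:
        return False, "confidence out of [0, 1]"
    return True, "ok"

def substring_check(item, source):
    # Substring verifier: the extracted quote must appear verbatim in the source.
    if item["quote"] not in source:
        return False, "quote not found in source"
    return True, "ok"

def run_gates(item, source):
    # Run cheap checks in order; stop at the first failure.
    for gate in (schema_check, range_check, lambda i: substring_check(i, source)):
        passed, reason = gate(item)
        if not passed:
            return False, reason
    return True, "all gates passed"
```

Because every check is deterministic, a failure always points at one named gate and one concrete reason, which is what makes the failure routing described later in this page possible.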
Semantic judging
Per-call LLM cost
An independent AI model reviews output against a rubric or against source material.
- Rubric-scored judge. AI evaluates output against specific quality criteria, returns a 0-1 score.
- Hallucination detection. Claims in output are cross-checked against source material.
- Evidence check. Every cited claim actually traces back to grounding evidence.
- Gap detection. Flags content that should be present but is missing.
- Consistency check. Same concept or code is applied the same way across items.
- Tone check. Output stays on-brand and voice-consistent.
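The shape of a rubric-scored judge gate can be sketched as below. In production the score would come from an independent LLM call; here `judge` is a stand-in stub so the gate's contract is visible, and the rubric criteria and 0.85 threshold are illustrative assumptions.

```python
# Illustrative rubric; a real rubric is specific to the pipeline.
RUBRIC = [
    "Every claim is supported by the source material",
    "The assigned code exists in the codebook",
    "Tone matches the style guide",
]

def judge(output: str, source: str, rubric: list) -> float:
    # Placeholder: a real implementation prompts a separate, independent
    # model with the rubric and parses a 0-1 score from its response.
    raise NotImplementedError

def rubric_gate(output, source, score_fn, threshold=0.85):
    # score_fn is injected so the judge model can be swapped or mocked.
    score = score_fn(output, source, RUBRIC)
    return {"score": score, "passed": score >= threshold}
```

Injecting `score_fn` keeps the judge independent of the generator, which matters: a model grading its own output tends to be more lenient than a separate one.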
Variant selection
Multi-call LLM cost
Generate multiple candidates and let a judge pick the best. Worth the cost when quality varies meaningfully from one generation to the next.
- Variant tournament. Generate 3-5 variants of a step, judge scores each, best wins.
- Consensus filter. Three independent judges vote; majority decides pass/fail.
- Comparative A/B. Compare revised output against the original, keep the better one.
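A variant tournament reduces to a few lines once the generator and judge are factored out. The sketch below assumes `generate` and `score` are stand-ins for LLM calls (generation and judging respectively); only the selection logic is real.

```python
def tournament(prompt, generate, score, n=3):
    # Generate n candidates, score each, keep the highest-scoring one.
    variants = [generate(prompt) for _ in range(n)]
    best_score, best = max((score(v), v) for v in variants)
    return best, best_score
```

The cost is roughly n generation calls plus n judge calls per step, which is why this mechanism is reserved for steps where candidates genuinely diverge in quality.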
Iterative loops
Bounded retries
When a gate fails, auto-fix and retry. Capped at two or three iterations to prevent runaway cost.
- Self-repair cycle. Failed output auto-fixes based on the specific gate that failed, then retries.
- Critique and revise. One AI critiques, a second AI applies the fixes with the critique as context.
- Quality loop. Iterative scoring and revision until the threshold is met or iteration cap is hit.
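The cap is the important part of any of these loops, and the sketch below shows where it lives. `score` and `revise` stand in for LLM calls; the 0.85 threshold and cap of 3 are illustrative defaults.

```python
def quality_loop(output, score, revise, threshold=0.85, max_iters=3):
    # Score, revise on failure, retry; the cap prevents runaway cost.
    for attempt in range(1, max_iters + 1):
        s = score(output)
        if s >= threshold:
            return output, s, attempt
        output = revise(output, s)
    # Cap hit without passing: return the last attempt for human review.
    return output, score(output), max_iters
```

A loop that exits at the cap does not silently pass; the caller sees a below-threshold score and routes the item onward, typically to a human queue.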
Context-aware routing
Varies
Apply the right intensity of QA to the right kind of work. Not every gate needs every check.
- Sampling gate. For trusted high-volume pipelines, run QA on a statistical sample.
- Adaptive threshold. Higher-stakes outputs require higher scores to pass.
- Human-in-the-loop. Borderline scores (0.75-0.84) route to a human review queue; scores below that are auto-blocked.
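The three-way routing in the last item above is a small function. The band edges below mirror the 0.75/0.85 split described in the list; in practice both would be tuned per pipeline and per stakes (the adaptive-threshold mechanism).

```python
def route(score, block_below=0.75, pass_at=0.85):
    # Three outcomes: pass, human review for the borderline band, block.
    if score >= pass_at:
        return "pass"
    if score >= block_below:
        return "human_review"
    return "block"
```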
An example gate stack
A single AI step in a qualitative coding pipeline, with the quality gates that run after it. Most gates here are zero-cost deterministic checks. The rubric judge is the only LLM call.
- AI step: code a transcript excerpt (~$0.002)
- Schema validator ($0)
- Codebook enum check ($0)
- Substring verifier: does the quote exist in the source ($0)
- Rubric judge: theme-to-codebook fidelity (~$0.003)
- Pass threshold 0.85, else route to human review
Total cost per item: approximately $0.005. Four of the five gates are deterministic and zero-cost. The single LLM judge is the only QA expense, and it runs only if the deterministic gates pass.
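The ordering guarantee in that last sentence, that the judge cost is only incurred when the cheap gates pass, can be sketched as an ordered stack runner. Gate names and costs below mirror the example; the check functions are stand-ins.

```python
def run_stack(item, gates):
    # gates: ordered list of (name, check_fn, cost). Deterministic $0
    # gates come first, so a failure short-circuits before the LLM judge.
    cost = 0.0
    for name, check, gate_cost in gates:
        cost += gate_cost
        if not check(item):
            return {"passed": False, "failed_gate": name, "cost": cost}
    return {"passed": True, "failed_gate": None, "cost": cost}
```

An item that fails the schema validator costs nothing beyond the original generation call; only items that clear every deterministic gate pay for the judge.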
Failure routing
When a gate fails, the work does not silently proceed. The failed output routes to human review with three things attached: what was attempted, which gate failed, and why.
Routing is per-gate and per-step. If a gate fails on item three, only that item from that step is flagged. Other items in the batch continue.
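Per-item routing with the three attachments described above can be sketched as follows; the gate here returns pass/fail plus a reason, and the review-queue record format is an illustrative assumption.

```python
def route_batch(items, gate, review_queue):
    # Failed items carry three attachments: what was attempted, which
    # gate failed, and why. Passing items in the same batch continue.
    passed = []
    for item in items:
        ok, reason = gate(item)
        if ok:
            passed.append(item)
        else:
            review_queue.append({
                "attempted": item,
                "failed_gate": gate.__name__,
                "reason": reason,
            })
    return passed
```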