Pipeline Quality Assurance
Over twenty quality control mechanisms, applied between every step.
This page is about quality control mechanisms (QCMs) built into our pipelines. For task-level QA practices when you are using AI on your own work, see the AI Quality Assurance guide.
Our QCM library
We pull from 20+ documented QCMs across five categories. The right mix depends on the pipeline, the stakes, and the cost tolerance.
Deterministic validation
Zero cost
Rule-based checks that run in milliseconds. No AI call. Catch the most common failures cheaply.
- Schema validator. Output has the right fields, types, and structure.
- Format validator. Output matches the expected shape, sections, and heading hierarchy.
- Range validator. Numeric fields fall within acceptable bounds.
- Uniqueness check. No duplicate IDs or collisions in enumerated fields.
- Substring verifier. Extracted quotes actually exist in the source material.
- Identifier scanner. Flags personally identifiable information before it leaves local infrastructure.
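As a minimal sketch of how a few of these deterministic checks compose, the Python below chains a schema, range, and substring validator; each is a pure function returning pass/fail plus a reason, with no AI call. The field names (`code`, `quote`, `confidence`) are illustrative, not from any real pipeline.

```python
# Each gate returns (passed, reason) and runs in microseconds.

def schema_check(item):
    # Schema validator: right fields, right types.
    required = {"code": str, "quote": str, "confidence": float}
    for field, ftype in required.items():
        if field not in item:
            return False, f"missing field: {field}"
        if not isinstance(item[field], ftype):
            return False, f"wrong type for {field}"
    return True, "ok"

def range_check(item):
    # Range validator: numeric field within acceptable bounds.
    if not 0.0 <= item["confidence"] <= 1.0:
        return False, "confidence out of [0, 1]"
    return True, "ok"

def substring_check(item, source):
    # Substring verifier: the extracted quote must appear verbatim in the source.
    if item["quote"] not in source:
        return False, "quote not found in source"
    return True, "ok"

def run_gates(item, source):
    # Run cheap checks in order; stop at the first failure.
    for gate in (schema_check, range_check, lambda i: substring_check(i, source)):
        passed, reason = gate(item)
        if not passed:
            return False, reason
    return True, "all gates passed"
```

Because every check is deterministic, a failure always points at one named gate and one concrete reason, which is what makes the failure routing described later in this page possible.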
Semantic judging
Per-call LLM cost
An independent AI model reviews output against a rubric or against source material.
- Rubric-scored judge. AI evaluates output against specific quality criteria, returns a 0-1 score.
- Hallucination detection. Claims in output are cross-checked against source material.
- Evidence check. Every cited claim actually traces back to grounding evidence.
- Gap detection. Flags content that should be present but is missing.
- Consistency check. Same concept or code is applied the same way across items.
- Tone check. Output stays on-brand and voice-consistent.
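The shape of a rubric-scored judge gate can be sketched as below. In production the score would come from an independent LLM call; here `judge` is a stand-in stub so the gate's contract is visible, and the rubric criteria and 0.85 threshold are illustrative assumptions.

```python
# Illustrative rubric; a real rubric is specific to the pipeline.
RUBRIC = [
    "Every claim is supported by the source material",
    "The assigned code exists in the codebook",
    "Tone matches the style guide",
]

def judge(output: str, source: str, rubric: list) -> float:
    # Placeholder: a real implementation prompts a separate, independent
    # model with the rubric and parses a 0-1 score from its response.
    raise NotImplementedError

def rubric_gate(output, source, score_fn, threshold=0.85):
    # score_fn is injected so the judge model can be swapped or mocked.
    score = score_fn(output, source, RUBRIC)
    return {"score": score, "passed": score >= threshold}
```

Injecting `score_fn` keeps the judge independent of the generator, which matters: a model grading its own output tends to be more lenient than a separate one.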
Variant selection
Multi-call LLM cost
Generate multiple candidates and let a judge pick the best. Worth the cost when quality varies meaningfully from one generation to the next.
- Variant tournament. Generate 3-5 variants of a step, judge scores each, best wins.
- Consensus filter. Three independent judges vote; majority decides pass/fail.
- Comparative A/B. Compare revised output against the original, keep the better one.
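A variant tournament reduces to a few lines once the generator and judge are factored out. The sketch below assumes `generate` and `score` are stand-ins for LLM calls (generation and judging respectively); only the selection logic is real.

```python
def tournament(prompt, generate, score, n=3):
    # Generate n candidates, score each, keep the highest-scoring one.
    variants = [generate(prompt) for _ in range(n)]
    best_score, best = max((score(v), v) for v in variants)
    return best, best_score
```

The cost is roughly n generation calls plus n judge calls per step, which is why this mechanism is reserved for steps where candidates genuinely diverge in quality.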
Iterative loops
Bounded retries
When a gate fails, auto-fix and retry. Capped at two or three iterations to prevent runaway cost.
- Self-repair cycle. Failed output auto-fixes based on the specific gate that failed, then retries.
- Critique and revise. One AI critiques, a second AI applies the fixes with the critique as context.
- Quality loop. Iterative scoring and revision until the threshold is met or iteration cap is hit.
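The cap is the important part of any of these loops, and the sketch below shows where it lives. `score` and `revise` stand in for LLM calls; the 0.85 threshold and cap of 3 are illustrative defaults.

```python
def quality_loop(output, score, revise, threshold=0.85, max_iters=3):
    # Score, revise on failure, retry; the cap prevents runaway cost.
    for attempt in range(1, max_iters + 1):
        s = score(output)
        if s >= threshold:
            return output, s, attempt
        output = revise(output, s)
    # Cap hit without passing: return the last attempt for human review.
    return output, score(output), max_iters
```

A loop that exits at the cap does not silently pass; the caller sees a below-threshold score and routes the item onward, typically to a human queue.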
Context-aware routing
Varies
Apply the right intensity of QA to the right kind of work. Not every gate needs every check.
- Sampling gate. For trusted high-volume pipelines, run QA on a statistical sample.
- Adaptive threshold. Higher-stakes outputs require higher scores to pass.
- Human-in-the-loop. Borderline scores (0.75-0.84) route to a human review queue; scores below that are auto-blocked.
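The three-way routing in the last item above is a small function. The band edges below mirror the 0.75/0.85 split described in the list; in practice both would be tuned per pipeline and per stakes (the adaptive-threshold mechanism).

```python
def route(score, block_below=0.75, pass_at=0.85):
    # Three outcomes: pass, human review for the borderline band, block.
    if score >= pass_at:
        return "pass"
    if score >= block_below:
        return "human_review"
    return "block"
```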
An example gate stack
A single AI step in a qualitative coding pipeline, with the quality gates that run after it. Most gates here are zero-cost deterministic checks. The rubric judge is the only LLM call.
- AI step: code a transcript excerpt (~$0.002)
- Schema validator ($0)
- Codebook enum check ($0)
- Substring verifier: does the quote exist in the source ($0)
- Rubric judge: theme-to-codebook fidelity (~$0.003)
- Pass threshold 0.85, else route to human review
Total cost per item: approximately $0.005. Four of the five gates are deterministic and zero-cost. The single LLM judge is the only QA expense, and it runs only if the deterministic gates pass.
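The ordering guarantee in that last sentence, that the judge cost is only incurred when the cheap gates pass, can be sketched as an ordered stack runner. Gate names and costs below mirror the example; the check functions are stand-ins.

```python
def run_stack(item, gates):
    # gates: ordered list of (name, check_fn, cost). Deterministic $0
    # gates come first, so a failure short-circuits before the LLM judge.
    cost = 0.0
    for name, check, gate_cost in gates:
        cost += gate_cost
        if not check(item):
            return {"passed": False, "failed_gate": name, "cost": cost}
    return {"passed": True, "failed_gate": None, "cost": cost}
```

An item that fails the schema validator costs nothing beyond the original generation call; only items that clear every deterministic gate pay for the judge.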
Failure routing
When a gate fails, the work does not silently proceed. The failed output routes to human review with three things attached: what was attempted, which gate failed, and why.
Routing is per-gate and per-step. If a gate fails on item three, only that item from that step is flagged. Other items in the batch continue.
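Per-item routing with the three attachments described above can be sketched as follows; the gate here returns pass/fail plus a reason, and the review-queue record format is an illustrative assumption.

```python
def route_batch(items, gate, review_queue):
    # Failed items carry three attachments: what was attempted, which
    # gate failed, and why. Passing items in the same batch continue.
    passed = []
    for item in items:
        ok, reason = gate(item)
        if ok:
            passed.append(item)
        else:
            review_queue.append({
                "attempted": item,
                "failed_gate": gate.__name__,
                "reason": reason,
            })
    return passed
```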