Pipeline Quality Assurance

Over twenty quality control mechanisms, applied between every step.

This page is about quality control mechanisms (QCMs) built into our pipelines. For task-level QA practices when you are using AI on your own work, see the AI Quality Assurance guide.

Our QCM library

We pull from 20+ documented QCMs across five categories. The right mix depends on the pipeline, the stakes, and the cost tolerance.

Deterministic validation

Zero cost

Rule-based checks that run in milliseconds. No AI call. Catch the most common failures cheaply.

  • Schema validator. Output has the right fields, types, and structure.
  • Format validator. Output matches expected shape, sections, heading hierarchy.
  • Range validator. Numeric fields fall within acceptable bounds.
  • Uniqueness check. No duplicate IDs or collisions in enumerated fields.
  • Substring verifier. Extracted quotes actually exist in the source material.
  • Identifier scanner. Flags personally identifiable information before it leaves local infrastructure.
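Two of the checks above can be sketched in a few lines. This is a minimal illustration, not our actual implementation; the function names and field layout are invented for the example.

```python
# Minimal sketches of two zero-cost deterministic gates.
# Names (check_schema, verify_substring) are illustrative, not a real API.

def check_schema(item: dict, required: dict) -> list[str]:
    """Schema validator: return a list of violations (empty list means pass)."""
    errors = []
    for field, expected_type in required.items():
        if field not in item:
            errors.append(f"missing field: {field}")
        elif not isinstance(item[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}")
    return errors

def verify_substring(quote: str, source: str) -> bool:
    """Substring verifier: the extracted quote must appear verbatim in the source."""
    return quote in source

errors = check_schema({"code": "trust", "score": 0.9}, {"code": str, "score": float})
assert errors == []
assert verify_substring("felt heard", "Participants said they felt heard by staff.")
```

Because these checks are pure functions over the output, they cost nothing per call and can run on every item.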

Semantic judging

Per-call LLM cost

An independent AI model reviews output against a rubric or against source material.

  • Rubric-scored judge. AI evaluates output against specific quality criteria, returns a 0-1 score.
  • Hallucination detection. Claims in output are cross-checked against source material.
  • Evidence check. Every cited claim actually traces back to grounding evidence.
  • Gap detection. Flags content that should be present but is missing.
  • Consistency check. Same concept or code is applied the same way across items.
  • Tone check. Output stays on-brand and voice-consistent.
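The shape of a rubric-scored judge looks roughly like this. The model call is stubbed out so the sketch is self-contained; in a real pipeline it would be an independent LLM call, and the rubric text and function names here are invented for illustration.

```python
# Sketch of a rubric-scored judge gate. call_judge_model is a stub standing
# in for an independent LLM; the rubric text is an illustrative placeholder.

RUBRIC = ("Score 0-1: does the assigned code match the codebook definition, "
          "and is every claim grounded in the excerpt?")

def call_judge_model(prompt: str) -> float:
    # Stub: a real implementation would call a separate model here.
    return 0.91

def rubric_judge(output: str, source: str, threshold: float = 0.85):
    """Build the judge prompt, get a 0-1 score, and compare to the threshold."""
    prompt = f"{RUBRIC}\n\nSOURCE:\n{source}\n\nOUTPUT:\n{output}"
    score = call_judge_model(prompt)
    return score, score >= threshold

score, passed = rubric_judge("code: trust", "excerpt text ...")
```

The key design point is independence: the judge model sees only the rubric, the source, and the output, never the generating model's reasoning.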

Variant selection

Multi-call LLM cost

Generate multiple candidates and let a judge pick the best. Worth the cost when output quality varies meaningfully between generations.

  • Variant tournament. Generate 3-5 variants of a step, judge scores each, best wins.
  • Consensus filter. Three independent judges vote; majority decides pass/fail.
  • Comparative A/B. Compare revised output against the original, keep the better one.
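A variant tournament reduces to a few lines once the generator and judge are treated as functions. The stubs below are illustrative (a real judge would be an LLM scorer, not `len`):

```python
def variant_tournament(generate, judge, n: int = 3) -> str:
    """Generate n candidates, score each with the judge, return the winner."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=judge)

# Stub generator and judge for the sketch: the "judge" scores by length,
# so the longest draft wins the tournament.
variants = iter(["draft a", "draft bb", "draft c"])
best = variant_tournament(lambda: next(variants), judge=len, n=3)
```

Cost scales linearly with `n`: three variants means roughly 3x the generation cost plus the judge calls, which is why this category is reserved for steps where quality genuinely varies.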

Iterative loops

Bounded retries

When a gate fails, auto-fix and retry. Capped at two or three iterations to prevent runaway cost.

  • Self-repair cycle. Failed output auto-fixes based on the specific gate that failed, then retries.
  • Critique and revise. One AI critiques, a second AI applies the fixes with the critique as context.
  • Quality loop. Iterative scoring and revision until the threshold is met or iteration cap is hit.
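The quality loop's control flow, with its iteration cap, can be sketched as follows. The scoring and repair functions here are toy stubs for illustration:

```python
def quality_loop(generate, score, repair, threshold=0.85, max_iters=3):
    """Score and revise until the threshold is met or the iteration cap is hit.

    Returns (output, final_score, passed). The cap bounds worst-case cost.
    """
    output = generate()
    for _ in range(max_iters):
        s = score(output)
        if s >= threshold:
            return output, s, True
        output = repair(output, s)  # auto-fix based on the failing score
    return output, score(output), False

# Stubbed example: each "repair" produces a strictly better draft.
stub_scores = {"v1": 0.6, "v2": 0.8, "v3": 0.9}
out, s, ok = quality_loop(
    generate=lambda: "v1",
    score=stub_scores.get,
    repair=lambda o, s: {"v1": "v2", "v2": "v3"}[o],
)
```

The cap matters: without it, an output the model cannot fix would loop forever, burning LLM calls on every iteration.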

Context-aware routing

Varies

Apply the right intensity of QA to the right kind of work. Not every gate needs every check.

  • Sampling gate. For trusted high-volume pipelines, run QA on a statistical sample.
  • Adaptive threshold. Higher-stakes outputs require higher scores to pass.
  • Human-in-loop. Borderline scores (0.75-0.84) route to a human review queue; auto-block below that.
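The human-in-loop routing rule is a simple score-band dispatch. The band edges below match the numbers above; the function name is illustrative:

```python
def route(score: float, block_below: float = 0.75, pass_at: float = 0.85) -> str:
    """Dispatch by score band: auto-block, human review queue, or auto-pass."""
    if score < block_below:
        return "auto-block"
    if score < pass_at:
        return "human-review"
    return "auto-pass"

assert route(0.60) == "auto-block"
assert route(0.80) == "human-review"
assert route(0.92) == "auto-pass"
```

Making the thresholds parameters is what enables the adaptive-threshold mechanism: higher-stakes pipelines simply pass in a stricter `pass_at`.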

An example gate stack

A single AI step in a qualitative coding pipeline, with the quality gates that run after it. Most gates here are zero-cost deterministic checks. The rubric judge is the only LLM call.

AI step: code a transcript excerpt (~$0.002, AI call)

Gates, in order:

  • Schema validator ($0)
  • Codebook enum check ($0)
  • Substring verifier: does the quote exist in the source ($0)
  • Rubric judge: theme-to-codebook fidelity (~$0.003, LLM call)
  • Threshold gate: pass at 0.85 or above, else route to human review ($0)

Total cost per item: approximately $0.005. Four of the five gates are deterministic and zero-cost. The single LLM judge is the only expense, and it only runs if the deterministic gates pass.
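The stack above can be sketched as a short-circuiting chain: cheap deterministic gates run first, and the LLM judge only fires if all of them pass. The gate and judge functions below are stubs invented for the example.

```python
def run_gate_stack(item, deterministic_gates, llm_judge, threshold=0.85):
    """Run zero-cost gates first; call the LLM judge only if all of them pass."""
    for gate in deterministic_gates:
        if not gate(item):
            return ("human-review", gate.__name__)  # short-circuit: no LLM spent
    score = llm_judge(item)  # the single per-item LLM expense
    if score >= threshold:
        return ("pass", score)
    return ("human-review", score)

# Stub gates and judge for the sketch:
item = {"code": "trust", "quote": "felt heard", "source": "they felt heard"}
gates = [
    lambda i: "code" in i,               # stand-in for the schema validator
    lambda i: i["quote"] in i["source"],  # stand-in for the substring verifier
]
result = run_gate_stack(item, gates, llm_judge=lambda i: 0.9)
```

Ordering the gates from cheapest to most expensive is the whole trick: an item that fails a free check never incurs the judge's per-call cost.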

Failure routing

When a gate fails, the work does not silently proceed. The failed output routes to human review with three things attached: what was attempted, which gate failed, and why.

Routing is per-gate and per-step. If a gate fails on item three, only that item from that step is flagged. Other items in the batch continue.
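The three attachments on a routed failure suggest a small record type. This shape is illustrative, not our actual data model:

```python
from dataclasses import dataclass, asdict

@dataclass
class FailureRecord:
    """Attached to any item routed to human review (illustrative field names)."""
    attempted: str  # what was attempted
    gate: str       # which gate failed
    reason: str     # why it failed

rec = FailureRecord(
    attempted="code excerpt 3",
    gate="substring_verifier",
    reason="quote not found in source",
)
```

Because the record is per-item and per-gate, the rest of the batch carries no state about the failure and continues unaffected.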

Discuss a Pilot

We will scope quality gates appropriate to your data and stakes.

Contact Us