How We Build Pipelines

Staged, gated, local-first.

Staged

Small steps. One simple job per AI call, so failures surface step by step instead of hiding in a wall of output.

Gated

Quality gates between steps. Schema checks, rule-based validators, and rubric-scored judges keep bad output from propagating.

Local-first

Local AI models for sensitive data. Deterministic de-identification when cloud models are needed. You decide where data goes.

Staged: how we compose pipelines

Four decisions we make on every pipeline: which steps are deterministic and which need an AI call, what temperature each step runs at, how long inputs get split, and which pattern the steps are arranged in.

Rule-based nodes where we can, AI nodes where we must

Rule-based nodes do work that does not require intelligence. They cost nothing to run and fail predictably. AI nodes are reserved for work that requires judgment or generation. The cheapest pipeline is the one that uses the fewest AI calls. Before adding an AI node, we ask whether a rule-based node can handle the work. Most of the time it can.

Rule-based (no AI)

  • Routing a record to the right downstream step based on a field value
  • Validating that a date parses and a numeric field is within bounds
  • Converting between JSON and CSV, or between donor templates
  • Checking that a required field is non-empty
  • De-identifying fields before data leaves your infrastructure

AI-required

  • Interpreting an open-ended survey response
  • Summarizing a long document against a specific question
  • Applying a codebook to an interview transcript
  • Writing a narrative paragraph from structured evidence
  • Scoring an output against a quality rubric
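A rule-based node from the first list can be a few lines of ordinary code. The sketch below, with a hypothetical record shape (respondent_id, collected_on, score), shows the date, bounds, and required-field checks as one deterministic validator that fails predictably:

```python
from datetime import date

def validate_record(record: dict) -> list[str]:
    """Rule-based node: returns a list of failure reasons (empty list = pass)."""
    errors = []
    # Required field must be non-empty.
    if not record.get("respondent_id"):
        errors.append("respondent_id is missing or empty")
    # Date must parse (ISO format assumed here for illustration).
    try:
        date.fromisoformat(record.get("collected_on", ""))
    except ValueError:
        errors.append("collected_on is not a valid ISO date")
    # Numeric field must be within bounds.
    score = record.get("score")
    if not isinstance(score, (int, float)) or not 0 <= score <= 100:
        errors.append("score must be a number between 0 and 100")
    return errors

# A well-formed record passes; a malformed one fails with named reasons.
ok = validate_record({"respondent_id": "r-17", "collected_on": "2024-03-01", "score": 82})
bad = validate_record({"respondent_id": "", "collected_on": "last week", "score": 150})
```

No AI call is involved, so this node costs nothing per run and never hallucinates a pass.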

Temperature discipline

Temperature controls how much randomness a model introduces. Matching temperature to task is not a stylistic choice; it is how we get reliable output at each stage. Cold temperatures reduce hallucination in extraction. Warm temperatures improve drafting quality.

Phase                    Temperature   Why
Extraction and analysis  0.0 – 0.2     Same input, same output. Lowest hallucination risk.
Drafting                 0.3 – 0.4     Language needs variation so it does not feel templated.
Polishing and judging    0.0 – 0.1     Consistency matters more than creativity.
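The per-phase settings above can live in a small config a pipeline runner reads before each AI call. The phase names and lookup function here are illustrative, not a real API:

```python
# Per-phase temperature settings, matching the table above.
PHASE_TEMPERATURE = {
    "extract": 0.0,   # deterministic: same input, same output
    "analyze": 0.1,   # low randomness keeps hallucination risk down
    "draft":   0.35,  # some variation so language does not feel templated
    "polish":  0.0,   # consistency over creativity
    "judge":   0.0,   # scores must be reproducible
}

def temperature_for(phase: str) -> float:
    # Unknown phases default to cold: the safe failure mode is
    # deterministic output, not creative drift.
    return PHASE_TEMPERATURE.get(phase, 0.0)
```

Centralizing the mapping means temperature is set per task type once, not re-decided ad hoc at each call site.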

Chunking for long inputs

Long documents do not go into one prompt. A 40-page evaluation report, a hundred-page donor manual, a thousand-row survey dataset: each gets split into chunks sized to the task, processed in parallel, then merged.

Three reasons. Context windows cost money: larger inputs cost more and run slower. Model attention degrades with length, so detail in the middle of a long prompt is more likely to be missed. And per-chunk parallelism can turn a run that would take an hour as a single prompt into a few minutes of parallel work.

How we chunk depends on the task: semantic boundaries for documents, row groups for tables, transcript turns for interviews.
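A minimal sketch of the split-process-merge flow for documents, assuming paragraph boundaries as the split point and a stand-in function in place of the AI call:

```python
from concurrent.futures import ThreadPoolExecutor

def chunk_by_paragraphs(text: str, max_chars: int = 2000) -> list[str]:
    """Split on blank-line paragraph boundaries, packing paragraphs
    into chunks of at most max_chars where possible."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

def process_chunks(text: str, step) -> list:
    """Fan out chunks to the step in parallel, merge results in order."""
    chunks = chunk_by_paragraphs(text)
    with ThreadPoolExecutor() as pool:
        # map preserves input order, so the merge stays deterministic
        return list(pool.map(step, chunks))
```

Row groups for tables and transcript turns for interviews follow the same shape; only the splitting function changes.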

Pipeline topology

Six patterns cover most pipeline topologies we build. Most production pipelines combine several: hierarchical overall, with iterative quality loops around creative stages, tournament selection where variant quality varies, and conditional routing where mixed inputs need different treatment.

Six pipeline patterns

A → B → C

Linear

Steps run one after another. Used when each step builds on the output of the previous one, like extract, then code, then summarize.

A → (B1 ∥ B2 ∥ B3) → C

Parallel (fan-out / fan-in)

One input splits into concurrent branches that rejoin. Used when a document needs multiple independent analyses at once, like extracting themes and flagging risks in the same pass.

A → (B1 → C1 ∥ B2 → C2) → D

Hierarchical

A multi-section output built by running a sub-pipeline per section, then assembled. Used for long documents like evaluation reports with distinct chapters.

A → B → J → (retry back to B)

Iterative

A step runs, a judge scores it, and if quality is low the step re-runs with feedback. Capped at three loops. Used when outputs need iterative polish, like narrative drafting.
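The iterative loop is small enough to sketch directly. The step and judge below are stand-ins for AI calls; the retry cap of three and the 0.85 passing threshold come from this page, the judge's (score, feedback) interface is an assumption:

```python
MAX_LOOPS = 3        # retry cap from the iterative pattern
THRESHOLD = 0.85     # rubric passing score used by the quality gates

def run_with_retries(step, judge, payload):
    """Iterative pattern: run, judge, re-run with feedback, max 3 loops."""
    feedback = None
    for attempt in range(1, MAX_LOOPS + 1):
        output = step(payload, feedback)
        score, feedback = judge(output)
        if score >= THRESHOLD:
            return output, score, attempt
    # Cap reached without passing: signal for human review rather than
    # silently proceeding with a bad output.
    return None, score, attempt
```

Feeding the judge's critique back into the next attempt is what makes the loop converge instead of just re-rolling the dice.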

A → (V1 ∥ V2 ∥ V3) → J

Tournament

Multiple variants of the same step run in parallel, a judge picks the best. Used for creative work where variant quality varies, like drafting a recommendation paragraph.

A → ◆ → (B1 → C1 or B2 → C2)

Conditional

A router sends each item down the right branch based on its characteristics. Used when the same pipeline handles mixed inputs, like routing transcripts one way and reports another.

A = input · B/C/D = processing steps · V = variant · J = judge · ◆ = router

Gated: quality assurance at every join

Every step passes through a quality gate before the next step runs. If the gate fails, the work routes to human review with the reason flagged. The pipeline does not silently proceed with a bad output.

Schema validators

Confirm the output has the right shape: required fields, correct types, valid formats. Deterministic, zero-cost, runs in milliseconds.

Rule-based checks

Enforce business logic: numeric ranges, enum values, cross-field constraints. Also deterministic, also zero-cost.

Rubric-scored judges

Use an AI model to score output against a rubric of specific criteria. Used when quality is something a human would evaluate subjectively.

Variant tournaments

Generate multiple candidates for the same step, then have a judge pick the best. Used when variant quality varies and the best one is worth the extra cost.

Pipelines set a passing threshold of 0.85 on rubric-scored outputs. Anything below routes to human review. Schema and rule-based checks are pass-fail.
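One way to wire the gates at a join, as a sketch: deterministic checks run first because they are free, the rubric judge runs last because it costs an AI call, and any failure routes to human review with the reason attached. The check and judge interfaces here are illustrative:

```python
PASS_THRESHOLD = 0.85  # rubric threshold stated above

def gate(output: dict, schema_check, rule_check, rubric_judge) -> dict:
    """Run cheap deterministic gates first, the AI judge last."""
    for name, check in (("schema", schema_check), ("rules", rule_check)):
        reason = check(output)  # returns None on pass, a reason string on fail
        if reason:
            return {"status": "human_review", "gate": name, "reason": reason}
    score = rubric_judge(output)  # returns a 0.0 - 1.0 rubric score
    if score < PASS_THRESHOLD:
        return {"status": "human_review", "gate": "rubric", "score": score}
    return {"status": "pass", "score": score}
```

Ordering matters: a schema failure short-circuits before any model is invoked, so malformed output never costs a judge call.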

See how each quality assurance method works

Local-first: sensitive data handling

Not every pipeline can run on local AI models, and not every pipeline needs to. The decision depends on what data is involved. We use a three-tier decision ladder.

Which data goes where

Tier 1 — Identifiable + sensitive
Interview transcripts, health records, household rosters
→ Local models only

Tier 2 — Sensitive but de-identifiable
Survey data with personal fields, beneficiary tracking
→ Deterministic anonymization, then cloud

Tier 3 — Public or depersonalized
Reports, indicator data, donor guidance, operational metadata
→ Cloud models directly

Every pipeline logs which tier each step ran on, which model version was used, and what transformations were applied. You can audit where your data went and what processed it.
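A sketch of the tier routing with the audit fields described above. The tier rules match the ladder; the step names, model identifiers, and log shape are placeholders, not the real routing table:

```python
def route_step(tier: int, step: str, model_version: str, log: list) -> str:
    """Pick a destination by data tier and append an audit record."""
    if tier == 1:
        destination, transform = "local", None            # local models only
    elif tier == 2:
        destination, transform = "cloud", "deterministic_deidentify"
    else:
        destination, transform = "cloud", None            # public data, cloud direct
    log.append({
        "step": step,
        "tier": tier,
        "destination": destination,
        "model_version": model_version,
        "transformations": [transform] if transform else [],
    })
    return destination
```

Because the log is written at the routing decision itself, the audit trail cannot drift out of sync with where the data actually went.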

See the full data privacy approach

Discuss a Pilot

Tell us the M&E data task, the volume, and the sensitivity profile. We will scope a pilot that fits.