How to Use AI for Baseline and Endline Analysis
Comparing baseline and endline data is the backbone of impact measurement. AI can run the comparisons, flag anomalies, and draft the narrative, but only if you structure the analysis around specific evaluation questions.
AI is excellent at finding patterns in data, but it does not know which patterns matter. Always start with your evaluation questions, not with "analyze this spreadsheet."
The 5-Step Baseline-Endline Workflow
Move from raw survey data to a defensible comparison narrative. Each step builds on the previous one to ensure the analysis answers your actual evaluation questions.
Align Data Structures
Before any analysis, ask AI to compare your baseline and endline datasets: are the variable names consistent? Are the response options identical? Are the sampling units comparable? Misalignment here invalidates everything that follows.
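Before handing this to AI, you can script the most mechanical parts of the check yourself. A minimal sketch in pandas, with made-up column names and toy data, comparing variable names and response codings across rounds:

```python
import pandas as pd

# Toy extracts; real survey rounds will have many more variables.
baseline = pd.DataFrame({
    "hh_id": [1, 2, 3],
    "handwash_knowledge": ["yes", "no", "yes"],
    "water_source": ["improved", "unimproved", "improved"],
})
endline = pd.DataFrame({
    "hh_id": [1, 2, 4],
    "handwash_knowledge": ["Yes", "No", "Yes"],           # casing changed between rounds
    "water_src": ["improved", "improved", "unimproved"],  # variable renamed
})

# 1. Variable names present in only one round
only_baseline = set(baseline.columns) - set(endline.columns)
only_endline = set(endline.columns) - set(baseline.columns)

# 2. Response options that differ for variables shared by both rounds
mismatches = {}
for col in set(baseline.columns) & set(endline.columns):
    if baseline[col].dtype == object:
        b_opts, e_opts = set(baseline[col]), set(endline[col])
        if b_opts != e_opts:
            mismatches[col] = (b_opts, e_opts)

print("Baseline-only variables:", only_baseline)   # {'water_source'}
print("Endline-only variables:", only_endline)     # {'water_src'}
print("Coding mismatches:", mismatches)            # flags 'handwash_knowledge'
```

Running a check like this first means your prompt to AI can say "resolve these three mismatches" instead of "find the mismatches," which is both cheaper and more verifiable.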
Set Up the Comparison Matrix
Create a table mapping each evaluation question to its indicator, the baseline value, the endline value, and the expected direction of change. Ask AI to populate this from your data. This becomes the backbone of your analysis.
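A minimal version of that matrix, sketched in pandas with illustrative values, also makes the later anomaly check mechanical: any row where the observed direction contradicts the expected direction is flagged automatically.

```python
import pandas as pd

# Illustrative questions and values; populate from your own aligned datasets.
matrix = pd.DataFrame(
    [
        ("Did handwashing knowledge improve?", "% HH with knowledge", 0.34, 0.42, "increase"),
        ("Did use of improved water sources rise?", "% HH improved source", 0.61, 0.58, "increase"),
    ],
    columns=["question", "indicator", "baseline", "endline", "expected_direction"],
)

def direction(delta):
    """Classify a change as increase, decrease, or no change."""
    return "increase" if delta > 0 else ("decrease" if delta < 0 else "no change")

matrix["observed_direction"] = (matrix["endline"] - matrix["baseline"]).apply(direction)
matrix["anomaly"] = matrix["observed_direction"] != matrix["expected_direction"]
```

Here the second row would be flagged: water-source use fell despite an expected increase, which is exactly the kind of finding step 4 asks you to investigate.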
Run Statistical Comparisons
For each indicator, ask AI to recommend and run the appropriate statistical test: chi-square for proportions, paired t-test for means, McNemar's test for matched binary data. Always ask for confidence intervals, not just p-values.
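For concreteness, here is how those three tests look in scipy, with invented counts and scores; the exact McNemar test is implemented here as a binomial test on the discordant pairs, which is its standard exact form:

```python
import numpy as np
from scipy import stats

# Chi-square for a proportion indicator (illustrative counts)
table = np.array([[136, 264],   # baseline: 34% of n=400 with knowledge
                  [162, 223]])  # endline:  42% of n=385 with knowledge
chi2, p_chi, _, _ = stats.chi2_contingency(table, correction=False)

# 95% confidence interval for the difference in proportions, not just a p-value
p1, n1, p2, n2 = 136 / 400, 400, 162 / 385, 385
diff = p2 - p1
se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci = (diff - 1.96 * se, diff + 1.96 * se)

# Paired t-test for a mean indicator measured on the same households twice
rng = np.random.default_rng(0)
baseline_scores = rng.normal(50, 10, size=120)
endline_scores = baseline_scores + rng.normal(3, 5, size=120)
t, p_t = stats.ttest_rel(endline_scores, baseline_scores)

# Exact McNemar test for matched binary data, using only the discordant pairs
b, c = 40, 22            # no->yes vs. yes->no switchers among matched pairs
p_mcnemar = stats.binomtest(min(b, c), b + c, 0.5).pvalue
```

The confidence interval is often more useful than the p-value in a report: "the improvement is somewhere between 1 and 15 percentage points" tells a reader far more than "p < 0.05."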
Investigate Anomalies
Ask AI to flag indicators where the change is unexpected: improvements where you expected decline, no change despite heavy investment, or changes that are statistically significant but practically meaningless. These are the findings worth discussing.
Draft the Comparison Narrative
Feed the comparison matrix and statistical results to AI and ask it to draft a findings section organized by evaluation question. Each finding should state: what changed, by how much, whether the change is statistically significant, and what it means for the program.
Weak vs. Strong Baseline-Endline Analysis
The gap between useful analysis and data theater comes down to whether the analysis answers evaluation questions or just describes numbers.
Data Preparation
You paste the endline dataset into ChatGPT and say "Analyze this data." The AI produces summary statistics for every variable, most of which are irrelevant. You have no comparison baseline and no framework for interpretation.
Data Preparation
You provide both datasets with a mapping table: "Here is my baseline data (n=400) and endline data (n=385). Compare these 12 indicators. Variable X in baseline corresponds to variable X_endline. Flag any variables where coding has changed between rounds."
Statistical Testing
The AI runs t-tests on everything and reports p-values. You report "statistically significant improvement" for indicators where p < 0.05, ignoring effect size, sample composition changes, and multiple comparison issues.
Statistical Testing
The AI recommends the appropriate test per indicator type (proportions vs. means vs. ordinal), reports both the p-value and the effect size, adjusts for multiple comparisons where relevant, and notes where the sample composition changed between rounds.
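The multiple-comparisons adjustment mentioned here can be as simple as Holm's step-down procedure; a hand-rolled sketch makes it clear what "adjusted" means (statsmodels' `multipletests` offers the same and more):

```python
def holm_adjust(pvals):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices sorted by p-value
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply by the number of remaining hypotheses; enforce monotonicity.
        running_max = max(running_max, min((m - rank) * pvals[i], 1.0))
        adjusted[i] = running_max
    return adjusted

# Raw p-values from four indicator comparisons (illustrative)
raw = [0.02, 0.04, 0.30, 0.001]
adjusted = holm_adjust(raw)
# Only the p = 0.001 result survives at the 0.05 level after adjustment.
```

This is why a table of twelve indicators with two or three raw p-values just under 0.05 should be read with suspicion: after adjustment, those "findings" often disappear.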
Narrative
"Indicator X increased from 34% to 42%." No context on whether this is meaningful, no discussion of why, no connection to program activities, no caveats about data quality.
Narrative
"Knowledge of proper handwashing technique increased from 34% (n=400) to 42% (n=385), a statistically significant difference (chi-square = 5.2, p = 0.02). This change coincides with the hygiene promotion campaign in Q2-Q3. However, the endline sample had higher female representation (62% vs. 54%), which may partially explain the increase given that women scored higher at baseline."
5 Rules for AI-Assisted Baseline-Endline Analysis
Always check data alignment before analysis
Variable names change between survey rounds. Response options get reworded. New categories get added. Ask AI to produce a variable mapping table before running any comparison. Ten minutes of alignment checking saves days of wrong conclusions.
Report effect sizes, not just significance
A statistically significant change in a large sample can be practically meaningless. A 2-percentage-point increase with 2,000 respondents will be "significant" but may not justify the investment. Always ask AI to calculate and report effect sizes alongside p-values.
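Cohen's h is a convenient effect size for proportion indicators, and it is simple enough to compute by hand (benchmarks: roughly 0.2 small, 0.5 medium, 0.8 large):

```python
import math

def cohens_h(p1, p2):
    """Cohen's h: effect size for the difference between two proportions."""
    return abs(2 * math.asin(math.sqrt(p2)) - 2 * math.asin(math.sqrt(p1)))

# A 2-percentage-point change that a large sample would call "significant"
negligible = cohens_h(0.50, 0.52)   # ~0.04, well below the "small" benchmark of 0.2
# The handwashing change discussed earlier
small = cohens_h(0.34, 0.42)        # ~0.16, still below 0.2
```

Reporting both numbers, "significant (p = 0.02) but a small effect (h = 0.16)", keeps the narrative honest about what the program actually achieved.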
Account for sample composition changes
If your baseline sample was 50% female and your endline is 65% female, any gender-correlated indicator will show change that has nothing to do with your program. Ask AI to check demographic comparability and flag discrepancies.
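That comparability check is itself a chi-square test on the demographic composition by round; a sketch with invented counts matching the 50%/65% example:

```python
import numpy as np
from scipy import stats

# Female / male counts per round (illustrative: 50% female at baseline, ~65% at endline)
composition = np.array([[200, 200],   # baseline, n=400
                        [250, 135]])  # endline,  n=385
chi2, p, _, _ = stats.chi2_contingency(composition)
if p < 0.05:
    print("Demographic shift detected: interpret gender-correlated indicators with care")
```

When the shift is this large, the honest options are to report gender-disaggregated comparisons or to reweight the endline sample, not to quietly report the pooled change.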
Connect findings to evaluation questions
Every finding should answer a question someone actually asked. Structure your analysis around 4-6 evaluation questions, not around 30 indicators. Ask AI to organize the narrative by question, not by variable number.
Never paste raw beneficiary data into cloud AI
Baseline and endline datasets often contain personally identifiable information. Remove names, GPS coordinates, phone numbers, and any combination of village + age + sex that could identify individuals before sharing data with any AI tool.
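A de-identification pass can be scripted so it happens every time, not just when someone remembers. The column names below are hypothetical; adjust them to your survey's actual schema.

```python
import pandas as pd

# Hypothetical direct-identifier columns; adjust to your survey's schema.
DIRECT_IDENTIFIERS = ["respondent_name", "phone_number", "gps_lat", "gps_lon"]

def deidentify(df: pd.DataFrame) -> pd.DataFrame:
    """Drop direct identifiers and coarsen age, a common quasi-identifier."""
    out = df.drop(columns=[c for c in DIRECT_IDENTIFIERS if c in df.columns])
    if "age" in out.columns:
        out["age_band"] = pd.cut(
            out["age"], bins=[0, 18, 35, 60, 120],
            labels=["<18", "18-34", "35-59", "60+"],
        )
        out = out.drop(columns=["age"])
    return out

raw = pd.DataFrame({
    "respondent_name": ["A. Example"],
    "phone_number": ["000-0000"],
    "village": ["Example Village"],
    "age": [37],
    "handwash_knowledge": ["yes"],
})
safe = deidentify(raw)
```

Note that `safe` still keeps village plus age band plus the indicator; in a small village that combination can itself be identifying, so review the residual columns before sharing anything.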
Baseline-Endline Analysis Prompt
Use this prompt after cleaning and aligning your datasets. Paste the summary statistics or a representative sample, not the full raw dataset.
I need to compare baseline and endline survey data for a program evaluation.

Evaluation questions:
1. [e.g., Did knowledge of proper handwashing improve among target households?]
2. [e.g., Did the proportion of households using improved water sources increase?]
3. [e.g., Did dietary diversity among children under 5 improve?]

Data summary:
- Baseline: [date, n=X, sampling method]
- Endline: [date, n=X, sampling method]
- Key demographic comparison: [e.g., baseline 54% female, endline 62% female]

Indicator data (paste as table):

| Indicator | Baseline Value | Endline Value | Indicator Type |
|-----------|----------------|---------------|----------------|
| [e.g., % HH with handwashing knowledge] | [34%] | [42%] | [proportion] |
| [add rows] | | | |

For each indicator, please:
1. Recommend the appropriate statistical test
2. Calculate the test statistic, p-value, and effect size
3. Flag if the sample composition difference could bias the result
4. Rate the finding: strong evidence / moderate evidence / weak evidence / no evidence of change
5. Draft a 2-3 sentence narrative finding

Then provide an overall summary organized by evaluation question.
Analyze Your Data
Pair this workflow with data cleaning techniques and evaluation report drafting to go from raw data to donor-ready narrative.
Related Quick Guides
How to Use AI for Indicator Development
Make sure you are measuring the right things before analyzing them.
How to Clean Messy M&E Data with AI
Fix data quality issues before they corrupt your comparisons.
How to Use AI for Donor Reporting
Turn your baseline-endline findings into donor-ready reports.