
Pillar · Methods · 8 min read

Impact Evaluation

A rigorous evaluation approach that measures the causal effect of a programme on outcomes by comparing what happened with what would have happened in its absence.

When to Use

Impact evaluation is the right approach when you need to know whether a programme caused observed changes in outcomes: not just whether outcomes improved, but whether the improvement was due to the programme. This is a high bar that requires substantial investment in design and data collection. Use it when:

  • Scale decisions depend on evidence: governments or donors considering large-scale rollout need credible evidence that the programme works before committing resources
  • Programme effectiveness is genuinely uncertain: the intervention has a plausible theory of change but has not been rigorously tested in this context
  • Policy competition exists: comparing two alternative approaches requires a comparative design to determine which is more effective
  • Donor requirements mandate it: USAID, USDA, and the World Bank increasingly require impact evaluations for programmes above certain thresholds, particularly for food security, health, and agriculture
  • The stakes are high: programmes that affect large numbers of people or involve significant resources warrant the investment in rigorous evaluation

Impact evaluation is not appropriate when the programme is still being developed (use formative evaluation first), when outcomes cannot be measured in the programme timeline, when a counterfactual cannot be ethically or practically constructed, or when the evaluation question is about how outcomes occurred rather than whether they did (use contribution analysis or process tracing instead).

Scenario | Use Impact Evaluation? | Better Alternative
Scaling decision for proven model | Yes | —
Early-stage programme development | No | Formative evaluation
Complex multi-actor change | No | Contribution Analysis
How and why change happened | No | Process Tracing
No counterfactual possible | No | Contribution Analysis
Donor mandates attribution evidence | Yes | —

How It Works

All impact evaluations rest on one central idea, the counterfactual: what would have happened to programme participants in the absence of the programme. Since you cannot observe the same people both with and without the programme, you construct a comparison group that approximates this counterfactual.
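
To make the counterfactual logic concrete, here is a minimal worked sketch in Python with purely illustrative numbers: the naive before-after change overstates impact whenever outcomes would have improved anyway, which is exactly what the comparison group is there to capture.

```python
# Illustrative difference-in-differences arithmetic (hypothetical numbers).
# Treatment group: dietary diversity score rises from 4.0 to 6.0.
# Comparison group: rises from 4.1 to 5.0 over the same period.
treat_baseline, treat_endline = 4.0, 6.0
comp_baseline, comp_endline = 4.1, 5.0

before_after = treat_endline - treat_baseline           # 2.0 -- naive estimate
counterfactual_trend = comp_endline - comp_baseline     # 0.9 -- change expected anyway
impact_estimate = before_after - counterfactual_trend   # 1.1 -- attributable to the programme

print(f"Naive before-after change: {before_after:.1f}")
print(f"Estimated programme impact: {impact_estimate:.1f}")
```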

Step 1: Plan at the design stage

Impact evaluations must be planned before the programme starts: retrospective impact evaluation is rarely credible, because baseline data must be collected before the programme begins.

Step 2: Define the evaluation question

State precisely what outcome you are trying to measure, for whom, over what time period, and at what geographic level. Vague questions produce inconclusive evaluations.

Step 3: Choose a design

The design choice depends on whether random assignment is feasible:

  • Randomised Controlled Trial (RCT): participants are randomly assigned to treatment or control. Gold standard for internal validity but costly and often ethically difficult (a minimal assignment sketch follows this list)
  • Quasi-experimental designs: used when randomisation is not possible; options include difference-in-differences, propensity score matching, regression discontinuity, and interrupted time series. See quasi-experimental design for details
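
For the RCT case, the assignment step itself is mechanically simple. A minimal sketch, assuming 60 hypothetical eligible communities are randomised as clusters (the design used in the Nigeria example further down), with a fixed seed so the allocation is reproducible and auditable:

```python
import random

# Hypothetical list of eligible communities (clusters); names are illustrative.
communities = [f"community_{i:02d}" for i in range(1, 61)]

rng = random.Random(2024)  # fixed seed so the allocation can be reproduced
shuffled = communities[:]
rng.shuffle(shuffled)

half = len(shuffled) // 2
treatment = sorted(shuffled[:half])  # clusters that receive the programme
control = sorted(shuffled[half:])    # comparison clusters

print(f"{len(treatment)} treatment clusters, {len(control)} control clusters")
```

In practice, assignment is usually stratified (by district, baseline outcome level, or cluster size) so that the arms are balanced on key characteristics by construction.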

Step 4: Establish baseline

Collect data on outcomes for both treatment and comparison groups before the programme begins. This is non-negotiable. The two groups must be comparable at baseline; any differences should be documented and controlled for in the analysis.
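
One way to document baseline comparability is a simple balance check on key covariates. A minimal sketch, assuming baseline survey data in a CSV with hypothetical column names (group, hh_income, diet_score):

```python
import pandas as pd
from scipy import stats

# Hypothetical baseline survey; group is "treatment" or "comparison".
df = pd.read_csv("baseline_survey.csv")

for var in ["hh_income", "diet_score"]:
    treat = df.loc[df["group"] == "treatment", var].dropna()
    comp = df.loc[df["group"] == "comparison", var].dropna()
    t_stat, p_value = stats.ttest_ind(treat, comp, equal_var=False)  # Welch's t-test
    print(f"{var}: treatment mean={treat.mean():.2f}, "
          f"comparison mean={comp.mean():.2f}, p={p_value:.3f}")
```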

Step 5: Implement with evaluation integrity

Monitor for contamination (comparison group accessing programme), attrition (losing study participants), and design fidelity (programme delivered as intended). These threats to validity must be managed throughout implementation.
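
Attrition is easiest to manage when it is tracked continuously rather than discovered at endline. A minimal sketch, assuming a participant tracking sheet with hypothetical columns group and interviewed_endline (1 = retained); the 5-percentage-point flag is an illustrative threshold, not a standard:

```python
import pandas as pd

# Hypothetical tracking sheet; group is "treatment" or "comparison".
track = pd.read_csv("tracking_sheet.csv")

attrition = 1 - track.groupby("group")["interviewed_endline"].mean()
print(attrition.round(3))

differential = abs(attrition["treatment"] - attrition["comparison"])
if differential > 0.05:  # illustrative threshold for differential attrition
    print(f"Warning: differential attrition of {differential:.1%} between arms")
```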

Step 6: Collect follow-up data and analyse

Collect midline and endline data at pre-specified intervals. Analyse using the appropriate statistical methods for the chosen design. Report the treatment effect size with confidence intervals, not just significance tests.
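
As one concrete illustration of this step, a two-period difference-in-differences estimate can be read off a simple interaction model. A minimal sketch using statsmodels, assuming pooled baseline and endline data with hypothetical columns outcome, treated (1/0), and post (1/0):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical pooled baseline + endline observations.
df = pd.read_csv("panel_data.csv")

# The coefficient on treated:post is the difference-in-differences estimate.
model = smf.ols("outcome ~ treated * post", data=df).fit(cov_type="HC1")

effect = model.params["treated:post"]
ci_low, ci_high = model.conf_int().loc["treated:post"]
print(f"Treatment effect: {effect:.2f} (95% CI: {ci_low:.2f} to {ci_high:.2f})")
```

For a cluster-randomised design, standard errors would additionally be clustered at the unit of randomisation.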

Step 7: Interpret and communicate findings

A statistically significant effect is not the same as a practically meaningful one. Report effect sizes in terms decision-makers understand (absolute changes, percentage changes, lives affected) alongside statistical significance.
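
The translation into decision-relevant terms is usually simple arithmetic. A sketch with purely illustrative figures, using the kind of percentage-point effect reported in the Nigeria example below:

```python
# Illustrative translation of an effect estimate into programme terms.
effect_pp = 0.23              # 23 percentage-point increase in consistent bednet use
target_households = 150_000   # hypothetical scale-up population

additional_users = int(effect_pp * target_households)
print(f"Roughly {additional_users:,} additional households using bednets consistently")
```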

Key Components

  • Counterfactual: a credible comparison group that approximates what would have happened without the programme
  • Baseline data: pre-intervention outcome measurements for both groups
  • Primary outcome indicator: one or two key outcomes the evaluation is powered to detect
  • Sample size calculation: determines how many participants are needed to detect an effect of expected magnitude
  • Pre-registration: registering the evaluation design, hypotheses, and analysis plan before data collection (increasingly required by 3ie, J-PAL, and major donors)
  • Follow-up data: midline and endline measurements at pre-specified intervals
  • Analysis plan: pre-specified statistical methods to prevent data dredging

Best Practices

Commit to the counterfactual. The entire credibility of an impact evaluation depends on the quality of the comparison group. Random assignment is the gold standard; when it is not feasible, document carefully why and use the best available quasi-experimental design.

Mandate baseline data collection. No baseline means no impact evaluation, only a before-after comparison, which cannot rule out trends that would have occurred anyway.

Power the study to detect realistic effects. Underpowered studies produce inconclusive results regardless of how well everything else is done. Work with a statistician to calculate minimum sample sizes based on expected effect sizes.
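
For a simple two-arm, individual-level comparison, a first-pass sample size can be sketched with statsmodels before that conversation; the effect size and power targets below are illustrative, and clustered designs need a further inflation by the design effect.

```python
from statsmodels.stats.power import TTestIndPower

# Illustrative assumptions: minimum detectable effect of 0.3 SD,
# 5% significance level, 80% power, equal group sizes.
n_per_arm = TTestIndPower().solve_power(effect_size=0.3, alpha=0.05, power=0.8, ratio=1.0)

print(f"Approximately {n_per_arm:.0f} participants needed per arm")
# Cluster-randomised designs: inflate by the design effect, roughly 1 + (m - 1) * ICC.
```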

Use the same instruments across groups. Survey tools and questions must be identical between treatment and comparison groups to ensure comparability.

Pre-register the design. Pre-registration prevents selective reporting of positive findings and builds credibility with donors and policymakers. 3ie, AEA RCT Registry, and RIDIE are the main registries.

Common Mistakes

Starting too late. Impact evaluations designed after implementation begins cannot establish valid baselines. The most common and most costly mistake in impact evaluation is failure to plan prospectively.

Asking the impact evaluation to answer process questions. An impact evaluation tells you whether outcomes changed. It will not tell you why, for whom the effect varied, or what mechanisms produced it. Pair it with qualitative methods for process insights.

Inadequate attention to comparison group quality. Propensity score matching, difference-in-differences, and regression discontinuity all depend on assumptions that must be tested and reported. Presenting quasi-experimental results without discussing the plausibility of design assumptions is misleading.

Conflating statistical significance with programme success. A statistically significant effect of negligible magnitude is not a programme success. Report and interpret effect sizes.
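
Cohen's d, one of the effect sizes listed under the indicators below, is simply the difference in endline means divided by the pooled standard deviation. A minimal sketch with hypothetical scores:

```python
import numpy as np

def cohens_d(treatment: np.ndarray, comparison: np.ndarray) -> float:
    """Difference in means divided by the pooled standard deviation."""
    n_t, n_c = len(treatment), len(comparison)
    pooled_var = ((n_t - 1) * treatment.var(ddof=1)
                  + (n_c - 1) * comparison.var(ddof=1)) / (n_t + n_c - 2)
    return (treatment.mean() - comparison.mean()) / np.sqrt(pooled_var)

# Hypothetical endline test scores.
rng = np.random.default_rng(7)
treat_scores = rng.normal(62, 10, 500)
comp_scores = rng.normal(58, 10, 500)
print(f"Cohen's d: {cohens_d(treat_scores, comp_scores):.2f}")  # roughly 0.4
```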

Neglecting negative results. Null results are information. A well-conducted impact evaluation that finds no effect is valuable evidence. Suppress null results and you distort the evidence base.

Examples

Agricultural livelihoods, East Africa. A USDA-funded food security programme in Ethiopia used a quasi-experimental design with propensity score matching to evaluate impact on household dietary diversity and income. Baseline data was collected for 3,000 treatment households and 2,400 matched comparison households before programme start. Midline and endline surveys tracked outcomes over five years. The evaluation found a 0.8 standard deviation improvement in dietary diversity scores in treatment households relative to comparison, attributed to the programme. The effect was concentrated in female-headed households, prompting a design revision for the follow-on programme.

Health, West Africa. A USAID-funded malaria prevention programme in Nigeria used a cluster-randomised trial design, randomising 60 communities to treatment (free bednet distribution plus community health worker visits) or control (free bednets only). The evaluation found that adding community health worker visits produced a 23 percentage point increase in consistent bednet use relative to bednets alone, justifying the additional cost of the community health worker component in national scale-up planning.

Education, South Asia. A World Bank-supported learning improvement programme in Pakistan used a regression discontinuity design based on school-level test score rankings to evaluate impact on student achievement. Schools just below the eligibility threshold were compared to schools just above. The evaluation found a 0.4 standard deviation improvement in literacy scores among Grade 3 students in programme schools, with larger effects for girls and rural schools.

Compared To

Approach | Causal Claim | Counterfactual | Suitable When
Impact Evaluation | Attributable effect | Explicit | Feasible counterfactual, scale decision
Quasi-Experimental Design | Attributable effect | Constructed | Randomisation not feasible
Contribution Analysis | Plausible contribution | None | Complex, multi-actor change
Process Tracing | Causal mechanism | None | Understanding how change happened
Realist Evaluation | Contextual mechanisms | Partial | What works, for whom

Relevant Indicators

52 donor-aligned indicators across USAID, DFID, World Bank, 3ie, USDA, and Global Fund. Key examples:

  • Net attributable change in primary outcome between baseline and endline (treatment vs. comparison)
  • Effect size (Cohen's d or percentage point difference) at programme completion
  • Proportion of evaluation hypotheses confirmed versus disconfirmed
  • Fidelity score for programme implementation as designed

Related Tools

  • Evaluation Planner: structure your evaluation design and timeline from programme start
  • Indicator Library: find donor-aligned outcome indicators for your sector

Related Topics

  • Quasi-Experimental Design, the most common alternative when RCTs are not feasible
  • Contribution Analysis, for when a counterfactual cannot be constructed
  • Baseline Design, the foundational data collection without which no impact evaluation is possible
  • Attribution vs. Contribution, understanding the distinction between impact evaluation and contribution claims
  • Mixed Methods Evaluation, pairing quantitative impact estimates with qualitative process insights

Further Reading

  • Gertler, P., Martinez, S., Premand, P., Rawlings, L., & Vermeersch, C. (2016). Impact Evaluation in Practice. 2nd ed. World Bank. The most accessible practitioner guide.
  • White, H. (2014). Current Challenges in Impact Evaluation. 3ie Working Paper 18. Reviews methodological debates.
  • J-PAL (2019). Introduction to Evaluations. Poverty Action Lab. Free online course covering RCT design.
  • USAID (2016). Evaluation: Learning from Experience. ADS 203. USAID's policy on evaluation including impact evaluation requirements.

At a Glance

Determines whether and to what degree a programme caused observed changes in outcomes, using a counterfactual to isolate programme effects.

Best For

  • Answering whether a programme works before scaling
  • Justifying significant budget commitments to donors or governments
  • Informing policy decisions that depend on evidence of effectiveness
  • Comparing alternative programme models to find the most effective approach

Complexity

Very High

Timeframe

Planned at design phase; data collection over full programme lifecycle (typically 3-7 years)

Linked Indicators

52 indicators across 6 donor frameworks

USAID · DFID · World Bank · 3ie · USDA · Global Fund

Examples

  • Attributable change in primary outcome indicator between baseline and endline
  • Size of treatment effect (effect size) at programme completion
  • Difference-in-differences estimate between treatment and comparison group

Related Topics

  • Quasi-Experimental Design (Pillar): A family of evaluation designs that estimate causal programme effects without random assignment, using statistical methods to construct credible comparison groups.
  • Contribution Analysis (Pillar): A structured approach to building a credible case for how and why a programme contributed to observed outcomes, without requiring experimental attribution.
  • Theory of Change (Pillar): A structured explanation of how and why a set of activities is expected to lead to desired outcomes, mapping the causal logic from inputs to impact.
  • Baseline Design (Core Concept): A structured approach to collecting initial condition data that directly informs project decisions, minimizes burden, and enables valid comparison with endline measurements.
  • Sampling Methods (Core Concept): Systematic approaches for selecting a subset of a population to represent the whole, balancing statistical validity with practical constraints.
  • Mixed Methods Evaluation (Core Concept): An evaluation approach that systematically combines quantitative and qualitative data to provide a more complete understanding of programme effects, mechanisms, and context.
  • Evaluation Criteria (DAC) (Core Concept): The OECD-DAC framework provides five standard criteria (relevance, efficiency, effectiveness, impact, and sustainability) for systematically assessing the merit and value of development interventions.
  • Attribution vs Contribution (Term): The distinction between proving a programme directly caused outcomes (attribution) versus building a credible case that it contributed to outcomes alongside other factors (contribution).