How to Design an Evaluation

An evaluation is a systematic assessment of a program's design, implementation, or results. Unlike routine monitoring, which tracks whether activities are happening as planned, an evaluation asks deeper questions: Are outcomes being achieved? Why or why not? What would have happened without the program?

What is an evaluation?

Evaluations are among the highest-cost monitoring and evaluation (M&E) activities, and among the most frequently misused. A common failure pattern: an evaluation is commissioned to satisfy a reporting requirement, findings arrive too late to influence decisions, and the report sits unread. Good evaluation design starts with use: who will act on the findings, and how.

Types of evaluation

The four most common types in development programming serve different purposes and require different resources:

Type	Purpose and when to use it
Formative	Conducted during implementation to improve the program. Asks: what is working, what needs to change, and how can we do it better?
Summative	Conducted at or after completion to judge the program. Asks: did the program achieve its objectives, and was it worth the investment?
Process	Examines how activities were implemented. Useful when you need to understand delivery quality before drawing conclusions about outcomes.
Impact	Attempts to attribute observed change to the program by comparing outcomes with a counterfactual (what would have happened without the intervention). Requires the most rigorous design and the highest budget.
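The counterfactual comparison described for impact evaluations can be illustrated with a small calculation. The sketch below assumes a difference-in-differences setup, which is one possible way to approximate a counterfactual and is not prescribed by this guide. The function name and all figures are hypothetical, invented purely for illustration.

```python
def difference_in_differences(treat_before, treat_after, comp_before, comp_after):
    """Change in the program group minus change in the comparison group."""
    program_change = treat_after - treat_before
    # The comparison group's change stands in for the counterfactual trend:
    # what might have happened to the program group without the intervention.
    comparison_change = comp_after - comp_before
    return program_change - comparison_change


# Hypothetical figures (not real data): share of households meeting an outcome.
naive_change = 0.60 - 0.40
effect = difference_in_differences(
    treat_before=0.40, treat_after=0.60,
    comp_before=0.42, comp_after=0.50,
)

print(f"Before/after change in program group: {naive_change:.2f}")
print(f"Estimated effect net of comparison trend: {effect:.2f}")
```

In this invented example the simple before/after change overstates the effect, because part of the improvement also occurred in the comparison group. The estimate is only as credible as the assumption that both groups would have followed the same trend without the program.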

When should you evaluate?

Most programs with donor funding are required to conduct at least one mid-term review and one end-of-project evaluation. Beyond compliance, the right time to evaluate is when a decision needs evidence: a scale-up decision, a course correction, a funding renewal application, or a learning agenda question that monitoring data cannot answer.

Evaluations that are not tied to a decision tend to produce reports, not change. Before commissioning an evaluation, the most important question is: who will use the findings, and what will they decide differently because of them?

Evaluation purpose

Why are you evaluating?

Formative. Improve implementation during a program
Summative. Judge effectiveness at or after completion
Process. Assess how activities were implemented
Impact. Attribute change to the program (needs counterfactual)

Design type

How rigorous does the design need to be?

Experimental (RCT). Random assignment: highest rigor, highest cost
Quasi-experimental. Comparison group without randomization
Pre-post with theory. Before/after with contribution analysis
Qualitative. Process tracing, case study, most significant change

Data methods

How will you collect evidence?

Surveys. Structured data at scale, quantitative or mixed
Key informant interviews. Depth on process, barriers, and context
Focus group discussions. Group perspectives and shared experiences
Document review. Program records, monitoring data, secondary sources

Sampling approach

Who will you collect data from?

Probability sampling. Random or systematic, needed for statistical inference
Purposive sampling. Deliberate selection for qualitative depth
Stratified. Sample separately within strata (e.g., sex, region) to ensure each group is represented
Mixed. Probability sampling for quantitative data plus purposive sampling for qualitative data
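As a rough illustration of how the stratified and probability options combine, the sketch below draws a simple random sample of the same fraction from each stratum, using only the Python standard library. The function name, the `region` field, and the sampling frame are hypothetical, and proportional allocation is an assumption made for the example rather than a recommendation from this guide.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(units, stratum_of, fraction, seed=0):
    """Draw the same fraction at random from every stratum."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for name in sorted(strata):
        members = strata[name]
        # Take at least one unit per stratum, never more than the stratum holds.
        k = min(len(members), max(1, round(len(members) * fraction)))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical sampling frame: 60 households in "north", 40 in "south".
frame = [{"id": i, "region": "north" if i < 60 else "south"} for i in range(100)]
sample = stratified_sample(frame, lambda h: h["region"], fraction=0.1)

counts = Counter(h["region"] for h in sample)
print(dict(counts))
```

With this frame and a 10 percent fraction, the sample contains 6 households from the north and 4 from the south, so each region is represented in proportion to its size. A real design would also need a sample size calculation, which this sketch does not cover.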

What Goes in an Evaluation TOR

A terms of reference (TOR) is the document that defines an evaluation's scope, questions, and requirements before the evaluator is hired. A weak TOR produces a weak evaluation.

Evaluation questions

Three to five prioritized questions that define what the evaluation will answer. The single most important element: everything else flows from here.

Scope and boundaries

Time period covered, geography, target population, and what is explicitly out of scope. Prevents scope creep and focuses the budget.

Methodology overview

The design type, data collection methods, and analytical approach. Should be matched to the evaluation questions, not imported from a previous TOR.

Budget and timeline

Realistic estimates for all evaluation activities: fieldwork, data entry, analysis, reporting, and review cycles. Budget should reflect actual scope.

Independence and ethics

Evaluator independence requirements, conflict of interest policy, data protection provisions, and ethical review process.

Deliverables and use

What outputs are expected (inception report, draft, final), who reviews them, and, critically, how findings will be used after the evaluation.

TOR quality checklist

Evaluation questions clearly stated and prioritized
Methodology matched to questions (not the other way around)
Evaluation use plan developed before data collection
Terms of reference reviewed by key stakeholders
Budget and timeline are realistic for the scope
Evaluator independence requirements specified
Data protection and ethical review process defined
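For teams that track TOR drafts in a script or spreadsheet export, the checklist above can be encoded as data and checked automatically. This is a minimal sketch, assuming each item is recorded as true or false for a given draft. The function name and the example status values are hypothetical.

```python
TOR_CHECKLIST = (
    "Evaluation questions clearly stated and prioritized",
    "Methodology matched to questions (not the other way around)",
    "Evaluation use plan developed before data collection",
    "Terms of reference reviewed by key stakeholders",
    "Budget and timeline are realistic for the scope",
    "Evaluator independence requirements specified",
    "Data protection and ethical review process defined",
)


def unmet_items(status):
    """Return checklist items not marked as done, in checklist order."""
    # Items missing from the status mapping are treated as not done.
    return [item for item in TOR_CHECKLIST if not status.get(item, False)]


# Hypothetical draft: everything done except the evaluation use plan.
status = {item: True for item in TOR_CHECKLIST}
status["Evaluation use plan developed before data collection"] = False

gaps = unmet_items(status)
for item in gaps:
    print(f"Not yet met: {item}")
```

Treating unrecorded items as unmet is a deliberate choice here: a TOR should not pass the check simply because nobody filled in the form.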
