How to Design an Evaluation
What is an evaluation?
An evaluation is a systematic assessment of a program's design, implementation, or results. Unlike routine monitoring, which tracks whether activities are happening as planned, an evaluation asks deeper questions: Are outcomes being achieved? Why or why not? What would have happened without the program?
Evaluations are among the most expensive monitoring and evaluation (M&E) activities, and among the most frequently misused. A common failure pattern: an evaluation is commissioned to satisfy a reporting requirement, findings arrive too late to influence decisions, and the report sits unread. Good evaluation design starts with use: who will act on the findings, and how.
Types of evaluation
The four most common types in development programming serve different purposes and require different resources:
| Type | Purpose and timing |
|---|---|
| Formative | Conducted during implementation, with a focus on improvement. Asks: what is working, what needs to change, and how can we do it better? |
| Summative | Conducted at or after completion, with a focus on judgment. Asks: did the program achieve its objectives, and was it worth the investment? |
| Process | Examines how activities were implemented. Useful when you need to understand delivery quality before drawing conclusions about outcomes. |
| Impact | Attempts to attribute observed change to the program by comparing outcomes with a counterfactual (what would have happened without the intervention). Requires the most rigorous design and the highest budget. |
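Because the counterfactual cannot be observed directly, impact designs have to estimate it. One common quasi-experimental approach (not the only one) is difference-in-differences, which treats the change in a comparison group as a stand-in for what would have happened to participants. A minimal sketch in Python; the function name and the savings-account figures are hypothetical, not drawn from any real program.

```python
def diff_in_diff(treat_pre: float, treat_post: float,
                 comp_pre: float, comp_post: float) -> float:
    """Difference-in-differences estimate of program impact.

    Assumes the comparison group's change is a valid proxy for the
    counterfactual change in the treatment group (the "parallel trends"
    assumption). If that assumption fails, the estimate is biased.
    """
    treat_change = treat_post - treat_pre  # observed change among participants
    comp_change = comp_post - comp_pre     # proxy for change without the program
    return treat_change - comp_change


# Hypothetical data: share of households with a savings account (%)
impact = diff_in_diff(treat_pre=30.0, treat_post=55.0,
                      comp_pre=32.0, comp_post=40.0)
print(f"Estimated impact: {impact:.1f} percentage points")  # prints 17.0
```

A naive before/after comparison of participants alone would report 25 points; subtracting the comparison group's 8-point change is what moves the estimate toward attribution.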
When should you evaluate?
Most programs with donor funding are required to conduct at least one mid-term review and one end-of-project evaluation. Beyond compliance, the right time to evaluate is when a decision needs evidence: a scale-up decision, a course correction, a funding renewal application, or a learning agenda question that monitoring data cannot answer.
Evaluations that are not tied to a decision tend to produce reports, not change. Before commissioning an evaluation, the most important question is: who will use the findings, and what will they decide differently because of them?
Evaluation purpose
Why are you evaluating?
- Formative. Improve implementation during a program
- Summative. Judge effectiveness at or after completion
- Process. Assess how activities were implemented
- Impact. Attribute change to the program (needs counterfactual)
Design type
How rigorous does the design need to be?
- Experimental (RCT). Random assignment: highest rigor, highest cost
- Quasi-experimental. Comparison group without randomization
- Pre-post with theory. Before/after with contribution analysis
- Qualitative. Process tracing, case study, most significant change
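In an experimental design, random assignment is what makes the control group a credible counterfactual. A minimal sketch in Python, assuming a flat list of unit IDs and simple (unstratified, unclustered) randomization; the village IDs, seed, and function name are hypothetical. Many real trials randomize within strata or clusters, and this sketch does not handle either.

```python
import random


def randomly_assign(unit_ids, seed):
    """Split units into treatment and control groups at random.

    A fixed seed makes the assignment reproducible, so it can be
    documented and audited. With an odd number of units, control
    receives the extra unit.
    """
    rng = random.Random(seed)      # independent generator, seeded for reproducibility
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": sorted(shuffled[:half]),
            "control": sorted(shuffled[half:])}


villages = [f"V{i:02d}" for i in range(1, 11)]  # hypothetical village IDs
groups = randomly_assign(villages, seed=2024)
print("Treatment:", groups["treatment"])
print("Control:  ", groups["control"])
```

Recording the seed alongside the unit list lets an independent evaluator reproduce the assignment exactly.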
Data methods
How will you collect evidence?
- Surveys. Structured data at scale, quantitative or mixed
- Key informant interviews. Depth on process, barriers, and context
- Focus group discussions. Group perspectives and shared experiences
- Document review. Program records, monitoring data, secondary sources
Sampling approach
Who will you collect data from?
- Probability sampling. Random or systematic, needed for statistical inference
- Purposive sampling. Deliberate selection for qualitative depth
- Stratified. Divide the population into strata (e.g., sex, region) and sample within each to ensure representation
- Mixed. Probability sampling for quantitative data, purposive sampling for qualitative data
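For a probability sample, one common starting point is Cochran's formula for estimating a proportion, followed by proportional allocation across strata. The sketch below assumes a 95% confidence level (z = 1.96), a margin of error of 5 percentage points, and p = 0.5; the regions, household counts, seed, and function names are hypothetical. It ignores design effects from cluster sampling and expected non-response, both of which usually push the required sample size up.

```python
import math
import random


def sample_size_for_proportion(population: int, margin: float = 0.05,
                               z: float = 1.96, p: float = 0.5) -> int:
    """Cochran's sample size for estimating a proportion, with a
    finite population correction. p = 0.5 is the most conservative choice."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2  # size for a very large population
    n = n0 / (1 + (n0 - 1) / population)       # finite population correction
    return math.ceil(n)


def proportional_allocation(strata_sizes: dict, total_n: int) -> dict:
    """Split total_n across strata in proportion to stratum size.
    Largest-remainder rounding keeps the allocations summing to total_n."""
    pop = sum(strata_sizes.values())
    exact = {s: total_n * size / pop for s, size in strata_sizes.items()}
    alloc = {s: math.floor(x) for s, x in exact.items()}
    shortfall = total_n - sum(alloc.values())
    # Hand leftover units to the strata with the largest fractional parts
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for s in by_remainder[:shortfall]:
        alloc[s] += 1
    return alloc


def stratified_sample(frame: dict, allocation: dict, seed: int) -> dict:
    """Draw a simple random sample without replacement within each stratum."""
    rng = random.Random(seed)
    return {s: rng.sample(frame[s], allocation[s]) for s in frame}


# Hypothetical sampling frame: household IDs listed by region
frame = {
    "North": [f"N{i:04d}" for i in range(1200)],
    "South": [f"S{i:04d}" for i in range(800)],
}
population = sum(len(ids) for ids in frame.values())
n = sample_size_for_proportion(population)
allocation = proportional_allocation({s: len(ids) for s, ids in frame.items()}, n)
sample = stratified_sample(frame, allocation, seed=7)
print(n, allocation)
```

Under these assumptions, the 2,000-household frame yields a total sample of 323 households, allocated 194 to North and 129 to South.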
What Goes in an Evaluation TOR
A terms of reference (TOR) is the document that defines an evaluation's scope, questions, and requirements before the evaluator is hired. A weak TOR produces a weak evaluation.
Evaluation questions
Three to five prioritized questions that define what the evaluation will answer. This is the single most important element of the TOR: everything else flows from it.
Scope and boundaries
Time period covered, geography, target population, and what is explicitly out of scope. Prevents scope creep and focuses the budget.
Methodology overview
The design type, data collection methods, and analytical approach. Should be matched to the evaluation questions, not imported from a previous TOR.
Budget and timeline
Realistic estimates for all evaluation activities: fieldwork, data entry, analysis, reporting, and review cycles. Budget should reflect actual scope.
Independence and ethics
Evaluator independence requirements, conflict of interest policy, data protection provisions, and ethical review process.
Deliverables and use
What outputs are expected (inception report, draft, final), who reviews them, and, critically, how findings will be used after the evaluation.
TOR quality checklist
- Evaluation questions clearly stated and prioritized
- Methodology matched to questions (not the other way around)
- Evaluation use plan developed before data collection
- Terms of reference reviewed by key stakeholders
- Budget and timeline are realistic for the scope
- Evaluator independence requirements specified
- Data protection and ethical review process defined