
How to Design an Evaluation

An evaluation is a systematic assessment of a program's design, implementation, or results. Unlike routine monitoring, which tracks whether activities are happening as planned, an evaluation asks deeper questions: Are outcomes being achieved? Why or why not? What would have happened without the program?

What is an evaluation?

Evaluations are among the most expensive monitoring and evaluation (M&E) activities, and among the most frequently misused. A common failure pattern: an evaluation is commissioned to satisfy a reporting requirement, findings arrive too late to influence decisions, and the report sits unread. Good evaluation design starts with use: who will act on the findings, and how.

Types of evaluation

The four most common types in development programming serve different purposes and require different resources:

Type | When to use it
Formative | During implementation, with a focus on improvement. Asks: what is working, what needs to change, and how can we do it better?
Summative | At or after completion, with a focus on judgment. Asks: did the program achieve its objectives, and was it worth the investment?
Process | When you need to understand how activities were implemented, and the quality of delivery, before drawing conclusions about outcomes.
Impact | When you need to attribute observed change to the program by comparing outcomes with a counterfactual (what would have happened without the intervention). Typically requires the most rigorous design and the highest budget.
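
A small worked example can make the counterfactual comparison behind impact evaluations more concrete. The sketch below uses invented numbers (hypothetical, not from any real program) and one common quasi-experimental approach, difference-in-differences, in which a comparison group's change over the same period stands in for what would have happened without the program. It illustrates the logic only and assumes the two groups would otherwise have changed at the same rate.

```python
# Hypothetical average outcome scores, invented purely for illustration.
program_before, program_after = 40.0, 55.0
comparison_before, comparison_after = 41.0, 49.0


def pre_post_change(before: float, after: float) -> float:
    """Naive before/after change: no counterfactual at all."""
    return after - before


def difference_in_differences(p_before: float, p_after: float,
                              c_before: float, c_after: float) -> float:
    """Program group's change minus the comparison group's change.

    Assumes the comparison group's change approximates the counterfactual
    (the "parallel trends" assumption).
    """
    return (p_after - p_before) - (c_after - c_before)


naive = pre_post_change(program_before, program_after)
did = difference_in_differences(program_before, program_after,
                                comparison_before, comparison_after)
print(f"Pre-post change: {naive:.1f}")
print(f"Change attributable to the program (DiD estimate): {did:.1f}")
```

With these made-up figures the naive pre-post change is 15 points, but the comparison group also improved by 8 points without the program, so the estimate attributable to the program is 7 points.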

When should you evaluate?

Many donor-funded programs are required to conduct at least one mid-term review and one end-of-project evaluation. Beyond compliance, the right time to evaluate is when a decision needs evidence: a scale-up decision, a course correction, a funding renewal application, or a learning agenda question that monitoring data cannot answer.

Evaluations that are not tied to a decision tend to produce reports, not change. Before commissioning an evaluation, the most important question is: who will use the findings, and what will they decide differently because of them?

Evaluation purpose

Why are you evaluating?

  • Formative. Improve implementation during a program
  • Summative. Judge effectiveness at or after completion
  • Process. Assess how activities were implemented
  • Impact. Attribute change to the program (needs counterfactual)

Design type

How rigorous does the design need to be?

  • Experimental (randomized controlled trial, RCT). Random assignment to program and control groups: highest rigor, highest cost
  • Quasi-experimental. Comparison group without randomization
  • Pre-post with theory. Before/after with contribution analysis
  • Qualitative. Process tracing, case study, most significant change
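
The random assignment that defines the experimental (RCT) design can be sketched in a few lines. This is a minimal illustration that assumes a simple list of eligible units (hypothetical village IDs) split into two arms of equal size, with no stratification or blocking. The fixed seed is there so the assignment can be reproduced and audited later.

```python
import random


def assign_randomly(unit_ids, seed):
    """Shuffle eligible units and split them into treatment and control arms.

    A fixed seed makes the assignment reproducible and auditable.
    If the number of units is odd, the extra unit goes to control.
    """
    rng = random.Random(seed)
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "treatment": sorted(shuffled[:half]),
        "control": sorted(shuffled[half:]),
    }


# Hypothetical list of eligible villages.
villages = [f"village_{i:02d}" for i in range(1, 11)]
arms = assign_randomly(villages, seed=2024)
print("Treatment:", arms["treatment"])
print("Control:  ", arms["control"])
```

Because each village is equally likely to end up in either arm, differences between the arms at the end can be attributed to the program rather than to how villages were selected. This is the property that quasi-experimental designs have to approximate in other ways.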

Data methods

How will you collect evidence?

  • Surveys. Structured data at scale, quantitative or mixed
  • Key informant interviews. Depth on process, barriers, and context
  • Focus group discussions. Group perspectives and shared experiences
  • Document review. Program records, monitoring data, secondary sources

Sampling approach

Who will you collect data from?

  • Probability sampling. Random or systematic, needed for statistical inference
  • Purposive sampling. Deliberate selection for qualitative depth
  • Stratified. Separate strata (sex, region) to ensure representation
  • Mixed. Quantitative probability + qualitative purposive
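
Stratified probability sampling can be sketched as drawing a separate simple random sample within each stratum. The sketch assumes a hypothetical sampling frame of households tagged by region and uses proportional allocation, where each stratum's share of the sample matches its share of the frame. One common alternative is disproportionate allocation with analysis weights, which is used when small strata need more precision.

```python
import random
from collections import defaultdict


def stratified_sample(frame, stratum_of, total_n, seed):
    """Draw a proportional stratified random sample.

    frame: list of sampling units.
    stratum_of: function mapping a unit to its stratum label.
    Each stratum gets round(total_n * stratum_size / frame_size) units,
    drawn without replacement; rounding can make the total differ slightly
    from total_n.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)

    sample = {}
    for name, units in strata.items():
        n = min(len(units), round(total_n * len(units) / len(frame)))
        sample[name] = rng.sample(units, n)
    return sample


# Hypothetical frame: 600 households in the North, 400 in the South.
frame = [("North", i) for i in range(600)] + [("South", i) for i in range(400)]
sample = stratified_sample(frame, stratum_of=lambda hh: hh[0], total_n=100, seed=7)
for region, households in sample.items():
    print(region, len(households))
```

With this frame, the sample contains 60 North and 40 South households, so each region is represented in proportion to its size. A simple random sample of 100 would only approximate that split.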

What Goes in an Evaluation TOR

A terms of reference (TOR) is the document that defines an evaluation's scope, questions, and requirements before the evaluator is hired. A weak TOR produces a weak evaluation.

Evaluation questions

Three to five prioritized questions that define what the evaluation will answer. This is the single most important element; everything else flows from it.

Scope and boundaries

Time period covered, geography, target population, and what is explicitly out of scope. Prevents scope creep and focuses the budget.

Methodology overview

The design type, data collection methods, and analytical approach. Should be matched to the evaluation questions, not imported from a previous TOR.

Budget and timeline

Realistic estimates for all evaluation activities: fieldwork, data entry, analysis, reporting, and review cycles. Budget should reflect actual scope.

Independence and ethics

Evaluator independence requirements, conflict of interest policy, data protection provisions, and ethical review process.

Deliverables and use

What outputs are expected (inception report, draft, final), who reviews them, and, critically, how findings will be used after the evaluation.

TOR quality checklist

  • Evaluation questions clearly stated and prioritized
  • Methodology matched to questions (not the other way around)
  • Evaluation use plan developed before data collection
  • Terms of reference reviewed by key stakeholders
  • Budget and timeline are realistic for the scope
  • Evaluator independence requirements specified
  • Data protection and ethical review process defined
