Who This Page Is For
You need to commission an evaluation. Maybe it is a final evaluation required by your donor agreement. Maybe it is a mid-term review your team decided to conduct. Either way, you need to write an evaluation TOR that attracts qualified evaluators and gives them enough direction to deliver useful findings.
A weak TOR produces a weak evaluation. If the scope is vague, the evaluator will define it for you, and you may not like their choices. If the evaluation questions are unfocused, you will get a 60-page report that answers nothing you can act on. This guide walks you through each section of a TOR and tells you exactly what to include.
Before You Write: Answer the Use Question
Before drafting a single section, answer this question: Who will act on the findings, and how?
If you cannot answer that, you are not ready to write a TOR. You are commissioning a report that will sit on a shelf. Run the Evaluation Readiness assessment first.
Write down the specific decisions the evaluation will inform. "The country director will use findings to decide whether to scale the livelihood component to three additional districts" is a use statement. "We will learn about program effectiveness" is not.
This use statement shapes everything that follows: the evaluation questions, the methodology, the timeline, the audience for the report. Write it first. Put it in the TOR. Reference it when you review the evaluator's inception report.
The 8 Sections of an Evaluation TOR
1. Background and Context
Provide enough context for an external evaluator to understand the program without reading 200 pages of project documents.
Include:
- Program name, duration, geographic scope, and implementing partners
- Target population and approximate reach
- Total budget and funding source (without naming specific donors if confidentiality applies)
- A 2-3 sentence summary of the program's theory of change
- Key contextual factors the evaluator needs to understand (conflict, COVID disruption, policy changes)
Avoid: Pasting the entire project description from the proposal. The evaluator does not need five pages of background. They need one page that orients them, plus a list of key documents you will share during inception.
Write this: "The program operates in 12 districts across two regions, targeting 15,000 smallholder farming households. It aims to increase household food security through improved agricultural practices, market access, and nutrition behavior change."
Not this: "The project, which commenced in January 2023, is a multi-sectoral integrated development initiative that leverages synergies across agriculture, nutrition, and market systems to address the root causes of food insecurity among vulnerable populations in rural areas."
2. Evaluation Purpose and Use
State why this evaluation is happening and what decisions it will inform. This is where your pre-writing use statement goes.
Include:
- The primary purpose: accountability (proving results), learning (improving the program), or both
- The specific decisions the evaluation will inform
- The primary audience: program team, senior leadership, donor, government partners, beneficiaries
- How findings will be disseminated and used
The test: If you remove this section and the evaluation still makes sense, the section is too generic. It should be impossible to swap this purpose statement into a different program's TOR without rewriting it.
3. Scope and Evaluation Questions
This is the hardest section and the most important one. Get the evaluation questions wrong and nothing else matters.
Scope defines boundaries. Specify:
- Time period covered (full program, last two years, specific phase)
- Geographic coverage (all sites or a sample)
- Thematic coverage (all components or specific ones)
- Evaluation criteria to be addressed (relevance, coherence, effectiveness, efficiency, impact, sustainability)
Evaluation questions define what the evaluator must answer. Rules:
- 3-8 questions maximum. Each question costs money to answer well. Eight questions with a $40,000 budget means $5,000 per question. That is barely enough for one method per question. Prioritize ruthlessly.
- Each question must be answerable. If you ask "What was the program's impact on household food security?" but your budget cannot support a comparison group or counterfactual analysis, the evaluator cannot answer that question credibly. Match questions to methodology and budget.
- Each question must be actionable. Ask yourself: if the evaluator answers this question, will someone do something differently? If not, cut it.
- Organize questions under evaluation criteria. This helps evaluators structure their matrix and helps you confirm coverage.
Write this: "To what extent did the agricultural training component lead to adoption of improved farming practices among targeted households, and what factors enabled or hindered adoption?"
Not this: "What was the impact of the program?"
Use the Evaluation Designer to test whether your questions, methodology, and budget are aligned before finalizing the TOR.
4. Methodology Expectations
Set expectations without designing the evaluation yourself. The evaluator's job is to propose a detailed methodology in the inception report. Your job is to define the boundaries.
Include:
- General approach expected: mixed methods, theory-based, quasi-experimental. See How to Choose Evaluation Methodology.
- Non-negotiable requirements: comparison group, minimum sample size, specific disaggregation (gender, age, disability status), participatory methods
- Data sources available: existing monitoring data, baseline reports, program databases
- Expected data collection methods (surveys, key informant interviews, focus groups, document review)
- Ethical requirements: informed consent, data protection, IRB approval if needed
Avoid: Specifying the exact sample size, sampling strategy, and analytical framework. That is the evaluator's expertise. If you prescribe every detail, you are not hiring an evaluator. You are hiring a data collection firm.
The balance: "We expect a mixed-methods approach combining a household survey (minimum 400 households, disaggregated by gender of household head) with qualitative interviews and focus groups across at least 6 of the 12 program districts." That gives enough direction without designing the study.
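Where does a number like 400 come from? One common reference point is Cochran's sample-size formula for estimating a proportion. A minimal worked version, assuming a 95% confidence level (z = 1.96), maximum variability (p = 0.5), and a ±5% margin of error:

```latex
n = \frac{z^2 \, p(1-p)}{e^2} = \frac{1.96^2 \times 0.5 \times 0.5}{0.05^2} \approx 384
```

That yields roughly 384 households, often rounded up to 400. Treat this as a sanity check, not a prescription: if you require disaggregation (for example, by gender of household head), each subgroup needs enough respondents on its own, and your evaluator should justify the final sample in the inception report.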
5. Deliverables and Timeline
Be specific. Vague timelines produce delayed evaluations.
Standard deliverables:
| Deliverable | Typical Timeline |
|---|---|
| Inception report (detailed methodology, workplan, tools) | 2-3 weeks after contract signing |
| Draft data collection tools | With inception report |
| Fieldwork completion | 3-6 weeks after inception approval |
| Draft evaluation report | 2-3 weeks after fieldwork |
| Presentation of preliminary findings | With or before draft report |
| Final evaluation report | 2 weeks after receiving comments |
| Clean datasets and codebooks | With final report |
| Management response template | With final report |
Include: Total calendar duration from contract to final report. The typical timelines above already sum to roughly 9-14 weeks before any review time. Build in time for your team to review and comment on the inception report and draft report. Those review periods often take longer than the evaluator's work.
Avoid: Setting a timeline that makes quality impossible. A final evaluation with primary data collection in under 8 weeks total is unrealistic. If the donor deadline forces a compressed timeline, say so explicitly and adjust scope accordingly.
6. Evaluator Qualifications
Specify what you need. Be concrete.
Required qualifications (typical):
- Advanced degree in a relevant field
- Minimum years of evaluation experience (7-10 for team lead)
- Experience with the specific methodology you expect
- Sector expertise (food security, health, education, governance)
- Regional or country experience
- Language requirements (for the team, not necessarily the lead)
Selection criteria and weighting:
| Criterion | Suggested Weight |
|---|---|
| Technical approach and methodology | 30-40% |
| Team qualifications and experience | 25-30% |
| Understanding of context and evaluation questions | 15-20% |
| Budget and value for money | 15-20% |
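To see how the weighting works in practice, here is a minimal scoring sketch in Python. The weights are illustrative midpoints from the table above, and the proposal scores are entirely hypothetical; substitute your own criteria and numbers.

```python
# Minimal proposal-scoring sketch. Weights are illustrative midpoints
# from the table above; per-criterion scores (0-100) are hypothetical.
weights = {
    "technical_approach": 0.35,
    "team_experience": 0.275,
    "context_understanding": 0.175,
    "value_for_money": 0.20,
}
assert abs(sum(weights.values()) - 1.0) < 1e-9  # weights must total 100%

proposals = {
    "Firm A": {"technical_approach": 80, "team_experience": 90,
               "context_understanding": 70, "value_for_money": 60},
    "Firm B": {"technical_approach": 70, "team_experience": 75,
               "context_understanding": 85, "value_for_money": 90},
}

for name, scores in proposals.items():
    total = sum(weights[c] * scores[c] for c in weights)
    print(f"{name}: {total:.1f} / 100")
```

Publishing the weights in the TOR tells bidders where to put their effort and makes your selection decision defensible if it is challenged.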
Avoid: Requiring 15 years of experience in the exact sub-sector in the exact country. You will eliminate strong candidates and end up with one applicant. Require core skills and weight relevant experience in the scoring.
7. Budget
You have two options: state a fixed budget ceiling or ask evaluators to propose their own. Each has tradeoffs.
Fixed ceiling: Evaluators design within your constraints. You get comparable proposals. Risk: evaluators cut corners to fit the budget.
Open budget: You see what a quality evaluation actually costs. Risk: proposals range from $20,000 to $200,000 and you cannot compare them.
Recommended approach: State a budget range. "The available budget for this evaluation is $40,000-55,000, inclusive of all costs." This anchors proposals and still allows variation.
Include what the budget must cover: evaluator fees, travel, data collection (enumerators, translations, logistics), report production, and any specific costs like IRB fees.
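As a gut check before you publish the range, rough out whether a credible evaluation actually fits inside it. The sketch below is illustrative only; every figure is hypothetical, not a benchmark.

```python
# Hypothetical cost roughing for a $40,000-55,000 all-inclusive budget.
# All figures are invented for illustration; substitute local rates.
line_items = {
    "team_lead_fees": 18_000,            # e.g., 30 days at a $600 day rate
    "national_consultant_fees": 8_000,
    "enumerators_and_field_logistics": 12_000,
    "travel": 5_000,
    "translation_and_transcription": 2_500,
    "report_production": 1_500,
    "irb_or_ethics_fees": 1_000,
}
total = sum(line_items.values())
print(f"Estimated total: ${total:,}")   # $48,000, inside the stated range
```

If your own roughing lands above the ceiling, cut scope (fewer districts, fewer questions) before you publish the TOR, not after proposals arrive.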
8. Management and Ethics
Management arrangements:
- Who manages the evaluation (M&E Manager, program director, steering committee)?
- Who is the primary point of contact?
- What documents and data will be shared with the evaluator?
- How will comments on deliverables be consolidated (one document, not 10 separate emails)?
- Who approves the inception report, draft report, and final report?
Ethical requirements:
- Informed consent protocols
- Data confidentiality and storage
- Do-no-harm considerations, especially for vulnerable populations
- Safeguarding requirements
- IRB or ethics committee approval if required by your organization or donor
Do not skip ethics. If your organization has a research ethics policy, reference it. If it does not, state the minimum standards you expect.
Common Mistakes
Mistake 1: Writing evaluation questions after the methodology is decided. The questions come first. The methodology serves the questions. If you start with "we want a quasi-experimental evaluation" and then write questions to justify it, you will end up with questions shaped by the method rather than by what you need to know.
Mistake 2: Asking too many evaluation questions. Every question you add dilutes the depth of every other question. With a $50,000 budget and 10 questions, you are paying $5,000 per question. That buys a shallow answer. Five focused questions at $10,000 each buy you useful evidence.
Mistake 3: No use plan. If the TOR does not explain who will use the findings and how, the evaluation becomes an expensive compliance exercise. Write the use plan before you write the evaluation questions. The questions should generate the evidence the decision-makers need.
Mistake 4: Timelines that make quality impossible. Asking for a final evaluation with primary data collection, analysis, and reporting in 4 weeks is asking for a bad evaluation. Build in adequate time for inception, fieldwork, analysis, drafting, review, and revision. If the timeline is non-negotiable, reduce the scope to match.
Mistake 5: Specifying every methodological detail. If you prescribe the sample size, sampling frame, analytical framework, and interview guides, you are not commissioning an evaluation. You are hiring data collectors. Set expectations and constraints. Let the evaluator propose the design. Evaluate their proposal on its technical merit.
Mistake 6: Forgetting the inception report. The inception report is where the evaluator translates your TOR into a concrete plan. If your TOR does not require one, the evaluator will go straight to fieldwork using whatever interpretation of your questions they prefer. Always require an inception report with a review and approval step before fieldwork begins.
TOR Checklist
Use this before you circulate the TOR.
- Background is concise (1 page max) and includes program theory of change summary
- Evaluation purpose states specific decisions the findings will inform
- Primary audience is named
- Scope boundaries are explicit (time period, geography, thematic areas, evaluation criteria)
- Evaluation questions are 3-8 in number, each answerable and actionable
- Methodology section sets expectations without prescribing the full design
- Non-negotiable requirements are clearly flagged (comparison group, disaggregation, sample minimums)
- Deliverables list includes inception report with approval gate before fieldwork
- Timeline is realistic (minimum 8-12 weeks for evaluations with primary data collection)
- Evaluator qualifications are specific but not exclusionary
- Selection criteria and weighting are stated
- Budget range is provided or ceiling is stated
- Ethics and safeguarding requirements are included
- Key documents list is attached (what the evaluator will receive)
- Review and approval process for each deliverable is described