Scoring Criteria
Score 5: All elements present. Questions align with the stated purpose, fit the budget and timeline, and remain proportionate to the program. No question requires evidence beyond the program scope.
Score 4: At least three of four elements present. Purpose alignment and proportionality clear; budget or timeline fit partial.
Score 3: At least two of four elements present. Questions broadly align with the purpose, but proportionality or resource fit is implicit.
Score 2: Questions only loosely related to the stated purpose. Several exceed scope or resources.
Score 1: Absent or inadequate. Questions disconnected from purpose, scope, or resources.

Score 5: All elements present. Each question answerable with the contemplated methods. Data available, respondents accessible, unit of analysis consistent.
Score 4: At least three of four elements present. Most questions answerable; one or two need minor scope adjustment.
Score 3: At least two of four elements present. Data access or respondent reach uncertain for some questions.
Score 2: Several questions unanswerable. Attribution or counterfactual claims unsupported by the design.
Score 1: Absent or inadequate. Questions cannot be answered with the contemplated methods or evidence.

Score 5: All elements present. Each question tied to a named decision, learning need, or accountability obligation. User of each answer identified. Decision window matches evaluation timeline. Findings actionable.
Score 4: At least three of four elements present. Decision relevance clear; named users or decision windows partial.
Score 3: At least two of four elements present. Some questions tied to decisions; others framed as general inquiry.
Score 2: Most questions read as academic interest. No named users.
Score 1: Absent or inadequate. Questions have no apparent decision use.

Score 5: All elements present. OECD-DAC or a documented alternative criteria framework used. Each question mapped to one or more criteria. Selection justified by evaluation purpose. Criteria not applied as a checklist.
Score 4: At least three of four elements present. Criteria framework used and questions mapped; rationale or fit partial.
Score 3: At least two of four elements present. Framework named but mapping implicit. Selection not justified.
Score 2: Criteria listed at the start but questions not mapped, or all criteria forced onto every question regardless of fit.
Score 1: Absent or inadequate. No criteria framework used, or one used incoherently.

Score 5: All elements present. Main-question count realistic (typically 3-6). Sub-questions bounded (2-4 per main question). Depth achievable. Team size and skill mix can deliver.
Score 4: At least three of four elements present. Question count realistic; sub-question scope or team fit partial.
Score 3: At least two of four elements present. Question count at the high end; sub-questions sometimes overlap.
Score 2: Question set overcommitted (e.g., 10+ main questions, 30+ sub-questions). Depth not achievable.
Score 1: Absent or inadequate. Question set cannot be addressed at any defensible depth.
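
The question-count guidance in the last dimension (typically 3-6 main questions, 2-4 sub-questions each, with 10+ main or 30+ sub-questions given as an example of overcommitment) can be screened mechanically before anyone reads for depth. The sketch below is a hypothetical screening aid, not part of the rubric: the function name and warning wording are illustrative, and the thresholds are the rubric's "typically" and "e.g." figures treated as flags, not hard limits. Depth, team size, and skill mix still need judgment.

```python
def screen_question_counts(sub_counts):
    """Flag a question set against the rubric's numeric guidance (hypothetical helper).

    sub_counts holds one entry per main question: its number of sub-questions.
    Returns a list of warning strings; an empty list means nothing was flagged.
    """
    warnings = []
    main_total = len(sub_counts)
    sub_total = sum(sub_counts)

    # The rubric's example of overcommitment: 10+ main or 30+ sub-questions.
    if main_total >= 10 or sub_total >= 30:
        warnings.append(
            f"overcommitted: {main_total} main questions, {sub_total} sub-questions"
        )
    # Otherwise, note a main-question count outside the typical 3-6 range.
    elif not 3 <= main_total <= 6:
        warnings.append(f"{main_total} main questions (typical range: 3-6)")

    # Each main question is expected to carry 2-4 sub-questions.
    for number, count in enumerate(sub_counts, start=1):
        if not 2 <= count <= 4:
            warnings.append(
                f"question {number} has {count} sub-questions (guidance: 2-4)"
            )
    return warnings


print(screen_question_counts([3, 2, 4, 3]))  # []
print(screen_question_counts([1, 3, 5]))     # flags questions 1 and 3
```

A clean result only means the counts sit inside the quoted ranges; it says nothing about whether the depth is achievable.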
Score Interpretation
| Total (out of 25) | Band | Next Step |
|---|---|---|
| 22-25 | Strong | Evaluation questions are ready. Minor refinements only. |
| 17-21 | Adequate | Address flagged dimensions before issuing the ToR for bids. |
| 11-16 | Needs Revision | Rework the questions before procurement. Use the Revise prompt as a revision brief. |
| 5-10 | Substantial Revision | Return to the evaluation purpose and redraft the question set. |
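
The banding is simple arithmetic: sum the dimension scores and look the total up in the table. The sketch below assumes five dimensions scored 1 to 5 each, an inference from the 25-point maximum and the 5-point floor of the lowest band; the function and constant names are illustrative only.

```python
# Band boundaries copied from the Score Interpretation table.
BANDS = [
    (22, 25, "Strong"),
    (17, 21, "Adequate"),
    (11, 16, "Needs Revision"),
    (5, 10, "Substantial Revision"),
]


def interpret(scores):
    """Sum five dimension scores (assumed 1-5 each) and return (total, band)."""
    if len(scores) != 5:
        raise ValueError("expected one score for each of the five dimensions")
    if any(not 1 <= s <= 5 for s in scores):
        raise ValueError("each dimension score must be between 1 and 5")
    total = sum(scores)
    for low, high, band in BANDS:
        if low <= total <= high:
            return total, band
    # Unreachable: the bands cover every total from 5 to 25.
    raise AssertionError(f"no band for total {total}")


print(interpret([5, 4, 4, 5, 4]))  # (22, 'Strong')
```

The function rejects out-of-range scores instead of clamping them, so a data-entry error surfaces rather than silently shifting the band.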