Scoring Criteria
Methodology
5: Design is explicitly stated and the rationale for choosing it is explained. Sampling rationale documents sample size, selection method, and representativeness. Both quantitative and qualitative analysis approaches are described in replicable detail. Limitations are acknowledged and their implications for interpreting findings discussed.
4: Design described but rationale is limited. Sampling documented but representativeness assumed. Analysis described. Limitations listed but their impact is not discussed.
3: Design named but not described in usable detail. Sample size stated but selection method not explained. Analysis approach mentioned but not replicable from the description alone. At least one limitation acknowledged but without implications.
2: Methodology section brief or generic. Sample size and selection not justified. Analysis described in one sentence. Limitations section absent, or lists limitations without discussing how they affect findings.
1: No methodology section. Design not described. Sampling not mentioned. Limitations not acknowledged.
Evidence and Triangulation
5: All major findings supported by at least two independent sources (triangulation). Quantitative and qualitative evidence integrated, not kept in separate silos. Contradictory evidence noted and explained. The basis for each finding is traceable to data.
4: Most findings have triangulation; 1-2 findings rely on a single source, with the limitation acknowledged. Both evidence types present with some integration.
3: About half of findings have triangulation. Single-source reliance is common but not universal. Both evidence types present but largely reported in parallel. Reader can trace some findings to data.
2: Most findings rely on a single data source. Quantitative and qualitative evidence presented in parallel without integration. Reader cannot trace how conclusions were reached.
1: No triangulation. Findings reflect evaluator opinion without documented evidence. No data tables, counts, or source references alongside findings.
Findings Presentation
5: Organized by evaluation question. Results include the magnitude and direction of change. Disaggregated by sex, age, location, or other variables where data allow. Negative and unexpected findings receive the same attention as positive findings.
4: Clearly organized, though may follow programme components rather than evaluation questions. Most results include direction of change. Some disaggregation. Negative findings reported, though they may receive less space.
3: Organization is identifiable but not consistently aligned to evaluation questions. Some results include magnitude. Disaggregation attempted for at least one variable. Negative findings mentioned but receive clearly less emphasis than positive ones.
2: Results organized by data source or presented as a chronological narrative. Magnitude unclear. Disaggregation absent or limited to one variable. Negative findings minimized or absent.
1: Qualitative narrative with no structure. No disaggregation. Reads as a success story. Reader cannot determine what the programme achieved.
Conclusions and Recommendations
5: Each conclusion cites the finding(s) it draws on and does not go beyond what the evidence supports. Recommendations are specific (what action, by whom, by when), realistic, and address root causes. Total number of recommendations is manageable (5-12). Priority recommendations distinguished.
4: Conclusions linked to findings, but some connections are implied. Recommendations mostly specific, with 1-2 exceptions. Responsible parties identified for most.
3: Conclusions generally consistent with findings, but links are not explicitly stated. Several recommendations are specific while others remain general. At least some responsible parties named. Number of recommendations within a reasonable range.
2: Conclusions go beyond the evidence or contradict findings. Recommendations generic. Responsible parties not specified. Too many recommendations (15+) or too few.
1: Conclusions contradict or ignore findings. Recommendations not actionable. No link between findings, conclusions, and recommendations.
Ethics and Inclusion
5: Community members participated in findings validation (beyond data collection). Report addresses power dynamics. Data anonymized where identification could cause harm. Key conclusions communicated back to affected communities.
4: Community participation beyond data collection evidenced. Basic anonymization applied. Findings shared with implementing partners.
3: Some evidence of participant engagement beyond data collection. Anonymization applied in most cases but not consistently. Limited documentation of an inclusive process.
2: Ethical protections mentioned but inconsistently applied (e.g., direct quotes attributable to named individuals). No findings validation with data providers.
1: No ethical protections documented. Individual data potentially identifiable. No evidence of an inclusive process or community feedback loop.
Score Interpretation
| Total (out of 25) | Band | Next Step |
|---|---|---|
| 22-25 | Strong | Approve with minor editorial requests only. |
| 17-21 | Adequate | Request targeted revisions on the 1-2 lowest-scoring dimensions. Set a deadline for the revised draft. |
| 11-16 | Needs Revision | Return to evaluation team with the AI scorecard as revision brief. Do not approve until re-reviewed. |
| 5-10 | Substantial Revision | Does not meet minimum quality standards. Discuss whether substantial revision is feasible or a supplementary data collection round is needed. |
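The band lookup in the table above can be sketched as a small helper. This is an illustrative sketch only: it assumes five dimensions each scored 1-5 (which the 25-point total implies), and the function name `band` is a placeholder, not part of any official tool.

```python
# Illustrative band lookup for the 25-point scorecard above.
# Assumes five dimensions, each scored 1-5; ranges mirror the table.
BANDS = [
    (22, 25, "Strong"),
    (17, 21, "Adequate"),
    (11, 16, "Needs Revision"),
    (5, 10, "Substantial Revision"),
]

def band(scores):
    """Return (total, band label) for a list of five dimension scores (1-5 each)."""
    if len(scores) != 5 or not all(1 <= s <= 5 for s in scores):
        raise ValueError("expected five dimension scores between 1 and 5")
    total = sum(scores)
    for low, high, label in BANDS:
        if low <= total <= high:
            return total, label

print(band([5, 4, 4, 5, 4]))  # (22, 'Strong')
```

Because every possible total from 5 to 25 falls in exactly one range, the lookup always returns a band once the input check passes.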