Scoring Criteria
Design stated with rationale for choosing it. Sampling section documents sample size, selection method, and representativeness. Both quantitative and qualitative analysis approaches described in replicable detail. Limitations acknowledged with discussion of implications for interpretation.
Design described with rationale (may be brief). Sampling documented but representativeness assumed. Analysis described. Limitations listed but their impact not discussed.
Design named but not described in usable detail. Sample size stated but selection method not explained. Analysis approach mentioned but not replicable from the description alone. At least one limitation acknowledged but without implications.
Methodology section brief or generic. Sample size and selection not justified. Analysis described in one sentence or less. Limitations section absent, OR limitations listed without their effect on findings.
No methodology section.
All major findings supported by at least two independent sources. Quantitative and qualitative evidence integrated within each finding (not in separate silos). Contradictory evidence noted and explained. Each finding traceable to specific data.
At least 80 percent of findings have triangulation. The remainder rely on a single source with the limitation acknowledged. Both evidence types present with some integration.
Half or more findings have triangulation. Single-source reliance is common. Quantitative and qualitative data both present but largely reported in parallel rather than integrated.
Fewer than half of findings have triangulation, OR quantitative and qualitative evidence is presented in fully separate sections without integration.
No triangulation. Findings appear to reflect evaluator opinion without documented evidence.
Findings organized by evaluation question. Every finding includes magnitude and direction of change. Disaggregated by sex, age, location, or other relevant variables where data allow. Negative and unexpected findings receive as much space as positive ones.
Findings organized clearly (may follow program components rather than evaluation questions). At least 80 percent of findings include direction of change. Some disaggregation present. Negative findings reported but may receive less space.
Findings organized but not consistently aligned to evaluation questions. Half or more include magnitude. Disaggregation attempted for at least one variable. Negative findings mentioned but receive clearly less emphasis than positive ones.
Findings organized by data source or as a chronological narrative. Magnitude unclear for half or more findings. Disaggregation absent or limited. Negative findings minimized.
Qualitative narrative with no structure. No disaggregation. Reads as a success story.
Every conclusion cites the finding(s) it draws on and does not exceed what the evidence supports. Every recommendation specifies action, responsible party, and timeframe. Recommendations address root causes. Total recommendations between 5 and 12. Priority recommendations distinguished.
Every conclusion linked to findings, though some links may be implied. At least 80 percent of recommendations are specific. Responsible parties identified for at least 80 percent.
Conclusions consistent with findings but links are not explicitly stated. Half or more recommendations are specific; the remainder remain general. Half or more have responsible parties named. Total recommendations between 3 and 15.
One or more conclusions go beyond evidence or contradict findings, OR half or more recommendations are generic, OR responsible parties not specified for half or more, OR total recommendations outside reasonable range (fewer than 3 or more than 15).
Conclusions contradict or ignore findings. Recommendations not actionable. No link between findings, conclusions, and recommendations.
Community members participated in findings validation beyond data collection. Report explicitly addresses power dynamics. Data anonymized for every potentially identifiable data point. Key conclusions communicated back to affected communities with documentation.
Community participation beyond data collection evidenced. Anonymization applied. Findings shared with implementing partners.
Some evidence of participant engagement beyond data collection. Anonymization applied inconsistently, covering at least 80 percent of potentially identifiable data points. Limited documentation of inclusive process.
Ethical protections mentioned but inconsistently applied. Half or more of identifiable data points are not anonymized (direct quotes attributable to named individuals or locations specific enough to identify respondents). No findings validation with data providers.
No ethical protections documented. Individual data potentially identifiable throughout. No evidence of inclusive process or community feedback loop.
Score Interpretation
| Total (out of 25) | Band | Next Step |
|---|---|---|
| 22-25 | Strong | Approve with minor editorial requests only |
| 17-21 | Adequate | Request targeted revisions on the 1-2 lowest dimensions. Set a deadline for revised draft. |
| 11-16 | Needs Revision | Return to evaluation team with the AI scorecard as the revision brief. Do not approve until re-reviewed. |
| 5-10 | Substantial Revision | Does not meet minimum quality standards. Discuss whether substantial revision is feasible or a supplementary data collection round is needed. |
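
The band lookup in the table above can be sketched in a few lines of Python. This is an illustrative sketch, not part of the rubric: it assumes five dimensions each scored from 1 to 5 (inferred from the 25-point maximum and the 5-point minimum), and the names `BANDS` and `interpret_total` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Band floors copied from the Score Interpretation table: (lowest total, band).
BANDS: List[Tuple[int, str]] = [
    (22, "Strong"),
    (17, "Adequate"),
    (11, "Needs Revision"),
    (5, "Substantial Revision"),
]


def interpret_total(dimension_scores: Sequence[int]) -> Tuple[int, str]:
    """Sum the dimension scores and return (total, band).

    Assumption: exactly five dimensions, each scored as an integer 1 to 5.
    """
    if len(dimension_scores) != 5:
        raise ValueError("expected exactly five dimension scores")
    if any(not 1 <= score <= 5 for score in dimension_scores):
        raise ValueError("each dimension score must be between 1 and 5")

    total = sum(dimension_scores)
    # BANDS is ordered from highest floor to lowest, so the first match wins.
    for floor, band in BANDS:
        if total >= floor:
            return total, band
    # Unreachable given the validation above (minimum possible total is 5).
    raise AssertionError("total below the lowest band floor")


if __name__ == "__main__":
    total, band = interpret_total([4, 4, 3, 3, 3])
    print(f"{total}/25: {band}")
```

Validating the inputs before summing keeps an out-of-range dimension score from silently landing in a band it does not belong to.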