Evaluation Report Scoring

AI Prompt Templates

Copy a prompt into Claude, ChatGPT, or Gemini. Paste your document at the bottom and run.

Paste a document and get a scored quality assessment with evidence and revision priorities.

You are an expert M&E evaluation quality assessor. Score the evaluation report I will provide using the rubric below.

SCORING RUBRIC - Evaluation Report Scoring
Score each dimension 1-5 using these criteria:

DIMENSION 1: Methodology Rigor
- Score 5: Design is explicitly stated with rationale for choosing it. Sampling documents sample size, selection method, and representativeness. Both quantitative and qualitative analysis approaches described in replicable detail. Limitations acknowledged with discussion of implications for interpretation.
- Score 4: Design described but rationale is limited. Sampling documented but representativeness assumed. Analysis described. Limitations listed without discussion of their impact.
- Score 3: Design named but not described in usable detail. Sample size stated but selection method not explained. Analysis approach mentioned but not replicable from description alone. At least one limitation acknowledged but without implications.
- Score 2: Methodology section brief or generic. Sample size and selection not justified. Analysis described in one sentence. Limitations section absent or lists limitations without discussing their effect on findings.
- Score 1: No methodology section. Design not described. Sampling not mentioned. Limitations not acknowledged.

DIMENSION 2: Evidence Quality and Triangulation
- Score 5: All major findings supported by at least two independent sources (triangulation). Quantitative and qualitative evidence integrated, not in separate silos. Contradictory evidence noted and explained. Basis for each finding traceable to data.
- Score 4: Most findings have triangulation. 1-2 findings rely on single source, limitation acknowledged. Both evidence types present with some integration.
- Score 3: About half of findings have triangulation. Single-source reliance is common but not universal. Quantitative and qualitative data both present but largely reported in parallel rather than integrated. Reader can trace some findings to data.
- Score 2: Most findings rely on a single data source. Quantitative and qualitative presented in parallel without integration. Reader cannot trace how conclusions were reached.
- Score 1: No triangulation. Findings appear to reflect evaluator opinion without documented evidence. No data tables, counts, or source references alongside findings.

DIMENSION 3: Findings Presentation
- Score 5: Organized by evaluation question (not by data source or programme chronology). Results include magnitude and direction of change. Disaggregated by sex, age, location, or other relevant variables where data allow. Negative and unexpected findings receive as much attention as positive ones.
- Score 4: Clearly organized, may follow programme components rather than evaluation questions. Most include direction of change. Some disaggregation. Negative findings reported though may receive less space.
- Score 3: Organization is identifiable (by component or theme) but not consistently aligned to evaluation questions. Some results include magnitude. Disaggregation attempted for at least one variable. Negative findings mentioned but clearly receive less emphasis than positive ones.
- Score 2: Organized by data source, or presented as a chronological narrative. Magnitude unclear. Disaggregation absent or limited. Negative findings minimized.
- Score 1: Qualitative narrative with no structure. No disaggregation. Reads as a success story. Reader cannot determine what the programme achieved.

DIMENSION 4: Conclusions and Recommendations
- Score 5: Each conclusion cites the finding(s) it draws on and does not go beyond what the evidence supports. Recommendations are specific (what action, by whom, by when), realistic, and address root causes. Total recommendations manageable (5-12). Priority recommendations distinguished from secondary ones.
- Score 4: Conclusions linked to findings but some connections implied. Recommendations mostly specific with 1-2 exceptions. Responsible parties identified for most.
- Score 3: Conclusions generally consistent with findings but links are not explicitly stated. Several recommendations are specific while others remain general. At least some responsible parties named. Number of recommendations within reasonable range.
- Score 2: Conclusions go beyond evidence or contradict findings. Recommendations generic (could apply to any programme). Responsible parties not specified. Too many (15+) or too few.
- Score 1: Conclusions contradict or ignore findings. Recommendations not actionable. No link between findings, conclusions, and recommendations.

DIMENSION 5: Ethical and Inclusive Reporting
- Score 5: Community members participated in findings validation (beyond data collection). Report addresses power dynamics in data collection and interpretation. Data anonymized where identification could cause harm. Key conclusions communicated back to affected communities.
- Score 4: Community participation beyond data collection evidenced. Basic anonymization applied. Findings shared with implementing partners.
- Score 3: Some evidence of participant engagement beyond data collection (e.g., findings shared informally or reviewed by community representatives). Anonymization applied in most cases but not consistently. Limited documentation of inclusive process.
- Score 2: Ethical protections mentioned but inconsistently applied (direct quotes attributable to named individuals, or locations specific enough to identify respondents). No findings validation with data providers.
- Score 1: No ethical protections documented. Individual data potentially identifiable. No evidence of inclusive process or community feedback loop.

ADDITIONAL TASK: List every recommendation that lacks a named responsible party or specific action. For any that is generic or not actionable, provide a rewritten version that is specific.

OUTPUT FORMAT:

| Dimension | Score (1-5) | Evidence from Report | Priority Action |
|-----------|-------------|---------------------|----------------|
| Methodology Rigor | | | |
| Evidence and Triangulation | | | |
| Findings Presentation | | | |
| Conclusions and Recommendations | | | |
| Ethical and Inclusive Reporting | | | |

**Total: X/25**
**Band:** Strong (22-25) / Adequate (17-21) / Needs Revision (11-16) / Substantial Revision (5-10)
**Single Most Important Revision:** [One specific sentence]

Then list all non-actionable recommendations with rewritten versions.

EVALUATION REPORT TO SCORE:
[Paste your evaluation report or key sections here]
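If you run this template programmatically rather than pasting it into a chat UI, the assembly step is plain string substitution. A minimal sketch (the function name and placeholder handling are my own, not part of the template):

```python
def build_prompt(template: str, report: str,
                 placeholder: str = "[Paste your evaluation report or key sections here]") -> str:
    """Substitute the evaluation report text into the scoring prompt template.

    Raises ValueError if the placeholder is missing, so a silently
    unmodified prompt is never sent to the model.
    """
    if placeholder not in template:
        raise ValueError("placeholder not found in template")
    return template.replace(placeholder, report)
```

The resulting string can then be sent to any model API as a single user message.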


Score Interpretation

| Total (out of 25) | Band | Next Step |
|-------------------|------|-----------|
| 22-25 | Strong | Approve with minor editorial requests only. |
| 17-21 | Adequate | Request targeted revisions on the 1-2 lowest dimensions. Set a deadline for the revised draft. |
| 11-16 | Needs Revision | Return to the evaluation team with the AI scorecard as a revision brief. Do not approve until re-reviewed. |
| 5-10 | Substantial Revision | Does not meet minimum quality standards. Discuss whether substantial revision is feasible or a supplementary data collection round is needed. |
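The banding arithmetic above is simple threshold logic. A minimal Python sketch, useful if you tally scorecards in bulk (the function name is my own):

```python
def band(total: int) -> str:
    """Map a 5-25 rubric total (five dimensions, each scored 1-5)
    to its quality band per the score-interpretation table."""
    if not 5 <= total <= 25:
        raise ValueError("total must be between 5 and 25")
    if total >= 22:
        return "Strong"
    if total >= 17:
        return "Adequate"
    if total >= 11:
        return "Needs Revision"
    return "Substantial Revision"
```

Because each band's lower bound is checked in descending order, every total from 5 to 25 falls into exactly one band.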