Evaluation Report Scoring

AI Prompt Templates

Copy a prompt into Claude, ChatGPT, or Gemini. Paste your document at the bottom and run.

Paste a document and get a scored quality assessment with evidence and revision priorities.


You are an expert monitoring and evaluation (M&E) quality assessor. Score the evaluation report I will provide using the rubric below.

SCORING RUBRIC - Evaluation Report Scoring
Score each dimension 1-5 using these criteria:

DIMENSION 1: Methodology Rigor
- Score 5: Design stated with rationale for choosing it. Sampling section documents sample size, selection method, and representativeness. Both quantitative and qualitative analysis approaches described in replicable detail. Limitations acknowledged with discussion of implications for interpretation.
- Score 4: Design described with rationale (may be brief). Sampling documented but representativeness assumed. Analysis described. Limitations listed but their impact not discussed.
- Score 3: Design named but not described in usable detail. Sample size stated but selection method not explained. Analysis approach mentioned but not replicable from the description alone. At least one limitation acknowledged but without implications.
- Score 2: Methodology section brief or generic. Sample size and selection not justified. Analysis described in one sentence or less. Limitations section absent, OR limitations listed without their effect on findings.
- Score 1: No methodology section.

DIMENSION 2: Evidence Quality and Triangulation
- Score 5: All major findings supported by at least two independent sources. Quantitative and qualitative evidence integrated within each finding (not in separate silos). Contradictory evidence noted and explained. Each finding traceable to specific data.
- Score 4: At least 80 percent of findings have triangulation. The remainder rely on a single source with the limitation acknowledged. Both evidence types present with some integration.
- Score 3: Half or more findings have triangulation. Single-source reliance is common. Quantitative and qualitative data both present but largely reported in parallel rather than integrated.
- Score 2: Fewer than half of findings have triangulation, OR quantitative and qualitative evidence are presented in fully separate sections without integration.
- Score 1: No triangulation. Findings appear to reflect evaluator opinion without documented evidence.

DIMENSION 3: Findings Presentation
- Score 5: Findings organized by evaluation question. Every finding includes magnitude and direction of change. Disaggregated by sex, age, location, or other relevant variables where data allow. Negative and unexpected findings receive as much space as positive ones.
- Score 4: Findings organized clearly (may follow program components rather than evaluation questions). At least 80 percent of findings include direction of change. Some disaggregation present. Negative findings reported but may receive less space.
- Score 3: Findings organized but not consistently aligned to evaluation questions. Half or more include magnitude. Disaggregation attempted for at least one variable. Negative findings mentioned but receive clearly less emphasis than positive ones.
- Score 2: Findings organized by data source or as a chronological narrative. Magnitude unclear for half or more findings. Disaggregation absent or limited. Negative findings minimized.
- Score 1: Qualitative narrative with no structure. No disaggregation. Reads as a success story.

DIMENSION 4: Conclusions and Recommendations
- Score 5: Every conclusion cites the finding(s) it draws on and does not exceed what the evidence supports. Every recommendation specifies action, responsible party, and timeframe. Recommendations address root causes. Total recommendations between 5 and 12. Priority recommendations distinguished from secondary.
- Score 4: Every conclusion linked to findings, though some links may be implied. At least 80 percent of recommendations are specific. Responsible parties identified for at least 80 percent.
- Score 3: Conclusions consistent with findings but links are not explicitly stated. Half or more recommendations are specific; the remainder remain general. Half or more have responsible parties named. Total recommendations between 3 and 15.
- Score 2: One or more conclusions go beyond evidence or contradict findings, OR half or more recommendations are generic, OR responsible parties not specified for half or more, OR total recommendations outside reasonable range (fewer than 3 or more than 15).
- Score 1: Conclusions contradict or ignore findings. Recommendations not actionable. No link between findings, conclusions, and recommendations.

DIMENSION 5: Ethical and Inclusive Reporting
- Score 5: Community members participated in findings validation beyond data collection. Report explicitly addresses power dynamics. Data anonymized for every potentially identifiable data point. Key conclusions communicated back to affected communities with documentation.
- Score 4: Community participation beyond data collection evidenced. Anonymization applied. Findings shared with implementing partners.
- Score 3: Some evidence of participant engagement beyond data collection. Anonymization applied for at least 80 percent of potentially identifiable data points but inconsistently. Limited documentation of inclusive process.
- Score 2: Ethical protections mentioned but inconsistently applied. Half or more of identifiable data points are not anonymized (direct quotes attributable to named individuals or locations specific enough to identify respondents). No findings validation with data providers.
- Score 1: No ethical protections documented. Individual data potentially identifiable throughout. No evidence of inclusive process or community feedback loop.

ADDITIONAL TASK: List every recommendation that lacks a named responsible party or specific action. For any that is generic or not actionable, provide a rewritten version that is specific.

OUTPUT FORMAT:

| Dimension | Score (1-5) | Evidence from Report | Priority Action |
|-----------|-------------|---------------------|----------------|
| Methodology Rigor | | | |
| Evidence Quality and Triangulation | | | |
| Findings Presentation | | | |
| Conclusions and Recommendations | | | |
| Ethical and Inclusive Reporting | | | |

**Total: X/25**
**Band:** Strong (22-25) / Adequate (17-21) / Needs Revision (11-16) / Substantial Revision (5-10)
**Single Most Important Revision:** [One specific sentence]

Then list all non-actionable recommendations with rewritten versions.

EVALUATION REPORT TO SCORE:
[Paste your evaluation report or key sections here]

Scoring Criteria

Methodology Rigor

5 - Excellent

Design stated with rationale for choosing it. Sampling section documents sample size, selection method, and representativeness. Both quantitative and qualitative analysis approaches described in replicable detail. Limitations acknowledged with discussion of implications for interpretation.

4 - Good

Design described with rationale (may be brief). Sampling documented but representativeness assumed. Analysis described. Limitations listed but their impact not discussed.

3 - Adequate

Design named but not described in usable detail. Sample size stated but selection method not explained. Analysis approach mentioned but not replicable from the description alone. At least one limitation acknowledged but without implications.

2 - Needs Improvement

Methodology section brief or generic. Sample size and selection not justified. Analysis described in one sentence or less. Limitations section absent, OR limitations listed without their effect on findings.

1 - Inadequate

No methodology section.

Evidence Quality and Triangulation

5 - Excellent

All major findings supported by at least two independent sources. Quantitative and qualitative evidence integrated within each finding (not in separate silos). Contradictory evidence noted and explained. Each finding traceable to specific data.

4 - Good

At least 80 percent of findings have triangulation. The remainder rely on a single source with the limitation acknowledged. Both evidence types present with some integration.

3 - Adequate

Half or more findings have triangulation. Single-source reliance is common. Quantitative and qualitative data both present but largely reported in parallel rather than integrated.

2 - Needs Improvement

Fewer than half of findings have triangulation, OR quantitative and qualitative evidence are presented in fully separate sections without integration.

1 - Inadequate

No triangulation. Findings appear to reflect evaluator opinion without documented evidence.
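The percentage thresholds in this dimension can be checked mechanically once findings have been counted. The sketch below is illustrative only: the function name `triangulation_score_hint` and its inputs are hypothetical, it assumes the reviewer has already counted how many major findings rest on at least two independent sources, and it covers only the numeric cut-offs. The qualitative criteria (integration of evidence types, treatment of contradictory evidence, traceability) still need a reader's judgment.

```python
def triangulation_score_hint(total_findings: int, triangulated: int) -> int:
    """Return a candidate score (1-5) from the share of triangulated findings.

    Only the numeric thresholds of the rubric are applied here; the
    qualitative criteria can confirm or lower the hint.
    """
    if total_findings <= 0:
        raise ValueError("total_findings must be positive")
    if not 0 <= triangulated <= total_findings:
        raise ValueError("triangulated must be between 0 and total_findings")

    if triangulated == 0:
        return 1  # no triangulation at all
    if triangulated == total_findings:
        return 5  # every counted finding has at least two sources
    # Integer comparisons avoid float rounding at the 80% and 50% boundaries.
    if triangulated * 100 >= total_findings * 80:
        return 4  # at least 80 percent
    if triangulated * 2 >= total_findings:
        return 3  # half or more
    return 2      # fewer than half


if __name__ == "__main__":
    # Hypothetical report: 10 major findings, 8 of them triangulated.
    print(triangulation_score_hint(10, 8))
```

The same threshold pattern (all, at least 80 percent, half or more, fewer than half) recurs in other dimensions, so a helper like this could be adapted rather than rewritten.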

Findings Presentation

5 - Excellent

Findings organized by evaluation question. Every finding includes magnitude and direction of change. Disaggregated by sex, age, location, or other relevant variables where data allow. Negative and unexpected findings receive as much space as positive ones.

4 - Good

Findings organized clearly (may follow program components rather than evaluation questions). At least 80 percent of findings include direction of change. Some disaggregation present. Negative findings reported but may receive less space.

3 - Adequate

Findings organized but not consistently aligned to evaluation questions. Half or more include magnitude. Disaggregation attempted for at least one variable. Negative findings mentioned but receive clearly less emphasis than positive ones.

2 - Needs Improvement

Findings organized by data source or as a chronological narrative. Magnitude unclear for half or more findings. Disaggregation absent or limited. Negative findings minimized.

1 - Inadequate

Qualitative narrative with no structure. No disaggregation. Reads as a success story.

Conclusions and Recommendations

5 - Excellent

Every conclusion cites the finding(s) it draws on and does not exceed what the evidence supports. Every recommendation specifies action, responsible party, and timeframe. Recommendations address root causes. Total recommendations between 5 and 12. Priority recommendations distinguished from secondary.

4 - Good

Every conclusion linked to findings, though some links may be implied. At least 80 percent of recommendations are specific. Responsible parties identified for at least 80 percent.

3 - Adequate

Conclusions consistent with findings but links are not explicitly stated. Half or more recommendations are specific; the remainder remain general. Half or more have responsible parties named. Total recommendations between 3 and 15.

2 - Needs Improvement

One or more conclusions go beyond evidence or contradict findings, OR half or more recommendations are generic, OR responsible parties not specified for half or more, OR total recommendations outside reasonable range (fewer than 3 or more than 15).

1 - Inadequate

Conclusions contradict or ignore findings. Recommendations not actionable. No link between findings, conclusions, and recommendations.
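Two parts of this dimension lend themselves to a quick mechanical check: the total number of recommendations, and which recommendations lack a specific action or a named responsible party (the items the prompt asks to have rewritten). The sketch below assumes a reviewer has already extracted each recommendation into a small record; the `Recommendation` class, its field names, and the sample data are hypothetical and not part of the rubric.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Recommendation:
    """One recommendation as extracted by a reviewer (hypothetical structure)."""
    text: str
    action: Optional[str] = None
    responsible_party: Optional[str] = None
    timeframe: Optional[str] = None


def needs_rewrite(rec: Recommendation) -> bool:
    """True when the recommendation lacks a specific action or a responsible party."""
    return not rec.action or not rec.responsible_party


def count_range(n: int) -> str:
    """Classify the total number of recommendations against the rubric's ranges."""
    if 5 <= n <= 12:
        return "5-12"          # range named under Score 5
    if 3 <= n <= 15:
        return "3-15"          # range named under Score 3
    return "out of range"      # fewer than 3 or more than 15 (Score 2)


if __name__ == "__main__":
    recs: List[Recommendation] = [
        Recommendation("Improve coordination."),  # generic: no action, no owner
        Recommendation(
            "Train enumerators on the revised survey tool.",
            action="Run a two-day enumerator training",
            responsible_party="M&E unit",
            timeframe="Before the next data collection round",
        ),
    ]
    flagged = [r.text for r in recs if needs_rewrite(r)]
    print(count_range(len(recs)), flagged)
```

Whether a recommendation addresses root causes, or whether a conclusion exceeds the evidence, cannot be read off a record like this and remains a judgment call.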

Ethical and Inclusive Reporting

5 - Excellent

Community members participated in findings validation beyond data collection. Report explicitly addresses power dynamics. Data anonymized for every potentially identifiable data point. Key conclusions communicated back to affected communities with documentation.

4 - Good

Community participation beyond data collection evidenced. Anonymization applied. Findings shared with implementing partners.

3 - Adequate

Some evidence of participant engagement beyond data collection. Anonymization applied for at least 80 percent of potentially identifiable data points but inconsistently. Limited documentation of inclusive process.

2 - Needs Improvement

Ethical protections mentioned but inconsistently applied. Half or more of identifiable data points are not anonymized (direct quotes attributable to named individuals or locations specific enough to identify respondents). No findings validation with data providers.

1 - Inadequate

No ethical protections documented. Individual data potentially identifiable throughout. No evidence of inclusive process or community feedback loop.

Score Interpretation

| Total (out of 25) | Band | Next Step |
|-------------------|------|-----------|
| 22-25 | Strong | Approve with minor editorial requests only. |
| 17-21 | Adequate | Request targeted revisions on the 1-2 lowest dimensions. Set a deadline for revised draft. |
| 11-16 | Needs Revision | Return to evaluation team with the AI scorecard as revision brief. Do not approve until re-reviewed. |
| 5-10 | Substantial Revision | Does not meet minimum quality standards. Discuss whether substantial revision is feasible or a supplementary data collection round is needed. |
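Because the total and band are simple arithmetic on the five dimension scores, it can be worth recomputing them rather than trusting the figure in an AI-generated scorecard. The following is a minimal sketch under that assumption; the function name `summarize_scorecard` and the dictionary input are hypothetical, and the next-step strings are shortened from the interpretation table.

```python
from typing import Dict, Tuple

# Dimension names as they appear in the rubric.
DIMENSIONS = (
    "Methodology Rigor",
    "Evidence Quality and Triangulation",
    "Findings Presentation",
    "Conclusions and Recommendations",
    "Ethical and Inclusive Reporting",
)

# (lowest total in band, band name, shortened next step), highest band first.
BANDS = (
    (22, "Strong", "Approve with minor editorial requests only."),
    (17, "Adequate", "Request targeted revisions on the 1-2 lowest dimensions."),
    (11, "Needs Revision", "Return to the evaluation team; do not approve until re-reviewed."),
    (5, "Substantial Revision", "Discuss whether revision or more data collection is feasible."),
)


def summarize_scorecard(scores: Dict[str, int]) -> Tuple[int, str, str]:
    """Validate the five scores and return (total, band, next step)."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for name in DIMENSIONS:
        value = scores[name]
        if not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(f"{name}: score must be an integer from 1 to 5")

    total = sum(scores[d] for d in DIMENSIONS)
    for floor, band, next_step in BANDS:
        if total >= floor:
            return total, band, next_step
    # Five scores of at least 1 always total 5 or more.
    raise AssertionError("unreachable")


if __name__ == "__main__":
    example = dict(zip(DIMENSIONS, (4, 3, 4, 3, 2)))  # hypothetical scores
    total, band, next_step = summarize_scorecard(example)
    print(f"Total: {total}/25, Band: {band}. {next_step}")
```

Comparing this recomputed total with the "Total: X/25" line in the model's output is a cheap way to catch arithmetic slips before the scorecard is used as a revision brief.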

Prompts Using This Rubric

Create an Evaluation Matrix

Build an evaluation matrix linking evaluation questions to criteria, indicators, data sources, and methods.

Draft an Evaluation Report

Write up evaluation findings into a professional report with methodology, results, and recommendations.

Draft Evaluation Terms of Reference

Write terms of reference for commissioning an external evaluation, including scope, questions, and methodology.

Review My Evaluation Design

Get feedback on your evaluation methodology, questions, sampling, and analysis plan before fieldwork.

Design a Mid-Term Evaluation

Plan a mid-term evaluation to check program progress, relevance, and early outcomes.

Review an Evaluation Report Draft

Review an evaluation report draft for quality, evidence, and recommendations.
