When to Use
An evaluation matrix is the right tool when you need to translate high-level evaluation questions into an actionable plan for data collection and analysis. Use it when:
- Planning an evaluation's operational details — after the evaluation terms of reference defines what will be evaluated, the matrix specifies how each question will be answered with concrete data sources and methods.
- Ensuring methodological rigor — to verify that every evaluation question has at least one clear path to evidence, and ideally multiple sources for triangulation.
- Coordinating evaluation team work — when multiple evaluators or data collectors need clarity on what data to collect, from whom, using what methods, and for which evaluation question.
- Communicating evaluation design to stakeholders — donors, programme teams, and partners need to understand how the evaluation will work before data collection begins.
- Developing the inception report — external evaluators use the evaluation matrix as a key component of their detailed work plan following contract award.
An evaluation matrix is less useful when you're still developing the evaluation questions themselves (use evaluation planning first) or when you need a high-level summary for non-technical stakeholders (a one-page evaluation design overview may suffice).
| Scenario | Use Evaluation Matrix? | Better Alternative |
|-----|---|---|
| Planning evaluation operations | Yes | — |
| Still developing evaluation questions | Alongside | Evaluation Questions |
| High-level donor briefing | Partial | Evaluation design summary |
| Selecting evaluators | After | Evaluation ToR |
| Post-evaluation learning review | No | Lessons Learned |
How It Works
Developing an evaluation matrix follows a structured sequence. Each step builds on the previous one, moving from abstract questions to concrete data collection plans.
1. Start with the evaluation questions. Begin with the finalized evaluation questions derived from the OECD/DAC criteria (relevance, effectiveness, efficiency, impact, sustainability). These questions are typically specified in the evaluation ToR and should be agreed upon by all stakeholders before proceeding.
2. Identify data sources for each question. For every evaluation question, specify what data will answer it. This includes primary sources (surveys, interviews, focus groups) and secondary sources (programme records, existing reports, administrative data). A well-designed matrix ensures each question has multiple data sources for triangulation.
3. Specify collection methods and tools. For each data source, define the exact data collection method: structured survey, semi-structured interview, focus group discussion, direct observation, document review, or participatory methods. Link each method to specific data collection tools (questionnaires, interview guides, observation checklists).
4. Define sampling and coverage. Specify who will provide the data: target populations, sample sizes, sampling methods, and any stratification requirements (by geography, gender, beneficiary status, etc.). This ensures the evaluation reaches the right people to answer each question.
5. Map indicators to questions. Link specific indicators to each evaluation question, showing how quantitative data will contribute to answering the question. This connects the evaluation to routine monitoring data where possible.
6. Specify analysis approaches. For each question, describe how the data will be analysed: statistical tests, thematic analysis, contribution analysis, cost-benefit analysis, or comparative methods. This ensures the evaluation team has a clear analytical framework.
7. Assign responsibilities and timelines. Specify who is responsible for each data collection activity, when it will occur, and what deliverables are expected. This transforms the matrix into an actionable workplan; a minimal sketch of the resulting row structure follows this list.
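To make the question-to-evidence mapping concrete, here is a minimal sketch of how one matrix row might be represented in code. The `MatrixRow` class and its field names are illustrative assumptions, not a standard schema; in practice the matrix is usually maintained as a table or spreadsheet with one row per question.

```python
from dataclasses import dataclass


@dataclass
class MatrixRow:
    """One evaluation matrix row: a single question mapped to the
    evidence that will answer it. Field names are illustrative only."""
    question: str            # step 1: the evaluation question
    criterion: str           # OECD/DAC criterion the question falls under
    data_sources: list[str]  # step 2: primary and secondary sources
    methods: list[str]       # step 3: data collection methods
    tools: list[str]         # step 3: instruments linked to each method
    sampling: str            # step 4: who provides data, how many, how selected
    indicators: list[str]    # step 5: indicators linked to the question
    analysis: str            # step 6: how the data will be analysed
    responsible: str         # step 7: who carries out the activity
    timeline: str            # step 7: when it will occur


# An illustrative row for an effectiveness question.
row = MatrixRow(
    question="To what extent did the programme reach its intended beneficiaries?",
    criterion="effectiveness",
    data_sources=["beneficiary survey", "programme monitoring records"],
    methods=["structured survey", "document review"],
    tools=["survey questionnaire", "document review template"],
    sampling="300 beneficiaries, stratified by district and gender",
    indicators=["% of targeted beneficiaries reached"],
    analysis="descriptive statistics compared against targets",
    responsible="evaluation team lead",
    timeline="weeks 3-6 of fieldwork",
)
```

Note how the two data sources and two methods in this row already satisfy the triangulation expectation from step 2.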
Key Components
A well-constructed evaluation matrix includes these essential elements:
- Evaluation questions — the specific questions the evaluation will answer, organized by evaluation criterion (relevance, effectiveness, efficiency, impact, sustainability). Each question should be specific enough to guide data collection but broad enough to capture meaningful insights.
- Data sources — for each question, the specific sources of evidence: primary data (collected specifically for the evaluation) and secondary data (existing information). A robust matrix shows multiple sources per question for triangulation.
- Data collection methods — the approaches for gathering data: surveys, interviews, focus groups, observation, document review, participatory methods. Each method should be appropriate for the question and target population.
- Data collection tools — the specific instruments for each method: survey questionnaires, interview guides, focus group discussion guides, observation checklists, document review templates.
- Sampling specifications — who will provide the data, including target populations, sample sizes, sampling methods, and any stratification or disaggregation requirements.
- Indicators — specific indicators that will contribute to answering each question, linking the evaluation to routine monitoring data where possible.
- Analysis approaches — how data will be analysed for each question: statistical analysis, thematic analysis, comparative methods, contribution analysis, cost-effectiveness analysis.
- Responsibilities and timeline — who is responsible for each activity, when it will occur, and key milestones for data collection and analysis. A simple completeness check over these elements is sketched after this list.
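Assuming the `MatrixRow` sketch from the previous section, a hypothetical completeness check can flag rows that leave any of these essential elements empty before fieldwork begins. The element names and helper functions are illustrative, not part of any standard tooling.

```python
# Essential matrix elements, mirroring the list above; assumes the
# MatrixRow sketch from the How It Works section.
REQUIRED_ELEMENTS = [
    "question", "criterion", "data_sources", "methods", "tools",
    "sampling", "indicators", "analysis", "responsible", "timeline",
]


def missing_elements(row) -> list[str]:
    """Return the names of essential elements left empty in a matrix row."""
    return [name for name in REQUIRED_ELEMENTS if not getattr(row, name, None)]


def report_gaps(matrix_rows) -> None:
    """Print a warning for every row missing one or more essential elements."""
    for r in matrix_rows:
        gaps = missing_elements(r)
        if gaps:
            print(f"Incomplete row: {r.question!r} is missing {gaps}")
```

Running such a check during the inception phase gives the team an early, mechanical signal that a question lacks a path to evidence, before ad-hoc decisions are made in the field.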
Best Practices
Follow a general sequence in matrix development. Completing the matrix starts with the project description and logic of intervention (top down), then the assumptions and risks, followed by the indicators and data sources. This ensures the matrix reflects the programme's intended logic before specifying how to measure it. (MEAL Rule: EX56_R084)
Use participatory approaches for indicator development. To develop indicator matrices, give practical examples to explain the concept of an 'indicator' to participants unfamiliar with it. Take one activity from the plan of work and brainstorm: 'How can we know if the activity is being carried out according to plan?' This builds shared understanding and ownership of the evaluation design. (MEAL Rule: EX12_P029)
Derive evaluation questions from the purpose. Evaluation questions are derived from the purpose(s) of the evaluation. They should flow directly from the evaluation objectives and be aligned with the OECD/DAC criteria. Well-formed questions guide the entire matrix design. (MEAL Rule: EX092_R007)
Ensure the matrix specifies required indicators. The indicator matrix must specify the indicators that will be used to measure progress and outcomes. This includes both standard indicators (for donor reporting) and custom indicators (for specific evaluation questions). Clear specification prevents ambiguity about what will be measured. (MEAL Rule: EX117_D018)
Design for triangulation. Each evaluation question should have multiple data sources and methods. Triangulation increases confidence in findings and helps identify discrepancies between different perspectives. A single data source per question is a vulnerability, not a design choice.
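As a sketch under the same assumed `MatrixRow` structure, a triangulation check might simply flag any question resting on fewer than two data sources:

```python
def untriangulated_questions(matrix_rows, minimum: int = 2) -> list[str]:
    """Return questions whose evidence rests on fewer than `minimum`
    data sources (the single-source vulnerability described above)."""
    return [r.question for r in matrix_rows
            if len(r.data_sources) < minimum]
```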
Link to routine monitoring data. Where possible, connect evaluation indicators to routine monitoring indicators. This reduces duplication, leverages existing data collection systems, and ensures the evaluation builds on programme learning rather than operating in isolation.
Keep it accessible to non-technical stakeholders. While the evaluation matrix is a technical document, it should be understandable to programme staff and stakeholders who will contribute to or be affected by the evaluation. Use clear language, avoid unnecessary jargon, and consider creating a simplified version for broader audiences.
Review and revise during the inception phase. The evaluation matrix is not set in stone. During the inception phase, external evaluators may refine the matrix based on deeper understanding of the programme context, stakeholder feedback, or emerging constraints. Build in flexibility for iteration.
Common Mistakes
Developing the matrix before finalizing evaluation questions. The most common failure is creating the evaluation matrix before the evaluation questions are fully developed and agreed upon. The matrix should flow from the questions, not the other way around. Questions that are too broad or too narrow will produce a matrix that is either unmanageable or insufficiently detailed.
Leaving evaluation questions without clear data sources. Every evaluation question must have at least one clear path to evidence. A matrix that includes questions without specified data sources creates ambiguity about what will be measured and how. This often happens when questions are derived from donor requirements without considering data availability.
Relying on a single data source per question. Designing the evaluation to answer each question with a single data source or method creates vulnerability to bias and limits confidence in findings. Triangulation — using multiple data sources and methods — is essential for rigorous evaluation.
Failing to specify sampling details. An evaluation matrix that doesn't specify who will provide the data, how many, and using what sampling method leaves critical gaps in the evaluation design. This leads to ad-hoc sampling decisions during implementation that can compromise validity.
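For illustration, the sampling detail a matrix row should carry might look like the following dictionary. The keys and values are hypothetical, not a mandated format; the point is that population, size, method, and stratification are all written down before fieldwork.

```python
# A hypothetical sampling specification for one evaluation question.
# Keys and values are illustrative, not a mandated format.
sampling_spec = {
    "target_population": "caregivers of children under five",
    "sample_size": 300,
    "sampling_method": "two-stage cluster sampling",
    "strata": ["district", "beneficiary status"],
    "disaggregation": ["gender", "age group"],
}
```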
Not linking to routine monitoring. Creating a separate set of indicators for the evaluation that has no connection to routine monitoring data creates duplication, increases data collection burden, and misses opportunities to leverage existing systems. Where possible, evaluation indicators should align with monitoring indicators.
Over-specifying methods and limiting evaluator innovation. While the matrix should specify methods clearly, it should also allow evaluators to propose their optimal approach. A matrix that is too prescriptive limits evaluator innovation and may not reflect the most efficient or effective methods for the context.
Examples
Health — USAID Maternal Health Programme, Kenya
A 5-year maternal health programme commissioned a final evaluation with six evaluation questions aligned with OECD/DAC criteria. The evaluation matrix specified: (1) five data sources per question including facility surveys, beneficiary interviews, provider interviews, document review of clinical records, and community focus groups; (2) sampling specifications including 60 facilities, 300 beneficiaries, and 50 providers across three counties; (3) indicators linking to routine HMIS data for outcome measures; and (4) analysis approaches including descriptive statistics, thematic analysis, and comparative case studies. The matrix was developed during the ToR phase and refined during the evaluator inception phase. The evaluation identified key lessons for scale-up that informed the follow-on programme design.
Education — DFID Teacher Training Initiative, Nigeria
A teacher training programme developed an evaluation matrix that emphasized participatory methods. The matrix specified focus group discussions with teachers, school administrators, and education officials as primary data sources, supplemented by classroom observation and document review of training materials. For indicator development, the evaluation team used participatory exercises with programme staff to brainstorm how to measure training effectiveness, building shared understanding of what 'quality training' means in the local context. The matrix included gender-disaggregated data collection requirements and specified that men and women would provide input separately on resource prioritization. The evaluation revealed significant variation in training quality across regions that would have been invisible with a single aggregate measure.
Governance — EU Civil Society Support, Sierra Leone
A governance programme working with civil society organizations developed an evaluation matrix that included outcome harvesting as a primary method to capture both planned and unplanned outcomes. The matrix specified: (1) evaluation questions organized by DAC criteria; (2) multiple data sources per question including key informant interviews, document review, and outcome harvesting interviews; (3) sampling that included both direct beneficiaries and indirect stakeholders (government partners, donor representatives, community members); and (4) analysis approaches including contribution analysis and outcome mapping. The matrix was developed collaboratively with programme staff through participatory workshops, building ownership and ensuring the evaluation captured the programme's complexity. The evaluation revealed significant unplanned outcomes through informal influence pathways, leading to programme adaptation.
Compared To
An evaluation matrix is one of several documents used in evaluation planning and management. The key differences:
| Feature | Evaluation Matrix | Evaluation ToR | Inception Report | M&E Plan |
|-----|---|---|---|---|
| Primary purpose | Map evaluation questions to data sources and methods | Commission evaluation and select evaluators | Detail evaluation approach post-selection | Guide overall M&E system |
| When developed | During evaluation planning | Before evaluator selection | After evaluator selection, before data collection | At programme design |
| Audience | Evaluation team, internal stakeholders | Potential evaluators, selection committee | Evaluation team, stakeholders | Programme team, M&E staff |
| Level of detail | Question-to-evidence mapping | Requirements and expectations | Methodological specifics | Comprehensive M&E guidance |
| Ownership | Evaluation team (with stakeholder input) | Commissioning organization | External evaluator | Programme management |
Relevant Indicators
Four indicators from major donor frameworks (USAID, DFID, UNDP, IFRC) relate to evaluation matrix quality and use:
- Matrix documentation — "Proportion of evaluations with documented evaluation matrices linking questions to data sources" (USAID)
- Triangulation — "Percentage of evaluation questions with multiple data sources for triangulation" (DFID); a computation sketch follows this list
- Matrix review frequency — "Frequency of evaluation matrix reviews during inception phase" (UNDP)
- Activity completion — "Proportion of evaluation activities specified in the matrix that are completed as planned" (IFRC)
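As a sketch, the DFID-style triangulation indicator above could be computed directly from a matrix, again assuming the illustrative `MatrixRow` structure from earlier; this is not an official formula from any of the frameworks listed.

```python
def triangulation_rate(matrix_rows) -> float:
    """Percentage of evaluation questions with two or more data
    sources, in the spirit of the DFID-style indicator above."""
    if not matrix_rows:
        return 0.0
    multi = sum(1 for r in matrix_rows if len(r.data_sources) >= 2)
    return 100.0 * multi / len(matrix_rows)
```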
Related Tools
- Evaluation Matrix Template — Structured template for developing comprehensive evaluation matrices with built-in checks for data source coverage
- Evaluation Planner — Interactive tool for mapping evaluation questions to methods, sources, and analysis approaches
Related Topics
- Evaluation Terms of Reference — The document that specifies evaluation requirements, developed before the matrix
- Evaluation Criteria (DAC) — The five OECD/DAC criteria that structure evaluation questions in the matrix
- Evaluation Planning — The broader process of planning evaluations, of which matrix development is a key component
- Evaluation Questions — The specific questions the evaluation will answer, the foundation of the matrix
- Inception Report — The evaluator's detailed work plan that includes a refined evaluation matrix
- Indicator Selection — The process of selecting indicators that feed into the evaluation matrix
- Data Quality Assurance — The verification processes that ensure evaluation data meets quality standards
Further Reading
- BetterEvaluation: Evaluation Design — Collection of evaluation design resources, including evaluation matrix templates and guidance.
- USAID Evaluation Policy — USAID requirements for evaluation planning, including expectations for evaluation matrices.
- OECD-DAC Evaluation Terms of Reference Guidelines — OECD guidance on evaluation planning that includes matrix development expectations.
- The Theory of Change Approach to Evaluation — Resources on linking evaluation design to programme theory and causal pathways.
Last updated: 2026-02-27