
Realist Evaluation

An evaluation approach that asks what works, for whom, in what circumstances, and why, by identifying the mechanisms through which programmes produce outcomes in specific contexts.

When to Use

Realist evaluation is the right approach when the question is not simply "did the programme work?" but "for whom did it work, under what conditions, and through what mechanisms?" Developed by Ray Pawson and Nick Tilley in the 1990s, realist evaluation is built on the insight that programmes do not cause outcomes directly; they introduce resources and opportunities that trigger responses in specific people in specific contexts.

Use it when:

  • Outcomes vary across sites or populations: the programme shows strong results in some places and weak results in others, and you need to understand why
  • Context is central: the programme works through relationships, norms, or institutional conditions that differ meaningfully across settings
  • Theory refinement is the goal: you want to understand why a programme works in order to improve it, not just whether it works on average
  • Scale-up decisions require specificity: before expanding a programme, funders and managers need to know which contexts are necessary for the mechanisms to fire
  • Existing evidence is mixed: realist synthesis (the literature-based version) can reconcile conflicting findings from multiple evaluations of similar interventions

Realist evaluation is resource-intensive and produces probabilistic, context-specific findings rather than average treatment effects. It is not suitable when funders need a single yes/no effectiveness verdict, when resources are limited, or when the programme theory is very simple and context is relatively uniform.

| Scenario | Use Realist Evaluation? | Better Alternative |
| --- | --- | --- |
| Why does it work for some and not others? | Yes | — |
| Average effect across all contexts | No | Impact Evaluation |
| Simple, uniform intervention | No | RCT or QED |
| Building causal argument without mechanism | No | Contribution Analysis |
| Scale-up context specification | Yes | — |
| Literature synthesis of mixed evidence | Yes (realist synthesis) | — |

How It Works

Realist evaluation is built around one central analytical unit: the Context-Mechanism-Outcome (CMO) configuration. A CMO configuration states: in this context (C), this mechanism (M) is triggered, producing this outcome (O). Its three elements are defined below (a minimal code sketch of the structure follows the list):

  • Context: the conditions (social, institutional, cultural, geographic, historical) within which a programme operates. Context is not just background; it activates or suppresses mechanisms
  • Mechanism: the causal process that connects a programme resource or activity to an outcome. Mechanisms are typically hidden: they involve how people reason about and respond to programme inputs
  • Outcome: the observable change that results when a mechanism fires in a given context
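
Because configurations accumulate and get revised across evaluation cycles, it can help to record them in a consistent, machine-readable form. A minimal Python sketch of the structure; the class and field names here are illustrative, not part of any published realist-evaluation toolkit:

```python
# Illustrative only: a small record type for tracking CMO configurations
# across cycles of testing. Names are hypothetical, not a standard API.
from dataclasses import dataclass
from typing import Optional

@dataclass
class CMOConfiguration:
    context: str                   # conditions that activate or suppress the mechanism
    mechanism: str                 # the (usually hidden) reasoning/response triggered
    outcome: str                   # the observable change when the mechanism fires
    verdict: Optional[str] = None  # "confirmed", "partial", or "disconfirmed" after testing
```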

Step 1: Develop initial programme theory (IPT)

Start with an explicit theory of how the programme is supposed to work. This is not just a logic model; it must articulate the mechanisms through which resources are expected to change behaviour.

Step 2: Generate CMO hypotheses

Translate the programme theory into a set of testable CMO configurations. For example: "When community health workers are respected figures in their community (C), free bednet provision triggers social norm activation around child protection (M), producing more consistent bednet use (O)."
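
Recorded in the illustrative structure sketched in the previous section, that hypothesis becomes a testable record:

```python
# The bednet hypothesis from the text, expressed as a CMOConfiguration
# (using the illustrative class sketched earlier).
bednet_cmo = CMOConfiguration(
    context="Community health workers are respected figures in their community",
    mechanism="Free bednet provision activates social norms around child protection",
    outcome="More consistent bednet use",
)
```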

Step 3: Collect data to test the CMOs

Mixed methods are typically required. Quantitative data can test whether outcomes varied by context. Qualitative data (interviews, observations) can probe the mechanisms.
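
On the quantitative side, the first check is usually whether outcomes split along the contextual categories named in the CMO hypotheses. A minimal sketch using pandas; the dataset, column names, and context categories are hypothetical:

```python
# Hypothetical data: help-seeking rates by site, tagged with a contextual
# factor (how community health workers were selected).
import pandas as pd

data = pd.DataFrame({
    "site": ["A", "A", "B", "B", "C", "C"],
    "chw_selection": ["elected", "elected", "assigned",
                      "assigned", "elected", "assigned"],
    "help_seeking_rate": [0.71, 0.68, 0.42, 0.39, 0.74, 0.44],
})

# If the hypothesised mechanism is context-dependent, outcomes should
# split along the contextual factor, not just vary site by site.
print(data.groupby("chw_selection")["help_seeking_rate"].agg(["mean", "count"]))
```

Variation like this does not confirm the mechanism on its own; the qualitative strand still has to show that the hypothesised reasoning actually occurred.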

Step 4: Analyse CMO configurations

Examine which CMO configurations were confirmed, partially confirmed, or disconfirmed by the data. Where mechanisms did not fire as expected, identify what contextual factor suppressed them.
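
Once each configuration carries a verdict, tallying results and isolating the disconfirmed cases for follow-up is straightforward. A continuation of the illustrative sketch above:

```python
# Tally verdicts and flag disconfirmed configurations for qualitative
# follow-up on which contextual factors suppressed the mechanism.
from collections import Counter

tested = [
    CMOConfiguration("CHWs elected by community (rural)",
                     "Social trust drives help-seeking", "Higher help-seeking",
                     verdict="confirmed"),
    CMOConfiguration("CHWs centrally assigned (peri-urban)",
                     "Social trust drives help-seeking", "Higher help-seeking",
                     verdict="disconfirmed"),
]

print(Counter(c.verdict for c in tested))
for c in (c for c in tested if c.verdict == "disconfirmed"):
    print("Probe suppressing context factors in:", c.context)
```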

Step 5: Refine the programme theory

Revise the initial programme theory based on empirical findings. Realist evaluation is iterative; the theory improves with each cycle of hypothesis testing.

Step 6: Produce middle-range theory

Synthesise findings into transferable, middle-range theories that specify the conditions under which this type of intervention produces these types of outcomes. These are more useful for decision-making than context-specific findings alone.

Key Components

  • Initial programme theory: explicit causal logic articulating mechanisms, not just input-output chains
  • CMO configurations: testable hypotheses linking context, mechanism, and outcome
  • Context mapping: systematic documentation of the contextual factors relevant to mechanism activation
  • Mixed methods data collection: quantitative to test outcome variation by context; qualitative to probe mechanisms
  • Iterative theory refinement: repeated cycles of hypothesis testing and theory revision
  • Middle-range theory: transferable propositions about what works for whom under what conditions
  • Realist-trained evaluators: this approach requires specialist knowledge to implement credibly

Best Practices

Articulate mechanisms explicitly. The most common failure in realist evaluation is treating mechanisms as black boxes. A mechanism statement must name the response that is triggered: "Women participate in savings groups (M: social trust and reciprocal obligation) when neighbours they already know are members (C), producing improved financial resilience (O)."

Monitor context throughout implementation. Context changes during implementation: political shifts, market fluctuations, leadership changes. Build context monitoring into the evaluation design.

Use theory to guide data collection, not data to generate theory. Realist evaluation starts deductively with CMO hypotheses and tests them; it is not grounded theory. Starting with data and inductively generating CMOs produces poorly specified findings.

Strengthen plausibility with existing evidence. Before testing CMO configurations empirically, review the literature for evidence that the proposed mechanisms operate in similar contexts.

Report negative cases. CMO configurations that were disconfirmed are as analytically important as those that were confirmed. Report both.

Common Mistakes

Treating "context" as confounders to control away. In realist evaluation, context is not noise, it is explanatory. Controlling for context in a regression model destroys the analytical value of contextual variation.

Listing characteristics instead of specifying mechanisms. Saying "the programme worked in urban contexts" is a contextual observation, not a realist finding. A realist finding explains why: what mechanism the urban context activates or enables.

Using realist vocabulary without realist reasoning. Programmes sometimes describe their evaluation as "realist" because they collected qualitative data alongside a survey. Realist evaluation requires explicit CMO hypothesis development, iterative theory refinement, and systematic cross-case comparison.

Designing without sufficient qualitative depth. Mechanisms are not directly observable in outcome data. You need interviews, observations, or documents that reveal how people responded to programme inputs and why. Superficial qualitative data produces superficial mechanism specification.

Claiming generalisability prematurely. Middle-range theories from a single realist evaluation are hypotheses, not laws. Replication across multiple contexts is needed before transferability can be claimed.

Examples

Community health, East Africa. A realist evaluation of a community health worker (CHW) programme in Kenya identified three CMO configurations from the initial programme theory. The primary configuration, that CHWs embedded in community structures (C) would trigger help-seeking behaviour through social trust (M), was confirmed in rural areas where CHWs were elected by their communities but disconfirmed in peri-urban areas where CHWs were centrally assigned. A secondary configuration about maternal health knowledge was confirmed across all contexts. These findings informed a redesign of the CHW selection process for the programme's second phase.

Cash transfers, West Africa. A realist evaluation of a conditional cash transfer programme in Niger found that the same transfer amount produced very different nutritional outcomes across regions. The mechanism analysis revealed that in markets with functioning grain supply chains (C), the cash trigger activated commercial food purchasing (M) and produced dietary diversity improvements (O). In remote areas with thin markets, the mechanism did not fire because cash could not be exchanged for diverse foods. The finding shaped the geographic targeting strategy for scale-up.

Education governance, South Asia. A realist synthesis of 23 evaluations of school governance reform programmes in South Asia identified that reforms producing learning improvements shared one CMO configuration: when local government had prior capacity and community trust (C), school management committee formation (M: shared accountability) produced teacher attendance improvements and learning gains (O). Reforms in low-capacity settings produced the governance structures without activating the accountability mechanism.

Compared To

| Method | Causal Logic | Counterfactual | Primary Output |
| --- | --- | --- | --- |
| Realist Evaluation | Generative (mechanisms) | None | Middle-range theory |
| Impact Evaluation | Successionist (regularity) | Explicit | Average treatment effect |
| Process Tracing | Mechanism tracing | None | Causal chain evidence |
| Contribution Analysis | Plausible contribution | None | Contribution story |
| Developmental Evaluation | Emergent | None | Real-time learning |

Relevant Indicators

18 indicators across DFID, UNDP, and OECD-DAC frameworks. Key examples:

  • Number of CMO configurations initially hypothesised versus confirmed by evaluation data
  • Degree to which evaluation explains outcome variation across implementation contexts (rated 1-5)
  • Proportion of evaluation recommendations that specify the context conditions necessary for replication

Related Tools

  • Evaluation Planner: structure your CMO hypothesis development and data collection plan
  • MEStudio Logic Model Builder: for building the initial programme theory that underpins CMO analysis

Related Topics

  • Process Tracing: a complementary method for tracing causal mechanisms within individual cases
  • Contribution Analysis: an alternative for building causal arguments without experimental design
  • Mixed Methods Evaluation: realist evaluation typically requires mixed methods to test CMO configurations
  • Theory of Change: the programme theory that generates the initial CMO hypotheses
  • Developmental Evaluation: an alternative for highly emergent programmes where CMOs cannot be pre-specified

Further Reading

  • Pawson, R. & Tilley, N. (1997). Realistic Evaluation. London: Sage. The foundational text.
  • Pawson, R. (2006). Evidence-Based Policy: A Realist Perspective. London: Sage. Extends to realist synthesis.
  • Blamey, A. & Mackenzie, M. (2007). "Theories of Change and Realistic Evaluation." Evaluation, 13(4), 439-455. Comparison with other theory-based approaches.
  • Wong, G., Greenhalgh, T., Westhorp, G., Buckingham, J. & Pawson, R. (2013). "RAMESES Publication Standards: Realist Syntheses." BMC Medicine, 11, 21. Standards for realist synthesis reporting.

At a Glance

Identifies the mechanisms through which a programme works (or does not work) in particular contexts, explaining variation in outcomes across different settings and populations.

Best For

  • Complex programmes operating across diverse contexts
  • Understanding why a programme works in some settings but not others
  • Theory-testing evaluations where the mechanisms are uncertain
  • Informing scale-up decisions by specifying the conditions needed for success

Complexity

Very High

Timeframe

Typically 12-24 months; iterative and theory-driven throughout

Linked Indicators

18 indicators across 3 donor frameworks

DFID · UNDP · OECD-DAC

Example Indicators

  • Number of context-mechanism-outcome configurations identified and tested
  • Degree to which evaluation findings explain outcome variation across implementation sites
  • Quality of theory refinement based on empirical testing of CMO configurations
