
Realist Evaluation

An evaluation approach that asks what works, for whom, in what circumstances, and why, by identifying the mechanisms through which programmes produce outcomes in specific contexts.

When to Use

Realist evaluation is the right approach when the question is not simply "did the programme work?" but "for whom did it work, under what conditions, and through what mechanisms?" Developed by Ray Pawson and Nick Tilley in the 1990s, realist evaluation is built on the insight that programmes do not cause outcomes directly; they introduce resources and opportunities that trigger responses in specific people in specific contexts.

Use it when:

  • Outcomes vary across sites or populations: the programme shows strong results in some places and weak results in others, and you need to understand why
  • Context is central: the programme works through relationships, norms, or institutional conditions that differ meaningfully across settings
  • Theory refinement is the goal: you want to understand why a programme works in order to improve it, not just whether it works on average
  • Scale-up decisions require specificity: before expanding a programme, funders and managers need to know which contexts are necessary for the mechanisms to fire
  • Existing evidence is mixed: realist synthesis (the literature-based version) can reconcile conflicting findings from multiple evaluations of similar interventions

Realist evaluation is resource-intensive and produces probabilistic, context-specific findings rather than average treatment effects. It is not suitable when funders need a single yes/no effectiveness verdict, when resources are limited, or when the programme theory is very simple and context is relatively uniform.

| Scenario | Use Realist Evaluation? | Better Alternative |
| --- | --- | --- |
| Why does it work for some and not others? | Yes | — |
| Average effect across all contexts | No | Impact Evaluation |
| Simple, uniform intervention | No | RCT or QED |
| Building causal argument without mechanism | No | Contribution Analysis |
| Scale-up context specification | Yes | — |
| Literature synthesis of mixed evidence | Yes (realist synthesis) | — |

How It Works

Realist evaluation is built around one central analytical unit: the Context-Mechanism-Outcome (CMO) configuration. A CMO configuration states: in this context (C), this mechanism (M) is triggered, producing this outcome (O). Its three elements are defined below (a minimal code sketch of the structure follows the list):

  • Context: the conditions (social, institutional, cultural, geographic, historical) within which a programme operates. Context is not just background; it activates or suppresses mechanisms
  • Mechanism: the causal process that connects a programme resource or activity to an outcome. Mechanisms are typically hidden: they involve how people reason about and respond to programme inputs
  • Outcome: the observable change that results when a mechanism fires in a given context
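
Because configurations accumulate and get revised across evaluation cycles, it can help to record them in a consistent, machine-readable form. A minimal Python sketch of the structure; the class and field names here are illustrative, not part of any published realist-evaluation toolkit:

```python
# Illustrative only: a small record type for tracking CMO configurations
# across cycles of testing. Names are hypothetical, not a standard API.
from dataclasses import dataclass
from typing import Optional

@dataclass
class CMOConfiguration:
    context: str                   # conditions that activate or suppress the mechanism
    mechanism: str                 # the (usually hidden) reasoning/response triggered
    outcome: str                   # the observable change when the mechanism fires
    verdict: Optional[str] = None  # "confirmed", "partial", or "disconfirmed" after testing
```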

Step 1: Develop initial programme theory (IPT)

Start with an explicit theory of how the programme is supposed to work. This is not just a logic model; it must articulate the mechanisms through which resources are expected to change behaviour.

Step 2: Generate CMO hypotheses

Translate the programme theory into a set of testable CMO configurations. For example: "When community health workers are respected figures in their community (C), free bednet provision triggers social norm activation around child protection (M), producing more consistent bednet use (O)."
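
Recorded in the illustrative structure sketched in the previous section, that hypothesis becomes a testable record:

```python
# The bednet hypothesis from the text, expressed as a CMOConfiguration
# (using the illustrative class sketched earlier).
bednet_cmo = CMOConfiguration(
    context="Community health workers are respected figures in their community",
    mechanism="Free bednet provision activates social norms around child protection",
    outcome="More consistent bednet use",
)
```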

Step 3: Collect data to test the CMOs

Mixed methods are typically required. Quantitative data can test whether outcomes varied by context. Qualitative data (interviews, observations) can probe the mechanisms.
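
On the quantitative side, the first check is usually whether outcomes split along the contextual categories named in the CMO hypotheses. A minimal sketch using pandas; the dataset, column names, and context categories are hypothetical:

```python
# Hypothetical data: help-seeking rates by site, tagged with a contextual
# factor (how community health workers were selected).
import pandas as pd

data = pd.DataFrame({
    "site": ["A", "A", "B", "B", "C", "C"],
    "chw_selection": ["elected", "elected", "assigned",
                      "assigned", "elected", "assigned"],
    "help_seeking_rate": [0.71, 0.68, 0.42, 0.39, 0.74, 0.44],
})

# If the hypothesised mechanism is context-dependent, outcomes should
# split along the contextual factor, not just vary site by site.
print(data.groupby("chw_selection")["help_seeking_rate"].agg(["mean", "count"]))
```

Variation like this does not confirm the mechanism on its own; the qualitative strand still has to show that the hypothesised reasoning actually occurred.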

Step 4: Analyse CMO configurations

Examine which CMO configurations were confirmed, partially confirmed, or disconfirmed by the data. Where mechanisms did not fire as expected, identify what contextual factor suppressed them.
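
Once each configuration carries a verdict, tallying results and isolating the disconfirmed cases for follow-up is straightforward. A continuation of the illustrative sketch above:

```python
# Tally verdicts and flag disconfirmed configurations for qualitative
# follow-up on which contextual factors suppressed the mechanism.
from collections import Counter

tested = [
    CMOConfiguration("CHWs elected by community (rural)",
                     "Social trust drives help-seeking", "Higher help-seeking",
                     verdict="confirmed"),
    CMOConfiguration("CHWs centrally assigned (peri-urban)",
                     "Social trust drives help-seeking", "Higher help-seeking",
                     verdict="disconfirmed"),
]

print(Counter(c.verdict for c in tested))
for c in (c for c in tested if c.verdict == "disconfirmed"):
    print("Probe suppressing context factors in:", c.context)
```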

Step 5: Refine the programme theory

Revise the initial programme theory based on empirical findings. Realist evaluation is iterative; the theory improves with each cycle of hypothesis testing.

Step 6: Produce middle-range theory

Synthesise findings into transferable, middle-range theories that specify the conditions under which this type of intervention produces these types of outcomes. These are more useful for decision-making than context-specific findings alone.

Key Components

  • Initial programme theory: explicit causal logic articulating mechanisms, not just input-output chains
  • CMO configurations: testable hypotheses linking context, mechanism, and outcome
  • Context mapping: systematic documentation of the contextual factors relevant to mechanism activation
  • Mixed methods data collection: quantitative to test outcome variation by context; qualitative to probe mechanisms
  • Iterative theory refinement: repeated cycles of hypothesis testing and theory revision
  • Middle-range theory: transferable propositions about what works for whom under what conditions
  • Realist-trained evaluators: this approach requires specialist knowledge to implement credibly

Best Practices

Articulate mechanisms explicitly. The most common failure in realist evaluation is treating mechanisms as black boxes. A mechanism statement must name the response that is triggered: "Women participate in savings groups (M: social trust and reciprocal obligation) when neighbours they already know are members (C), producing improved financial resilience (O)."

Monitor context throughout implementation. Context changes during implementation: political shifts, market fluctuations, leadership changes. Build context monitoring into the evaluation design.

Use theory to guide data collection, not data to generate theory. Realist evaluation starts deductively with CMO hypotheses and tests them; it is not grounded theory. Starting with data and inductively generating CMOs produces poorly specified findings.

Strengthen plausibility with existing evidence. Before testing CMO configurations empirically, review the literature for evidence that the proposed mechanisms operate in similar contexts.

Report negative cases. CMO configurations that were disconfirmed are as analytically important as those that were confirmed. Report both.

Common Mistakes

Treating "context" as confounders to control away. In realist evaluation, context is not noise, it is explanatory. Controlling for context in a regression model destroys the analytical value of contextual variation.

Listing characteristics instead of specifying mechanisms. Saying "the programme worked in urban contexts" is a contextual observation, not a realist finding. A realist finding explains why: what mechanism the urban context activates or enables.

Using realist vocabulary without realist reasoning. Programmes sometimes describe their evaluation as "realist" because they collected qualitative data alongside a survey. Realist evaluation requires explicit CMO hypothesis development, iterative theory refinement, and systematic cross-case comparison.

Designing without sufficient qualitative depth. Mechanisms are not directly observable in outcome data. You need interviews, observations, or documents that reveal how people responded to programme inputs and why. Superficial qualitative data produces superficial mechanism specification.

Claiming generalisability prematurely. Middle-range theories from a single realist evaluation are hypotheses, not laws. Replication across multiple contexts is needed before transferability can be claimed.

Examples

Community health, East Africa. A realist evaluation of a community health worker (CHW) programme in Kenya identified three CMO configurations from the initial programme theory. The primary configuration, that CHWs embedded in community structures (C) would trigger help-seeking behaviour through social trust (M), was confirmed in rural areas where CHWs were elected by their communities but disconfirmed in peri-urban areas where CHWs were centrally assigned. A secondary configuration about maternal health knowledge was confirmed across all contexts. These findings informed a redesign of the CHW selection process for the programme's second phase.

Cash transfers, West Africa. A realist evaluation of a conditional cash transfer programme in Niger found that the same transfer amount produced very different nutritional outcomes across regions. The mechanism analysis revealed that in markets with functioning grain supply chains (C), the cash trigger activated commercial food purchasing (M) and produced dietary diversity improvements (O). In remote areas with thin markets, the mechanism did not fire because cash could not be exchanged for diverse foods. The finding shaped the geographic targeting strategy for scale-up.

Education governance, South Asia. A realist synthesis of 23 evaluations of school governance reform programmes in South Asia identified that reforms producing learning improvements shared one CMO configuration: when local government had prior capacity and community trust (C), school management committee formation (M: shared accountability) produced teacher attendance improvements and learning gains (O). Reforms in low-capacity settings produced the governance structures without activating the accountability mechanism.

Compared To

| Method | Causal Logic | Counterfactual | Primary Output |
| --- | --- | --- | --- |
| Realist Evaluation | Generative (mechanisms) | None | Middle-range theory |
| Impact Evaluation | Successionist (regularity) | Explicit | Average treatment effect |
| Process Tracing | Mechanism tracing | None | Causal chain evidence |
| Contribution Analysis | Plausible contribution | None | Contribution story |
| Developmental Evaluation | Emergent | None | Real-time learning |

Relevant Indicators

18 indicators across DFID, UNDP, and OECD-DAC frameworks. Key examples:

  • Number of CMO configurations initially hypothesised versus confirmed by evaluation data
  • Degree to which evaluation explains outcome variation across implementation contexts (rated 1-5)
  • Proportion of evaluation recommendations that specify the context conditions necessary for replication

Related Tools

  • Evaluation Planner: structure your CMO hypothesis development and data collection plan
  • MEStudio Logic Model Builder: for building the initial programme theory that underpins CMO analysis

Related Topics

  • Process Tracing: a complementary method for tracing causal mechanisms within individual cases
  • Contribution Analysis: an alternative for building causal arguments without experimental design
  • Mixed Methods Evaluation: realist evaluation typically requires mixed methods to test CMO configurations
  • Theory of Change: the programme theory that generates the initial CMO hypotheses
  • Developmental Evaluation: an alternative for highly emergent programmes where CMOs cannot be pre-specified

Further Reading

  • Pawson, R. & Tilley, N. (1997). Realistic Evaluation. London: Sage. The foundational text.
  • Pawson, R. (2006). Evidence-Based Policy: A Realist Perspective. London: Sage. Extends to realist synthesis.
  • Blamey, A. & Mackenzie, M. (2007). "Theories of Change and Realistic Evaluation." Evaluation, 13(4), 439-455. Comparison with other theory-based approaches.
  • Wong, G., Greenhalgh, T., Westhorp, G., Buckingham, J. & Pawson, R. (2013). "RAMESES Publication Standards: Realist Syntheses." BMC Medicine, 11, 21. Standards for realist synthesis reporting.

At a Glance

Identifies the mechanisms through which a programme works (or does not work) in particular contexts, explaining variation in outcomes across different settings and populations.

Best For

  • Complex programmes operating across diverse contexts
  • Understanding why a programme works in some settings but not others
  • Theory-testing evaluations where the mechanisms are uncertain
  • Informing scale-up decisions by specifying the conditions needed for success

Complexity

Very High

Timeframe

Typically 12-24 months; iterative and theory-driven throughout

Linked Indicators

18 indicators across 3 donor frameworks

DFID · UNDP · OECD-DAC

Example Indicators

  • Number of context-mechanism-outcome configurations identified and tested
  • Degree to which evaluation findings explain outcome variation across implementation sites
  • Quality of theory refinement based on empirical testing of CMO configurations
