When to Use
Quasi-experimental designs (QEDs) sit between experimental designs (RCTs) and purely descriptive evaluations. They attempt to answer "Did the programme cause this change?" without random assignment. Use them when:
- Random assignment is not feasible: ethical concerns, operational constraints, or political resistance prevent randomisation, but causal attribution is still needed
- A natural comparison group exists: programme eligibility rules, phase-in schedules, or geographic boundaries create groups that differ only in programme exposure
- Administrative data is available: government registers, health records, or school enrolment data allow retrospective matching and comparison
- A natural experiment occurred: a policy change, eligibility threshold, or external shock creates quasi-random variation in programme exposure that can be exploited
- Donors require attribution evidence: USAID, USDA, and the World Bank accept credible quasi-experimental designs as evidence of programme effectiveness
QEDs are not appropriate when no credible comparison group can be constructed, when the design assumptions cannot be tested or defended, or when process questions (why and how) are more important than causal attribution (use contribution analysis or process tracing in those cases).
| Scenario | Use QED? | Better Alternative |
|---|---|---|
| Ethical or logistical barrier to RCT | Yes | — |
| Natural eligibility threshold exists | Yes (regression discontinuity) | — |
| Phase-in rollout possible | Yes (difference-in-differences) | — |
| No comparison group feasible | No | Contribution Analysis |
| Process questions are primary | No | Process Tracing |
| Donor requires gold-standard evidence | No | RCT |
How It Works
There is no single quasi-experimental design; QED is a family of approaches, each suited to different data situations and assumptions. The four main designs are:
Design 1: Difference-in-Differences (DiD)
Compare the change in outcomes over time in a treatment group against the change in a comparison group that did not receive the programme. The DiD estimate is the "double difference": (treatment post − treatment pre) minus (comparison post − comparison pre). Key assumption: in the absence of the programme, both groups would have experienced similar trends ("parallel trends"). Requires panel data on both groups at baseline and follow-up.
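The double difference can be estimated as a regression with a treatment-by-period interaction. A minimal sketch, assuming a household panel in a hypothetical file `panel.csv` with illustrative columns `household_id`, `treated` (0/1), `post` (0/1), and `outcome`; the coefficient on the interaction is the DiD estimate:

```python
# Minimal difference-in-differences sketch (illustrative; file and column
# names are hypothetical). One row per household per period.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("panel.csv")  # columns: household_id, treated, post, outcome

# The coefficient on treated:post is the double difference:
# (treatment post - treatment pre) - (comparison post - comparison pre)
model = smf.ols("outcome ~ treated + post + treated:post", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["household_id"]}  # cluster SEs by household
)
print("DiD estimate:", model.params["treated:post"])
print(model.summary())
```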
Design 2: Propensity Score Matching (PSM)
Match each programme participant to one or more non-participants who are statistically similar on observed characteristics. Compare outcomes between matched pairs. The PSM estimate is the "average treatment effect on the treated" (ATT). Key assumption: all variables that determine both programme participation and outcomes are observable and included in the matching model.
To implement PSM: collect baseline data on a wide range of characteristics for both participants and non-participants; estimate a logistic regression model predicting programme participation; use the predicted probabilities (propensity scores) to match participants and non-participants; verify balance; compare outcomes.
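A minimal sketch of these steps, assuming a hypothetical `baseline.csv` with a `participant` indicator, an `outcome` column, and illustrative covariate names; 1:1 nearest-neighbour matching on the estimated propensity score is one of several common matching approaches:

```python
# Illustrative PSM sketch (file, column, and covariate names are assumptions).
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

df = pd.read_csv("baseline.csv")
covariates = ["hh_size", "land_area", "head_age", "head_educ"]  # assumed baseline variables

# 1. Propensity scores: predicted probability of participation given covariates
df["pscore"] = (
    LogisticRegression(max_iter=1000)
    .fit(df[covariates], df["participant"])
    .predict_proba(df[covariates])[:, 1]
)

# 2. 1:1 nearest-neighbour matching on the propensity score
treated = df[df["participant"] == 1]
control = df[df["participant"] == 0]
nn = NearestNeighbors(n_neighbors=1).fit(control[["pscore"]])
_, idx = nn.kneighbors(treated[["pscore"]])
matched_control = control.iloc[idx.ravel()]

# 3. Verify covariate balance before comparing outcomes (see Best Practices below)

# 4. ATT: mean outcome difference on the matched sample
att = treated["outcome"].mean() - matched_control["outcome"].mean()
print("ATT estimate:", att)
```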
Design 3: Regression Discontinuity (RD)
Exploit a threshold in a continuous eligibility criterion to compare units just on the eligible side of the threshold against units just on the ineligible side. The RD estimate applies only to those near the threshold. Key assumption: units cannot precisely manipulate their score to land on the eligible side of the threshold. Requires a large sample near the threshold and a continuous running variable.
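A minimal sharp-RD sketch under assumed names (a hypothetical `schools.csv`, a running variable `score`, an outcome `pass_rate`, and an illustrative cutoff and bandwidth); the coefficient on the eligibility indicator is the estimated jump at the threshold:

```python
# Illustrative sharp regression discontinuity sketch (all names and values
# are assumptions). Local linear regression within a bandwidth of the cutoff.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("schools.csv")            # columns: score (running variable), pass_rate
cutoff, bandwidth = 50.0, 10.0             # assumed threshold and bandwidth

df["centred"] = df["score"] - cutoff
df["eligible"] = (df["centred"] >= 0).astype(int)  # direction depends on the eligibility rule
local = df[df["centred"].abs() <= bandwidth]

# Separate slopes on each side of the cutoff; the effect is the jump at centred = 0
model = smf.ols("pass_rate ~ eligible + centred + eligible:centred", data=local).fit()
print("RD estimate at the threshold:", model.params["eligible"])
```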
Design 4: Interrupted Time Series (ITS)
Analyse a long time series of outcomes before and after programme introduction, controlling for pre-existing trends. Useful when a single policy or programme is introduced at a specific point in time and administrative data provides many pre-intervention time points. Works without a comparison group but is strengthened by including one.
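A minimal segmented-regression sketch, assuming monthly data in a hypothetical `monthly_rates.csv` with a `month_index` and an outcome column; `post` captures the level change at programme introduction and `months_since` the change in slope relative to the pre-existing trend:

```python
# Illustrative interrupted time series (segmented regression) sketch;
# file name, column names, and the intervention month are assumptions.
import pandas as pd
import statsmodels.formula.api as smf

ts = pd.read_csv("monthly_rates.csv")      # columns: month_index, facility_delivery_rate
intervention_month = 36                    # assumed programme start

ts["post"] = (ts["month_index"] >= intervention_month).astype(int)
ts["months_since"] = (ts["month_index"] - intervention_month).clip(lower=0)

# month_index: pre-existing trend; post: level change; months_since: slope change
model = smf.ols(
    "facility_delivery_rate ~ month_index + post + months_since", data=ts
).fit(cov_type="HAC", cov_kwds={"maxlags": 3})  # Newey-West SEs for autocorrelation
print(model.summary())
```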
Key Components
- Comparison group: a group not receiving the programme whose outcomes can be compared to participants
- Baseline data on both groups: pre-programme outcome and covariate measurements for both treatment and comparison groups
- Identical or comparable instruments: the same survey tools used for both groups at every data collection point
- Balance testing: statistical tests confirming the treatment and comparison groups are comparable at baseline on observed characteristics
- Design assumption testing: explicit tests of the key identifying assumptions (parallel trends, common support for PSM, threshold manipulation tests for RD)
- Sensitivity analysis: testing whether the treatment effect estimate changes under alternative model specifications
- Additional time-invariant measures: baseline variables not expected to change, included to improve matching quality
Best Practices
Maximise comparability through identical instruments. Treatment and comparison group data must be collected using the same survey instruments, at the same time, by the same (or equivalently trained) enumerators. Any difference in data collection contaminates the comparison.
Test and report balance, not just match. PSM is not complete once matching is done; you must test whether matched groups are actually balanced on key variables and report the results. Unbalanced matched samples indicate the matching model needs revision.
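A small sketch of such a balance check, building on the hypothetical objects from the PSM sketch above (`covariates`, `treated`, `control`, `matched_control`); standardised mean differences below 0.10 are a common benchmark:

```python
# Illustrative balance check: standardised mean difference before and after
# matching, per covariate (inputs assumed from the earlier PSM sketch).
import numpy as np

def smd(treated_col, control_col):
    """Standardised mean difference: difference in means over pooled SD."""
    pooled_sd = np.sqrt((treated_col.var() + control_col.var()) / 2)
    return (treated_col.mean() - control_col.mean()) / pooled_sd

for c in covariates:
    print(f"{c}: SMD before = {smd(treated[c], control[c]):.3f}, "
          f"after = {smd(treated[c], matched_control[c]):.3f}")
```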
Pre-specify the primary analysis. Document the intended analysis method, covariates, and outcome specification before data collection. This prevents post-hoc model selection that inflates false positive rates.
Include time-invariant variables in matching. Adding variables that are stable over time (e.g. land ownership, ethnicity, household composition at baseline) improves match quality and reduces bias.
Report design limitations honestly. Every QED involves untestable assumptions. A credible evaluation report states these assumptions clearly and explains why they are reasonable given the context.
Common Mistakes
Treating PSM as sufficient without balance testing. Matching by propensity score does not guarantee balance. Always test covariate balance post-matching and re-match if balance is poor.
Ignoring the parallel trends assumption in DiD. Difference-in-differences estimates are invalid if treatment and comparison groups had different pre-programme trends. Test for parallel trends using pre-programme data if available.
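One simple check, assuming the hypothetical `panel.csv` from the DiD sketch above also contains a pre-programme time index `period`: restrict to pre-programme rows and test whether the treatment group's trend differs from the comparison group's:

```python
# Illustrative pre-trend check (column names are assumptions). A significant
# treated:period interaction in pre-programme data is a warning sign for the
# parallel trends assumption.
import pandas as pd
import statsmodels.formula.api as smf

pre = pd.read_csv("panel.csv").query("post == 0")   # pre-programme rows only
model = smf.ols("outcome ~ treated + period + treated:period", data=pre).fit()
print(model.summary().tables[1])   # inspect the treated:period coefficient
```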
Using a geographically proximate comparison group without spillover controls. If comparison group households can observe or interact with treatment households, contamination biases the estimate toward zero.
Claiming the QED is "as good as an RCT." Quasi-experimental designs make additional assumptions that RCTs do not. Clearly state the design and its assumptions; do not oversell the causal warrant.
Retrospective data fishing. Using existing datasets without a pre-specified analysis plan creates opportunities for model selection that produces false positive findings. Pre-register the analysis wherever possible.
Examples
Food security, Latin America. A USDA-funded programme in Honduras used propensity score matching to evaluate impact on household food security scores. Baseline data included 40 variables on household demographics, assets, and agricultural practices for 2,400 treatment and 2,400 comparison households. After matching, standardised mean differences for all 40 variables fell below 0.10, indicating good balance. A difference-in-differences estimate on the matched sample showed a 0.6 standard deviation improvement in food security scores at endline among treatment households relative to matched comparisons.
Education, East Africa. A school improvement programme in Kenya used regression discontinuity based on district poverty scores that determined programme eligibility. Schools scoring just below the eligibility threshold (eligible) were compared to schools just above (ineligible). Analysis of national exam score data showed a 3.8 percentage point improvement in pass rates among eligible schools relative to ineligible schools at the threshold, with no evidence of score manipulation near the threshold.
Health, South Asia. A DFID-funded community health programme in Bangladesh used interrupted time series analysis of monthly facility delivery rates across 120 intervention sub-districts, with 60 matched comparison sub-districts serving as the comparison series. The ITS model estimated a 12 percentage point increase in facility delivery rates attributable to the programme, above the pre-existing trend, with the effect sustained over 24 months post-introduction.
Compared To
| Design | Randomisation | Counterfactual | Key Assumption |
|---|---|---|---|
| QED (PSM) | None | Matched non-participants | All confounders observed |
| QED (DiD) | None | Comparison group trend | Parallel trends absent the programme |
| QED (RD) | None | Just-ineligible units at the threshold | No score manipulation |
| RCT | Random | Randomised control group | Randomisation integrity |
| Contribution Analysis | None | None | Plausible causal story |
Relevant Indicators
38 indicators across USAID, World Bank, USDA, and 3ie frameworks. Key examples:
- Standardised mean difference on key baseline variables between treatment and comparison groups (target < 0.10)
- Difference-in-differences treatment effect estimate with 95% confidence interval
- Common support percentage (proportion of treatment group with matched comparison units in PSM)
- Number of pre-programme periods used to test parallel trends assumption
Related Tools
- Evaluation Planner: structure baseline data collection and comparison group selection
- Indicator Library: identify appropriate outcome measures for your evaluation
Related Topics
- Impact Evaluation, the broader category that includes both RCTs and quasi-experimental designs
- Baseline Design, collecting the data that enables quasi-experimental analysis
- Sampling Methods, how to sample treatment and comparison populations
- Statistical Significance, interpreting p-values and confidence intervals in evaluation analysis
- Attribution vs. Contribution, when QED is appropriate versus contribution analysis
Further Reading
- Gertler, P. et al. (2016). Impact Evaluation in Practice. 2nd ed. World Bank. Chapters 5-8 cover quasi-experimental designs with accessible explanations.
- Rosenbaum, P. & Rubin, D. (1983). "The Central Role of the Propensity Score in Observational Studies for Causal Effects." Biometrika, 70(1), 41-55. The foundational PSM paper.
- Imbens, G. & Lemieux, T. (2008). "Regression Discontinuity Designs: A Guide to Practice." Journal of Econometrics, 142(2), 615-635. The standard RD reference.
- 3ie (2012). Quasi-Experimental Designs for Development Evaluations. Impact Evaluation Series. Practical guidance for development practitioners.