RCT vs Quasi-Experimental Design

When to use a randomized controlled trial vs a quasi-experimental design. Feasibility, cost, rigor, and what each can actually tell you about your program's impact.

At a Glance

Factor | RCT | Quasi-Experimental
------ | --- | ------------------
Causal evidence | Very strong (gold standard) | Moderate to strong
Requires randomization | Yes | No
Comparison group | Randomly assigned control | Matched or naturally occurring
Typical cost | $100K-500K+ | $30K-150K
Timeline | 2-5 years | 1-3 years
Statistical expertise | High | High
Best for | Standardized, simple interventions | Programs where randomization was not possible
Handles complexity | Poorly | Better (more flexible designs)
Donor acceptance | Universally accepted | Widely accepted for impact evidence

Both designs try to answer the same question: did the program cause the change? The difference is how they construct the counterfactual. An RCT creates it through random assignment. A quasi-experimental design approximates it through statistical methods and naturally occurring comparison groups.

When an RCT Is Feasible

An RCT requires specific conditions. If any of these are missing, stop and look at quasi-experimental alternatives.

You can randomize. The program has not yet reached everyone. A phased rollout, lottery-based selection, or resource constraint creates a natural opportunity to randomize who receives the program first.

The intervention is standardized. Everyone in the treatment group gets roughly the same thing. If the program adapts significantly by site, an RCT measures the average of many different treatments, which is often not useful.

The sample is large enough. Statistical power calculations tell you how many units (individuals, schools, villages) you need. Most cluster-randomized trials need 30+ clusters per arm. If you have 8 districts, an RCT will not produce meaningful results. See How to Choose Sample Size for the math.
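
If you want to sanity-check sample size before committing, a basic power calculation takes only a few lines. The sketch below uses Python's statsmodels; the effect size, significance level, and power target are illustrative assumptions, not recommendations for your program.

```python
# A minimal power-calculation sketch for a two-arm comparison of means.
# effect_size, alpha, and power below are assumed values for illustration.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_arm = analysis.solve_power(effect_size=0.3, alpha=0.05, power=0.8)
print(f"Individuals needed per arm (individual randomization): {n_per_arm:.0f}")
```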

Ethics allow it. Withholding a proven, life-saving intervention from a control group is not ethical. Phased rollout designs, where the control group receives the program later, address this. But if the intervention must reach everyone immediately, randomization is off the table.

The budget supports it. Evaluation costs alone typically run $100K-500K+. That does not include program implementation costs. If your total program budget is $500K, spending half on evaluation makes no sense.

When a Quasi-Experimental Design Fits Better

Most development evaluations land here. The program has already started, randomization was not planned, but comparison data exists. That is the normal situation.

The program was not randomized at the start. This is the most common scenario. The program targeted specific areas based on need, political decisions, or partner capacity. You cannot undo that selection, but you can account for it statistically.

A natural comparison group exists. Non-program areas, people who were eligible but did not participate, or communities on a waiting list. The comparison does not need to be perfect. It needs to be plausible after adjustments.

Baseline data was collected. Most quasi-experimental designs require pre-program data. If you only have endline data, your options narrow significantly.

Budget is moderate. $30K-150K covers most quasi-experimental evaluations, including primary data collection if needed.

The Four Main QED Approaches

Difference-in-Differences (DID)

Compare the change over time in program areas versus comparison areas. If stunting dropped 5 percentage points in program areas but only 1 point in comparison areas, the estimated program effect is 4 points.

What you need: Baseline and endline data for both groups. At minimum two time points, though more is better.

Key assumption: Both groups would have followed the same trend without the program (parallel trends). Check this by comparing pre-program trends if you have the data.

When it works best: Programs that target geographic areas, where routine data exists in both program and non-program sites.
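
In regression form, the DID estimate is the coefficient on a treatment-by-period interaction. The sketch below reproduces the stunting example above with made-up numbers; the column names and data are illustrative assumptions, and a real analysis would also cluster standard errors at the level of assignment.

```python
# A minimal difference-in-differences sketch with illustrative data.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "stunting": [40, 35, 41, 40, 38, 33, 39, 38],  # outcome (%)
    "treated":  [1, 1, 0, 0, 1, 1, 0, 0],          # 1 = program area
    "post":     [0, 1, 0, 1, 0, 1, 0, 1],          # 1 = endline round
})

# The coefficient on treated:post is the DID estimate of the program effect
# (about -4 percentage points with these numbers).
model = smf.ols("stunting ~ treated + post + treated:post", data=df).fit()
print(model.params["treated:post"])
```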

Propensity Score Matching (PSM)

Match each participant with a non-participant who looks statistically similar on observable characteristics (age, income, location, education). Compare outcomes between matched pairs.

What you need: Rich data on characteristics that predict program participation. The more variables, the better the match.

Key assumption: All the factors that determine who participates are captured in your data. If unobserved factors (motivation, political connections) drive participation, PSM cannot fix the bias.

When it works best: Individual-level programs (training, cash transfers) where you have survey data on both participants and non-participants.
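
In practice, PSM comes down to estimating a participation model and matching on the predicted score. The sketch below is one minimal way to do it with scikit-learn on simulated data; the variable names and 1:1 nearest-neighbour matching are assumptions, and a real PSM analysis also needs balance checks and a common-support restriction.

```python
# A minimal propensity score matching sketch on simulated data.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "age":    rng.integers(18, 60, n),
    "income": rng.normal(100, 20, n),
})
# Participation depends only on observables here (the key PSM assumption).
p = 1 / (1 + np.exp(-(0.03 * df["age"] - 0.02 * df["income"])))
df["participated"] = rng.binomial(1, p)
df["outcome"] = 2.0 * df["participated"] + 0.05 * df["income"] + rng.normal(0, 1, n)

# 1. Estimate propensity scores from observed characteristics.
X = df[["age", "income"]]
df["pscore"] = LogisticRegression().fit(X, df["participated"]).predict_proba(X)[:, 1]

# 2. Match each participant to the nearest non-participant on the score.
treated = df[df["participated"] == 1]
control = df[df["participated"] == 0]
nn = NearestNeighbors(n_neighbors=1).fit(control[["pscore"]])
_, idx = nn.kneighbors(treated[["pscore"]])

# 3. Average treated-vs-matched-control gap (effect on participants).
att = (treated["outcome"].values - control["outcome"].values[idx[:, 0]]).mean()
print(f"Estimated effect on participants: {att:.2f}")
```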

Regression Discontinuity (RD)

When eligibility depends on a score or threshold (income below a cutoff, test scores above a line), compare people just above and just below. Those near the cutoff are essentially similar, creating a natural experiment.

What you need: A clear eligibility threshold and data on the running variable (the score that determines eligibility).

Key limitation: Results only apply to people near the cutoff, not the entire population. If your program targets the poorest 20%, RD tells you about the effect for people around the 20th percentile, not for the poorest 5%.

When it works best: Targeted programs with score-based eligibility. Check whether your program uses any kind of ranking or threshold before defaulting to other designs.
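
A sharp RD estimate is the jump in the outcome at the cutoff, fitted with separate slopes on each side within a bandwidth. The sketch below shows the idea on simulated data; the cutoff, bandwidth, and variable names are assumptions, and serious RD work uses data-driven bandwidth selection and checks for manipulation of the score around the threshold.

```python
# A minimal sharp regression discontinuity sketch on simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({"poverty_score": rng.uniform(0, 100, n)})
cutoff = 50
df["eligible"] = (df["poverty_score"] < cutoff).astype(int)  # below cutoff gets the program
df["centered"] = df["poverty_score"] - cutoff
df["outcome"] = 10 + 3 * df["eligible"] - 0.05 * df["centered"] + rng.normal(0, 2, n)

# Keep only observations near the threshold and fit separate slopes on each
# side; the coefficient on `eligible` is the jump at the cutoff.
bandwidth = 10
local = df[df["centered"].abs() <= bandwidth]
model = smf.ols("outcome ~ eligible + centered + eligible:centered", data=local).fit()
print(f"Estimated effect at the cutoff: {model.params['eligible']:.2f}")
```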

Interrupted Time Series (ITS)

Analyze trends in an outcome before and after the program starts, using many pre-program data points to establish what the trend would have looked like without the program.

What you need: At least 8-10 data points before the intervention. Monthly health facility data, quarterly education statistics, or annual survey rounds.

Key assumption: Nothing else changed at the same time as the program that could explain the shift in trend. If a new national policy launched the same month, ITS cannot separate the two effects.

When it works best: Programs with strong routine monitoring data but no comparison group. Health system interventions are a common application because facility data often has long time series.
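
The standard ITS model is a segmented regression with a level-shift term at the intervention date and a change-in-slope term afterwards. The sketch below shows that structure on simulated monthly data; the variable names and intervention month are assumptions, and a real analysis should also handle seasonality and autocorrelation.

```python
# A minimal interrupted time series (segmented regression) sketch.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
months = np.arange(36)                  # 24 pre-intervention, 12 post
intervention_month = 24
df = pd.DataFrame({
    "time": months,
    "post": (months >= intervention_month).astype(int),
})
df["time_since"] = np.where(df["post"] == 1, df["time"] - intervention_month, 0)
df["visits"] = 200 + 1.5 * df["time"] + 30 * df["post"] + rng.normal(0, 10, 36)

# `post` captures the immediate level shift at the intervention;
# `time_since` captures any change in slope afterwards.
model = smf.ols("visits ~ time + post + time_since", data=df).fit()
print(model.params[["post", "time_since"]])
```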

Cost Comparison

Component | RCT | Quasi-Experimental
--------- | --- | ------------------
Design and protocol | $15K-40K | $8K-20K
Baseline data collection | $30K-150K | $15K-60K (often uses existing data)
Endline data collection | $30K-150K | $15K-60K
Analysis | $15K-40K | $10K-30K
Midline (if included) | $20K-80K | $10K-40K
IRB/ethical review | $2K-10K | $2K-5K
Total range | $100K-500K+ | $30K-150K

The biggest cost driver is primary data collection. If a quasi-experimental design can use existing administrative or routine monitoring data, costs drop dramatically. DID using health facility records might cost $30K-50K total. The same question answered with an RCT requiring household surveys could cost $200K+.

Common Ways Each Goes Wrong

RCT Failures

Contamination. The control group gets access to the program (or something similar) from another source. Your treatment-control contrast collapses.

Attrition. People drop out of the study at different rates in treatment and control groups. The remaining sample is no longer comparable.

Underpowered. The sample was too small to detect the expected effect. You finish the study and find "no significant effect," but the real problem is that the study could not have detected an effect even if one existed.

Hawthorne effects. People change behavior because they know they are being studied, not because of the program.

QED Failures

Bad comparison group. The comparison group differs from the program group in ways your statistical model does not capture. The results look like a program effect but are actually a selection effect.

Parallel trends violated. In DID, if the comparison group was already on a different trajectory before the program, the estimated effect is biased. Always plot pre-program trends for both groups.
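
That plot is quick to produce if you have two or more pre-program rounds. The sketch below is a minimal version with matplotlib; the column names and numbers are illustrative assumptions.

```python
# A minimal pre-trend check: plot the outcome over pre-program rounds for
# both groups and look for roughly parallel lines.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "round":    [2018, 2019, 2020, 2018, 2019, 2020],
    "group":    ["program"] * 3 + ["comparison"] * 3,
    "stunting": [44, 42, 40, 45, 43, 41],   # pre-program rounds only
})

for name, grp in df.groupby("group"):
    plt.plot(grp["round"], grp["stunting"], marker="o", label=name)
plt.xlabel("Survey round (pre-program)")
plt.ylabel("Stunting (%)")
plt.legend()
plt.show()
```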

Overfitting in PSM. Matching on too many variables with a small sample produces matches that look good statistically but are meaningless practically.

Confounding events in ITS. A policy change, economic shock, or other program launches at the same time as your intervention. ITS cannot separate the effects.

Decision Guide

Work through these questions in order.

1. Can you randomize?

  • Yes, ethically and practically: Consider an RCT. But check that your sample size is sufficient and your budget supports it.
  • No: Move to quasi-experimental options.

2. Do you have baseline data?

  • Yes, for both program and comparison areas: DID is your strongest option.
  • Yes, with a score-based eligibility cutoff: Check if regression discontinuity works.
  • Yes, with many pre-program time points but no comparison group: Consider ITS.
  • No baseline data: PSM with endline data only (weaker), or switch to theory-based approaches.

3. What is your budget?

  • Over $100K and the question demands causal attribution: RCT or strong QED with primary data collection.
  • $30K-100K: QED using existing data where possible. DID with routine data is often the best value.
  • Under $30K: Do not attempt either. Use contribution analysis or other theory-based approaches. See How to Choose Evaluation Methodology.

4. How standardized is the program?

  • Same intervention everywhere: Either design works.
  • Varies significantly by site: QED handles variation better. An RCT measures the average effect across variations, which may not be useful for any specific site.

Use the Evaluation Designer to structure your design once you have made the choice, or the Method Selector to explore alternatives if none of these fit.

Common Mistakes

Mistake 1: Treating "quasi-experimental" as "RCT lite." QED is not a weaker version of an RCT. It is a different family of designs suited to different conditions. A well-executed DID can produce highly credible evidence. A poorly executed RCT with contamination and attrition produces garbage.

Mistake 2: Choosing DID without checking parallel trends. DID requires that the treatment and comparison groups were following the same trajectory before the program. If you cannot show this with data, your DID estimate is unreliable. Plot pre-program trends for both groups. If they diverge, DID is not your design.

Mistake 3: Defaulting to an RCT because the donor asked for "rigorous evidence." Rigorous evidence is not synonymous with RCT. Most donors accept well-designed quasi-experimental evaluations. Ask the donor what they actually need. "Credible evidence of impact" can come from DID or PSM, not only from randomization.

Mistake 4: Ignoring the design effect in cluster-randomized trials. If you randomize at the village or school level but measure individuals, you need far more units than individual-level randomization suggests. A 200-person sample might require 40+ clusters. See How to Choose Sample Size.
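
As a rough illustration of that arithmetic, the design effect is 1 + (m - 1) x ICC, where m is the cluster size and ICC is the intracluster correlation. The sketch below plugs in assumed values (they are not estimates for any particular program) to show how a 200-person individual-level sample inflates to something close to 40 clusters.

```python
# Design effect arithmetic for a cluster-randomized design.
# The ICC and cluster size are assumed values for illustration only.
icc = 0.10            # intracluster correlation (assumed)
cluster_size = 10     # individuals measured per cluster (assumed)
n_individual = 200    # sample needed under individual-level randomization

deff = 1 + (cluster_size - 1) * icc           # design effect = 1.9
n_clustered = n_individual * deff             # 380 individuals
clusters_needed = n_clustered / cluster_size  # 38 clusters in total
print(f"DEFF={deff:.1f}, n={n_clustered:.0f}, clusters={clusters_needed:.0f}")
```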

Mistake 5: Running a QED with a bad comparison group and calling it rigorous. A comparison group that differs systematically from the treatment group in ways your model does not capture is worse than no comparison group at all. It gives you a precise but biased estimate. If you cannot find a credible comparison, use theory-based methods instead of forcing a bad QED.
