M&E Studio

Term · Methods · 3 min read

Validity (Internal & External)

The degree to which an evaluation accurately demonstrates causal relationships (internal validity) and generalizes findings beyond the study context (external validity).

Definition

Validity refers to the accuracy and trustworthiness of conclusions drawn from evaluation data. It has two distinct dimensions that practitioners must consider separately:

Internal validity asks: Did the programme actually cause the observed outcomes? This is about establishing credible causal inference, ruling out alternative explanations like selection bias, maturation, or external events that could have produced the same results. High internal validity means you can confidently attribute change to your intervention rather than confounding factors.

External validity asks: Can these findings be generalized beyond this specific study? This concerns the applicability of results to other contexts, populations, or time periods. A study with strong external validity produces insights that remain useful even when programme conditions differ from the evaluation setting.

These dimensions often trade off against each other: tightly controlled studies maximize internal validity but may limit generalizability, while real-world implementations offer richer contextual insights at the cost of causal clarity.

Why It Matters

Validity is the foundation of credible M&E. Without it, you cannot distinguish programme success from coincidence, nor learn lessons that apply beyond your specific case. Practitioners face validity concerns whenever they make causal claims ("our training improved skills", "the intervention reduced dropout rates"), and these claims drive funding decisions, programme adaptations, and organizational learning.

Poor validity leads to costly mistakes: scaling programmes that don't work, abandoning interventions that do, or misallocating resources based on spurious correlations. Conversely, explicit attention to validity strengthens evaluation design, clarifies what can reasonably be claimed, and builds stakeholder confidence in findings. For impact evaluations and quasi-experimental designs, validity is the primary quality criterion: without it, the evaluation cannot fulfill its purpose.

In Practice

Threats to internal validity include:

  • Selection bias: comparison groups differ systematically before the intervention
  • History: external events coinciding with the programme influence outcomes
  • Maturation: natural changes over time mistaken for programme effects
  • Testing effects: pre-test exposure influences post-test responses
  • Instrumentation: measurement changes over time create artificial effects

Addressing these requires careful design: randomization (when feasible), matched comparison groups, pre-post measurements, and statistical controls for confounders.
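Two of these design safeguards, randomization and baseline comparability checks, can be sketched in a few lines. The following is a minimal illustration with made-up participant data; the covariate name, sample size, and seed are all hypothetical, and a real evaluation would use formal balance tests rather than a raw mean difference.

```python
import random
import statistics

def randomize(participants, seed=42):
    """Randomly split participants into treatment and comparison groups."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = participants[:]
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

def baseline_gap(treat, comp, key):
    """Difference in group means on a pre-intervention covariate.
    A large gap signals selection bias the design must address."""
    return (statistics.mean(p[key] for p in treat)
            - statistics.mean(p[key] for p in comp))

# Illustrative data: 200 participants with a pre-intervention score
participants = [{"id": i, "baseline_score": 40 + (i % 20)} for i in range(200)]
treat, comp = randomize(participants)
print(round(baseline_gap(treat, comp, "baseline_score"), 2))
```

With random assignment, the baseline gap should be close to zero; when randomization is infeasible, the same check applied to a matched comparison group shows how well matching has balanced the groups.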

Threats to external validity include:

  • Sample representativeness: study participants differ from target population
  • Contextual specificity: results depend on unique local conditions
  • Temporal limitations: findings apply only to specific time periods
  • Implementation fidelity: programme delivered differently than intended

Strengthening external validity involves purposive sampling, documenting contextual conditions, testing across multiple sites, and being explicit about boundary conditions for generalization.
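One piece of this, checking sample representativeness, lends itself to a simple comparison of the sample's profile against known population benchmarks. The sketch below uses entirely hypothetical proportions; in practice the benchmarks would come from census or survey data for the target population.

```python
def representativeness_gaps(sample_props, population_props):
    """Percentage-point gap between sample and population
    for each characteristic (positive = over-represented)."""
    return {k: round(sample_props[k] - population_props[k], 3)
            for k in population_props}

# Illustrative proportions only
sample = {"female": 0.62, "rural": 0.30, "under_25": 0.45}
population = {"female": 0.51, "rural": 0.55, "under_25": 0.40}
print(representativeness_gaps(sample, population))
# here the rural gap of -0.25 would flag that findings may not
# generalize to rural participants
```

Large gaps do not invalidate the findings, but they define boundary conditions: the rural under-representation above would caution against generalizing results to predominantly rural programme sites.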

In impact evaluations (P15), internal validity is paramount: the study must establish causality before asking whether it generalizes. In quasi-experimental designs (P14), practitioners use techniques like propensity score matching or difference-in-differences to approximate randomization and strengthen causal claims. Throughout, data quality assessment ensures that measurement reliability supports validity: unreliable data cannot be valid.
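The difference-in-differences logic mentioned above reduces to simple arithmetic: the programme effect is the treatment group's change minus the comparison group's change, which nets out shared trends (the "history" and "maturation" threats). A minimal sketch, with illustrative outcome values:

```python
def did_estimate(treat_pre, treat_post, comp_pre, comp_post):
    """Difference-in-differences: change in the treatment group
    minus change in the comparison group. Assumes the two groups
    would have followed parallel trends absent the programme."""
    return (treat_post - treat_pre) - (comp_post - comp_pre)

# Hypothetical mean outcome scores (e.g., test scores)
effect = did_estimate(treat_pre=50.0, treat_post=62.0,
                      comp_pre=51.0, comp_post=55.0)
print(effect)  # (62 - 50) - (55 - 51) = 8.0
```

Note the hedged assumption in the docstring: the estimate is only as credible as the parallel-trends assumption, which is itself an internal validity question.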

Related Topics

  • Reliability: measurement consistency, a prerequisite for validity
  • Quasi-Experimental Design: methods for establishing causal inference
  • Impact Evaluation: where validity is the primary concern
  • Data Quality Assessment: ensuring measurement accuracy
  • Bias: systematic errors threatening validity
  • Counterfactual: the comparison needed for causal claims

Further Reading

  • Shadish, Cook, & Campbell (2002). Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Houghton Mifflin. The definitive text on validity threats and design solutions.
  • OECD-DAC (2019). Guidance on the Use of the DAC Criteria for Evaluation. Covers validity considerations within evaluation quality standards.
  • USAID (2020). Evaluation Policy and Guidance. Includes validity requirements for impact evaluations.

At a Glance

Assesses whether an evaluation's findings accurately reflect causal relationships and can be generalized to other contexts.

Best For

  • Designing impact evaluations and quasi-experimental studies
  • Critiquing evaluation reports and methodology choices
  • Planning data collection to minimize threats to validity
  • Interpreting causal claims from M&E findings

Complexity

Medium

Timeframe

Considered throughout study design and analysis

Linked Indicators

12 indicators across 3 donor frameworks

USAID · World Bank · OECD-DAC

Examples

  • Proportion of impact evaluations with documented internal validity threats addressed
  • Degree to which study findings are applicable to programme's target population
  • Use of valid comparison groups in causal inference

Related Topics

Pillar
Quasi-Experimental Design
A family of evaluation designs that estimate causal programme effects without random assignment, using statistical methods to construct credible comparison groups.
Pillar
Impact Evaluation
A rigorous evaluation approach that measures the causal effect of a programme on outcomes by comparing what happened with what would have happened in its absence.
Core Concept
Data Quality Assurance
A systematic process for verifying that collected data meets five quality dimensions (Validity, Integrity, Precision, Reliability, and Timeliness), ensuring data is fit for decision-making.
Term
Reliability
The consistency and repeatability of a measurement: whether the same tool produces stable results across repeated applications, different raters, or different time periods.
Term
Bias
Systematic error in data collection, analysis, or interpretation that distorts results and threatens the validity of M&E findings.
Term
Counterfactual
The comparison between what happened and what would have happened in the absence of an intervention: the fundamental basis for establishing causal attribution in impact evaluation.