
Statistical Significance

A statistical measure indicating whether observed results are likely due to a real effect rather than random chance, typically assessed using p-values and hypothesis testing.

Definition

Statistical significance is a formal statistical concept used to determine whether observed results, such as differences between treatment and control groups, are likely to reflect a real effect rather than random chance. In M&E, it answers the question: "Could this result have occurred by random variation alone?"

The most common measure is the p-value, which quantifies the probability of observing results at least as extreme as those obtained, assuming no true effect exists (the null hypothesis). A p-value below a predetermined threshold (typically 0.05, or 5%) indicates statistical significance, meaning that results at least this extreme would be expected less than 5% of the time if there were truly no effect. However, statistical significance does not measure the size or practical importance of an effect; that requires examining effect size separately.
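As an illustration only, the minimal sketch below simulates outcome data for a treatment and a control group (the means, spread, and sample sizes are invented for the example) and uses SciPy's Welch's t-test to obtain a p-value.

```python
# A minimal sketch of a two-group significance test using SciPy.
# The outcome values are simulated; group means and sample sizes are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
treatment = rng.normal(loc=55, scale=10, size=120)  # hypothetical treatment-group outcomes
control = rng.normal(loc=50, scale=10, size=120)    # hypothetical control-group outcomes

# Welch's t-test: is the difference in means larger than chance alone would produce?
result = stats.ttest_ind(treatment, control, equal_var=False)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
# By convention, p < 0.05 would be reported as statistically significant.
```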

Why It Matters

Statistical significance is essential for credible impact evaluation and evidence-based decision-making. Without it, practitioners cannot distinguish between genuine programme effects and random fluctuations in the data. This is particularly critical when:

  • Making attribution claims: determining whether observed outcomes can reasonably be attributed to the programme rather than external factors or chance
  • Scaling interventions: deciding whether to expand a programme based on evaluation results that may reflect random variation
  • Reporting to donors: providing defensible evidence of impact that meets methodological standards
  • Avoiding false positives: preventing investment in ineffective programmes that appeared successful due to random chance

However, statistical significance alone is insufficient. A result can be statistically significant yet practically meaningless (tiny effect with large sample), or practically important yet not statistically significant (large effect with small sample). Practitioners must examine both statistical significance and effect size to fully interpret evaluation findings.
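To make that contrast concrete, the sketch below (simulated data; all means, standard deviations, and sample sizes are assumptions chosen for the example) compares a large-sample study with a trivially small effect against a small-sample study with a large effect, reporting both the p-value and Cohen's d.

```python
# Illustrative contrast between statistical significance and effect size.
# All data are simulated; the parameters are assumptions for the example.
import numpy as np
from scipy import stats

def cohens_d(a, b):
    """Cohen's d: standardised mean difference using the pooled SD."""
    pooled_sd = np.sqrt((np.var(a, ddof=1) + np.var(b, ddof=1)) / 2)
    return (np.mean(a) - np.mean(b)) / pooled_sd

rng = np.random.default_rng(7)

# Large sample, tiny effect: often statistically significant but practically trivial.
big_t, big_c = rng.normal(50.3, 10, 20_000), rng.normal(50.0, 10, 20_000)

# Small sample, large effect: may miss the 0.05 threshold despite mattering in practice.
small_t, small_c = rng.normal(58, 10, 15), rng.normal(50, 10, 15)

for label, t, c in [("large n, tiny effect", big_t, big_c),
                    ("small n, large effect", small_t, small_c)]:
    p = stats.ttest_ind(t, c, equal_var=False).pvalue
    print(f"{label}: p = {p:.3f}, Cohen's d = {cohens_d(t, c):.2f}")
```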

In Practice

Statistical significance appears primarily in quantitative impact evaluations and quasi-experimental designs. Common applications include:

Impact evaluations using randomized controlled trials (RCTs) or quasi-experimental designs calculate p-values for each outcome indicator to test whether treatment and control groups differ significantly. For example, a health programme might find that vaccination rates are 15 percentage points higher in the treatment group (p=0.02), indicating this difference is unlikely due to chance.
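For a proportion-based outcome like this, one common approach is a chi-square test on the 2x2 table of vaccinated and unvaccinated counts. The sketch below uses invented counts, not figures from any real evaluation.

```python
# A sketch of a significance test for a difference in vaccination rates.
# The counts are invented for illustration only.
from scipy.stats import chi2_contingency

# Rows: treatment, control; columns: vaccinated, not vaccinated.
table = [[130, 70],    # hypothetical treatment group: 65% vaccinated
         [100, 100]]   # hypothetical control group: 50% vaccinated

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
# A small p-value suggests the gap in rates is unlikely to be chance alone.
```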

Survey analysis uses significance testing to determine whether observed differences across demographic groups (disaggregation) reflect real patterns or sampling variation. This validates whether outcome disparities by gender, location, or other characteristics are genuine.
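One hedged sketch of such a check, assuming a continuous outcome disaggregated across three regions (all scores below are simulated), is a one-way ANOVA across the groups.

```python
# A sketch of disaggregated analysis: do outcome scores differ by region
# beyond sampling variation? Regions and scores are simulated placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
region_a = rng.normal(62, 12, 80)
region_b = rng.normal(60, 12, 95)
region_c = rng.normal(66, 12, 70)

# One-way ANOVA: tests whether at least one group mean differs from the others.
f_stat, p_value = stats.f_oneway(region_a, region_b, region_c)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
# p < 0.05 would suggest real regional differences rather than sampling noise.
```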

Before-after comparisons test whether changes from baseline to endline are statistically significant, accounting for natural variation in the data.
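A minimal sketch of that test, assuming the same respondents are measured at baseline and endline (scores are simulated, with an assumed average gain of roughly three points), is a paired t-test:

```python
# A sketch of a baseline-to-endline comparison on the same respondents,
# using a paired t-test. All scores are simulated placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
baseline = rng.normal(40, 8, 60)
endline = baseline + rng.normal(3, 6, 60)   # assumed average gain of ~3 points

result = stats.ttest_rel(endline, baseline)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```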

Best practice requires reporting both p-values and effect sizes (e.g., Cohen's d, odds ratios) alongside confidence intervals. A result showing p=0.049 should not be treated as meaningfully different from one showing p=0.051; treating the arbitrary 0.05 threshold as a hard cut-off creates a false binary. Instead, interpret the full statistical picture: effect magnitude, precision (confidence intervals), and practical relevance to programme goals.
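A sketch of that fuller reporting, using simulated two-group data and a normal-approximation 95% confidence interval (an assumption made for simplicity), might look like this:

```python
# A sketch of reporting the fuller picture: mean difference, 95% confidence
# interval, Cohen's d, and the p-value. All data are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
treatment = rng.normal(54, 10, 100)
control = rng.normal(50, 10, 100)

diff = treatment.mean() - control.mean()
se = np.sqrt(treatment.var(ddof=1) / len(treatment) + control.var(ddof=1) / len(control))
ci_low, ci_high = diff - 1.96 * se, diff + 1.96 * se  # normal-approximation 95% CI

pooled_sd = np.sqrt((treatment.var(ddof=1) + control.var(ddof=1)) / 2)
d = diff / pooled_sd
p = stats.ttest_ind(treatment, control, equal_var=False).pvalue

print(f"difference = {diff:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f}), "
      f"d = {d:.2f}, p = {p:.3f}")
```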

Related Topics

  • Quasi-Experimental Design: designs that enable causal inference and significance testing
  • Impact Evaluation: rigorous methods where significance testing is standard
  • Effect Size: measures practical importance beyond statistical significance
  • Hypothesis Testing: the formal framework for significance testing
  • P-Values: the primary metric for statistical significance
  • Power Analysis: ensures adequate sample size to detect significant effects

At a Glance

Determines whether observed programme effects are real or likely due to random variation

Best For

  • Interpreting results from impact evaluations and experimental designs
  • Assessing whether observed differences between groups are meaningful
  • Validating that programme effects exceed what could occur by chance

Complexity

Medium

Timeframe

Calculated during data analysis phase
