
M&E Decision Guide

Common Sampling Mistakes in M&E

The eight sampling mistakes that undermine M&E data quality: wrong frames, ignored design effect, field substitution errors, and analysis overclaims.

8 mistakes covered · 3 phase categories · 13 pre-fieldwork checks
Key Takeaway
Most sampling errors are preventable at the design stage, not fixable in analysis
Frame errors, missing design effect adjustments, and no non-response buffer all happen before a single interview is conducted. Catch them with the pre-fieldwork checklist. Analysis mistakes and overclaiming generalizability are easier to fix in an analysis plan than in a published report.

The Eight Sampling Mistakes

Sampling mistakes fall into three phases: design decisions made before data collection, field execution during data collection, and analysis choices after data collection is complete. Most survey findings that fail external review trace back to one of these eight errors.

#  Mistake                                    Phase     Consequence
1  Wrong or incomplete sampling frame         Design    Systematic exclusion of subpopulations
2  No design effect adjustment                Design    Sample too small, confidence intervals too wide
3  No non-response buffer                     Design    Final sample below precision requirements
4  Enumerator substitution without protocol   Field     Convenience bias replaces random selection
5  Convenience sample labeled as random       Field     Indefensible findings, credibility loss
6  No documentation of selection logic        Field     Methodology cannot be verified or replicated
7  Cluster data analyzed as simple random     Analysis  Confidence intervals too narrow, overconfidence
8  Overclaiming generalizability              Analysis  Findings exceed what the sample supports

Mistakes 2 and 3 are preventable by running your sample size calculation correctly before fieldwork. The Sampling Calculator handles design effect and non-response buffer automatically: input your population size, required precision, and confidence level, then confirm whether the resulting n is achievable within your budget and timeline.
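If you want to sanity-check the calculator's output by hand, the underlying arithmetic is the standard Cochran calculation with a finite population correction, followed by the two adjustments from mistakes 2 and 3. A minimal Python sketch (the function name and default values here are illustrative, not the calculator's actual code):

    import math

    def required_sample_size(population, margin_of_error=0.05, confidence=0.95,
                             p=0.5, design_effect=1.0, response_rate=1.0):
        """Cochran sample size with finite population correction,
        design effect inflation, and a non-response buffer."""
        z = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}[confidence]
        n0 = z ** 2 * p * (1 - p) / margin_of_error ** 2  # SRS base size
        n = n0 / (1 + (n0 - 1) / population)              # finite population correction
        n *= design_effect                                # mistake 2: cluster inflation
        n /= response_rate                                # mistake 3: non-response buffer
        return math.ceil(n)

    # 8,000 households, +/-5 points at 95% confidence, DEFF 1.8, 85% expected response
    print(required_sample_size(8000, design_effect=1.8, response_rate=0.85))  # 777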

Mistakes 1, 4, 5, and 6 require protocol design and field supervision. Mistakes 7 and 8 require a data analysis plan written before data collection starts, not after.

Wrong or Incomplete Sampling Frame

The sampling frame is the list or boundary from which you draw your sample. If the frame is incomplete or inaccurate, your sample inherits those errors. No amount of statistical precision corrects a bad frame.

The three most common frame problems in M&E field surveys:

Outdated registration lists: A beneficiary roster from 18 months ago excludes people who joined the program since then and includes people who have left, moved away, or died. The older the list, the higher the exclusion rate. In programs with high population mobility (displaced communities, urban informal settlements, seasonal labor programs), a list more than 6 months old can be badly out of date.

Incomplete geographic coverage: A household registry covering the central program districts but not the remote ones produces a sample that systematically underrepresents the people least likely to report positive outcomes. This is the coverage bias problem, and it tends to inflate performance results.

Wrong unit of selection: If your target population is all households in a catchment area but your frame is a health facility attendance list, you are missing everyone who does not use the facility. The two populations can be very different.

Before you draw the sample, verify three things about your frame: when it was last updated, whether it covers the full target population or only a subset, and whether the unit of selection matches the unit of analysis. If a complete frame does not exist, cluster sampling is the practical alternative. See cluster vs. stratified sampling for when to switch methods.

Ignoring Design Effect and Non-Response

Two calculation errors routinely produce samples that are too small to detect the changes your program is designed to produce.

Design effect: If you use cluster sampling in the field and calculate your sample size as if you were using simple random sampling, you will end up underpowered. People within clusters are more similar to each other than randomly selected individuals would be, which means each within-cluster interview adds less information than a truly independent interview would. The design effect factor (typically 1.5-2.0 for household surveys in development contexts) adjusts for this. If SRS requires 300 completed interviews and your design effect is 1.8, you need 540 interviews, not 300. See cluster sampling for design effect formulas and typical values by context.
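If you have an intra-cluster correlation (ICC) estimate from a previous survey or the literature, the design effect for roughly equal-sized clusters follows the standard formula DEFF = 1 + (m - 1) * rho, where m is the number of interviews per cluster. A quick sketch with illustrative values (the ICC and cluster size below are assumptions, not recommendations):

    import math

    def design_effect(cluster_size, icc):
        """Standard DEFF for equal-sized clusters: 1 + (m - 1) * rho."""
        return 1 + (cluster_size - 1) * icc

    n_srs = 300                                      # SRS requirement from the calculator
    deff = design_effect(cluster_size=15, icc=0.06)  # illustrative values: DEFF = 1.84
    print(math.ceil(n_srs * deff))                   # 552 interviews under this design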

Non-response buffer: Survey non-response is predictable. Households are absent, the head of household refuses, or the interview is incomplete. If you calculate a required n of 400 and your non-response rate is 15%, you complete only 340 interviews; to finish with 400 completed interviews, you would need to attempt roughly 400 / 0.85 ≈ 471. Whether 340 still meets your precision requirements depends on the calculation, but you will not know until it is too late to do anything about it. Add a 10-20% buffer to your calculated sample size to account for attrition before you finalize your field plan.

Both errors are entirely preventable at the design stage. They are expensive to correct during fieldwork and often impossible to fix after data collection is complete.

Field Execution Failures

Three field-level mistakes account for most sampling integrity problems in M&E programs.

Enumerator substitution without a documented protocol: When a selected household is unavailable, enumerators need explicit written instructions on what to do next. Without a protocol, they default to the nearest convenient substitute, which is not random substitution; it is convenience sampling attached to a random label. Establish a substitution rule before fieldwork: for example, attempt a callback visit before substituting, and if substituting, select the next unit on the systematic list and document the reason. Whatever the rule is, it must be documented and trained before teams go to the field.

Convenience sampling labeled as random: Enumerators under time pressure or with difficult access routes survey households near the road, households where people are visibly present, or households where the respondent is immediately willing. The resulting sample is biased in ways that are impossible to quantify after the fact. GPS tracking of interview locations and spot-check supervision during fieldwork are the primary controls.

No documentation of selection process: The selection logic must be written down: the sampling frame used, the skip interval or random numbers applied, who conducted the selection, how refusals and absences were handled, and how substitutions were made. Without this record, no external reviewer can verify that the sample was selected as described. This documentation takes less than an hour to produce and protects months of data collection investment. It is also the difference between a methodology that can be replicated at endline and one that cannot.

Analysis and Reporting Errors

Two analysis-stage mistakes appear repeatedly in development sector evaluations.

Analyzing cluster sample data as simple random sample data: This is the most common statistical error in program evaluations. When you use cluster sampling in the field but run standard statistical tests assuming simple random sampling, you underestimate the standard errors on your estimates. Your confidence intervals are too narrow, meaning you are expressing more confidence in your results than the data structure supports. The correction requires specifying the cluster design in your statistical analysis: survey-weighted estimation is available in R (the survey package), Stata (the svy commands), Python, and SPSS (Complex Samples). This takes only a line or two of code if the analysis plan was written correctly before fieldwork. See how to choose sample size for design specifications that feed correctly into analysis.
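To see how much the correction matters in practice, the sketch below simulates clustered survey data and compares the naive standard error with a cluster-robust one using statsmodels; every parameter in the simulation is invented for illustration:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(42)
    # 20 clusters of 25 households, with a shared cluster-level shock
    clusters = np.repeat(np.arange(20), 25)
    cluster_effect = rng.normal(0, 0.15, 20)[clusters]
    outcome = (rng.random(500) < 0.6 + cluster_effect).astype(float)
    df = pd.DataFrame({"outcome": outcome, "cluster": clusters})

    # Intercept-only regression: the coefficient is the sample proportion
    naive = smf.ols("outcome ~ 1", df).fit()  # pretends the sample is SRS
    robust = smf.ols("outcome ~ 1", df).fit(
        cov_type="cluster", cov_kwds={"groups": df["cluster"]})

    print(f"naive SE:   {naive.bse.iloc[0]:.4f}")   # too small
    print(f"cluster SE: {robust.bse.iloc[0]:.4f}")  # wider, honest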

Overclaiming generalizability: Your sample represents the population it was drawn from, bounded by the geographic scope, time period, and sampling frame used. A survey of beneficiaries in 3 of 12 program districts does not produce findings that generalize to all 12 districts unless you can demonstrate the 3 are representative of the 12, which requires evidence beyond the survey itself. Scope your conclusions explicitly: "among beneficiaries in the three study districts during the monitoring period" rather than "among all program beneficiaries." Reviewers and external evaluators will push back on overclaiming, and the correction after report publication is painful.

Sector Examples

Health: Frame currency failure in East Africa

A district vaccination program ran a coverage survey using a household registry last updated 22 months earlier. Field teams found that 18% of selected households had relocated or dissolved since the registry was compiled, and substitutes were selected by proximity rather than by a documented protocol. The resulting coverage estimate of 71% could not be defended in the external evaluation: the combination of frame error and undocumented substitution introduced unmeasurable bias. The external evaluator recommended repeating the survey with a current frame and a written substitution protocol, at roughly three times the cost of the original survey.

WASH: Design effect miscalculation in West Africa

A WASH program designed a household survey with a calculated n of 280 interviews based on simple random sampling assumptions, sized to deliver roughly plus or minus 6 percentage points of precision. Data collection actually used cluster sampling across 14 villages, where the design effect in similar contexts is around 1.8. The effective sample size was therefore about 156 interviews (280 / 1.8), not 280, and the achieved margin of error was roughly plus or minus 8 percentage points at 95% confidence, short of the 6-point precision the program needed to detect its target improvement. The survey was underpowered and could not confirm whether the program's water point coverage target was met. A pre-fieldwork design effect calculation would have caught this before a single enumerator was hired.

Education: Purposive sample overclaim in South Asia

A school improvement program studied 8 purposively selected high-performing schools and 8 low-performing schools, then reported that "nationally, students in well-resourced schools have 40% higher textbook access." The purposive sample supports a within-study comparison between the 16 selected schools; it cannot support a national generalization. The finding was sound; the reported scope was not. A single sentence change ("among the 16 schools studied") would have preserved the finding and avoided the reviewer pushback.

Livelihoods: Analysis error in Southern Africa

A livelihoods program ran a beneficiary satisfaction survey using cluster sampling across 12 community groups. The analysis team used a standard chi-square test without specifying the clustered design. The resulting p-values were systematically too small, leading the team to report statistically significant differences between subgroups that were not significant once the cluster structure was accounted for. The re-analysis, conducted 3 weeks after the report was shared with the donor, reversed two of the five headline findings.

Common Mistakes

Mistake 1: Using an outdated or incomplete sampling frame. The frame is the list you sample from. If it is 18 months old or covers only a subset of your target population (facility attenders, registered members), your sample inherits those exclusions. The people left out are often the hardest to reach and the least likely to report positive outcomes. Update and verify the frame before drawing your sample.

Mistake 2: Ignoring design effect in cluster sampling. If you calculated your sample size for simple random sampling but are executing cluster sampling in the field, your confidence intervals are too wide and your sample is too small. Apply the design effect multiplier (typically 1.5-2.0) to your required n before fieldwork. Skipping this step is the most common reason household surveys cannot confirm whether program targets were met.

Mistake 3: No substitution protocol for absent households. When a selected household is unavailable, enumerators need written rules for what to do. Without a protocol, they substitute by convenience. This is not random; it is selection bias with a random label. Write the substitution rule before fieldwork begins, train enumerators on it explicitly, and document every substitution made during data collection.

Mistake 4: Analyzing cluster data as simple random sample data. Using standard statistical tests on cluster sample data produces confidence intervals that are too narrow. You are asserting more precision than the data structure supports. Specify the survey design in your analysis package before running any tests. This takes one line of code and prevents findings from being challenged on methodological grounds.

Mistake 5: Generalizing beyond the sample's scope. Your sample represents the population it was drawn from: the districts surveyed, the time period covered, the beneficiaries included in the frame. Reporting findings as applying to all program beneficiaries when you surveyed 3 of 12 districts overclaims what the data can support. Scope your conclusions explicitly in the report.

Pre-Fieldwork Sampling Checklist

Run through this before committing to your data collection budget and field schedule. Each item that you cannot check off represents a risk to your data quality.

Sampling frame:

  • Frame covers the full target population, not a subset (facility attenders, registered members, etc.)
  • Frame was updated within the past 12 months (or verified as current)
  • Unit of selection in the frame matches the unit of analysis in your indicator

Sample size:

  • Design effect applied if using cluster sampling (typical value 1.5-2.0; higher in heterogeneous populations)
  • Non-response buffer included (10-20% depending on context and population accessibility)
  • Subgroup sample sizes checked if subgroup comparisons are a reporting requirement

Field protocol:

  • Substitution protocol documented in writing (when to substitute, how to select the replacement)
  • Enumerators trained on sampling rules, not just questionnaire administration
  • GPS or location verification planned to audit coverage and detect convenience bias

Analysis readiness:

  • Analysis plan specifies survey-weighted estimation if cluster sampling is used
  • Margin of error and confidence level agreed with program management and stakeholders before fieldwork
  • Scope of generalizability defined in writing (which population, which geography, which time period)
  • External reviewer or M&E advisor has reviewed the sampling plan before fieldwork starts

For sample size calculation with design effect and non-response adjustment, run the Sampling Calculator. For the broader sampling method decision, start with probability vs. non-probability sampling. For how sampling fits into baseline design planning, see baseline design.
