
M&E Decision Guide

Common Sampling Mistakes in M&E

The eight sampling mistakes that undermine M&E data quality: wrong frames, ignored design effect, field substitution errors, and analysis overclaims.

8 mistakes covered · 3 phase categories · 13 pre-fieldwork checks
Key Takeaway
Most sampling errors are preventable at the design stage, not fixable in analysis
Frame errors, missing design effect adjustments, and no non-response buffer all happen before a single interview is conducted. Catch them with the pre-fieldwork checklist. Analysis mistakes and overclaiming generalizability are easier to fix in an analysis plan than in a published report.

The Eight Sampling Mistakes

Sampling mistakes fall into three phases: design decisions made before data collection, field execution during data collection, and analysis choices after data collection is complete. Most survey findings that fail external review trace back to one of these eight errors.

#  Mistake                                    Phase     Consequence
1  Wrong or incomplete sampling frame         Design    Systematic exclusion of subpopulations
2  No design effect adjustment                Design    Sample too small, confidence intervals too wide
3  No non-response buffer                     Design    Final sample below precision requirements
4  Enumerator substitution without protocol   Field     Convenience bias replaces random selection
5  Convenience sample labeled as random       Field     Indefensible findings, credibility loss
6  No documentation of selection logic        Field     Methodology cannot be verified or replicated
7  Cluster data analyzed as simple random     Analysis  Confidence intervals too narrow, overconfidence
8  Overclaiming generalizability              Analysis  Findings exceed what the sample supports

Mistakes 2 and 3 are preventable by running your sample size calculation correctly before fieldwork. The Sampling Calculator handles design effect and non-response buffer automatically: input your population size, required precision, and confidence level, then confirm whether the resulting n is achievable within your budget and timeline.
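If you want to sanity-check the calculator's output by hand, the underlying arithmetic is the standard Cochran calculation with a finite population correction, followed by the two adjustments from mistakes 2 and 3. A minimal Python sketch (the function name and default values here are illustrative, not the calculator's actual code):

    import math

    def required_sample_size(population, margin_of_error=0.05, confidence=0.95,
                             p=0.5, design_effect=1.0, response_rate=1.0):
        """Cochran sample size with finite population correction,
        design effect inflation, and a non-response buffer."""
        z = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}[confidence]
        n0 = z ** 2 * p * (1 - p) / margin_of_error ** 2  # SRS base size
        n = n0 / (1 + (n0 - 1) / population)              # finite population correction
        n *= design_effect                                # mistake 2: cluster inflation
        n /= response_rate                                # mistake 3: non-response buffer
        return math.ceil(n)

    # 8,000 households, +/-5 points at 95% confidence, DEFF 1.8, 85% expected response
    print(required_sample_size(8000, design_effect=1.8, response_rate=0.85))  # 777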

Mistakes 1, 4, 5, and 6 require protocol design and field supervision. Mistakes 7 and 8 require a data analysis plan written before data collection starts, not after.

Wrong or Incomplete Sampling Frame

The sampling frame is the list or boundary from which you draw your sample. If the frame is incomplete or inaccurate, your sample inherits those errors. No amount of statistical precision corrects a bad frame.

The three most common frame problems in M&E field surveys:

Outdated registration lists: A beneficiary roster from 18 months ago excludes people who joined the program since then and includes people who have left, moved away, or died. The older the list, the higher the exclusion rate. In programs with high population mobility (displaced communities, urban informal settlements, seasonal labor programs), a list more than 6 months old can be badly out of date.

Incomplete geographic coverage: A household registry covering the central program districts but not the remote ones produces a sample that systematically underrepresents the people least likely to report positive outcomes. This is the coverage bias problem, and it tends to inflate performance results.

Wrong unit of selection: If your target population is all households in a catchment area but your frame is a health facility attendance list, you are missing everyone who does not use the facility. The two populations can be very different.

Before you draw the sample, verify three things about your frame: when it was last updated, whether it covers the full target population or only a subset, and whether the unit of selection matches the unit of analysis. If a complete frame does not exist, cluster sampling is the practical alternative. See cluster vs. stratified sampling for when to switch methods.

Ignoring Design Effect and Non-Response

Two calculation errors routinely produce samples that are too small to detect the changes your program is designed to produce.

Design effect: If you use cluster sampling in the field and calculate your sample size as if you were using simple random sampling, you will end up underpowered. People within clusters are more similar to each other than randomly selected individuals would be, which means each within-cluster interview adds less information than a truly independent interview would. The design effect factor (typically 1.5-2.0 for household surveys in development contexts) adjusts for this. If SRS requires 300 completed interviews and your design effect is 1.8, you need 540 interviews, not 300. See cluster sampling for design effect formulas and typical values by context.
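If you have an intra-cluster correlation (ICC) estimate from a previous survey or the literature, the design effect for roughly equal-sized clusters follows the standard formula DEFF = 1 + (m - 1) * rho, where m is the number of interviews per cluster. A quick sketch with illustrative values (the ICC and cluster size below are assumptions, not recommendations):

    import math

    def design_effect(cluster_size, icc):
        """Standard DEFF for equal-sized clusters: 1 + (m - 1) * rho."""
        return 1 + (cluster_size - 1) * icc

    n_srs = 300                                      # SRS requirement from the calculator
    deff = design_effect(cluster_size=15, icc=0.06)  # illustrative values: DEFF = 1.84
    print(math.ceil(n_srs * deff))                   # 552 interviews under this design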

Non-response buffer: Survey non-response is predictable. Households are absent, the head of household refuses, or the interview is incomplete. If you calculate a required n of 400 and your non-response rate is 15%, you complete only 340 interviews; to finish with 400 completed interviews, you would need to attempt roughly 400 / 0.85 ≈ 471. Whether 340 still meets your precision requirements depends on the calculation, but you will not know until it is too late to do anything about it. Add a 10-20% buffer to your calculated sample size to account for attrition before you finalize your field plan.

Both errors are entirely preventable at the design stage. They are expensive to correct during fieldwork and often impossible to fix after data collection is complete.

Field Execution Failures

Three field-level mistakes account for most sampling integrity problems in M&E programs.

Enumerator substitution without a documented protocol: When a selected household is unavailable, enumerators need explicit written instructions on what to do next. Without a protocol, they default to the nearest convenient substitute, which is not random substitution; it is convenience sampling attached to a random label. Establish a substitution rule before fieldwork: for example, attempt a callback visit before substituting, and if substituting, select the next unit on the systematic list and document the reason. Whatever the rule is, it must be documented and trained before teams go to the field.

Convenience sampling labeled as random: Enumerators under time pressure or with difficult access routes survey households near the road, households where people are visibly present, or households where the respondent is immediately willing. The resulting sample is biased in ways that are impossible to quantify after the fact. GPS tracking of interview locations and spot-check supervision during fieldwork are the primary controls.

No documentation of selection process: The selection logic must be written down: the sampling frame used, the skip interval or random numbers applied, who conducted the selection, how refusals and absences were handled, and how substitutions were made. Without this record, no external reviewer can verify that the sample was selected as described. This documentation takes less than an hour to produce and protects months of data collection investment. It is also the difference between a methodology that can be replicated at endline and one that cannot.

Analysis and Reporting Errors

Two analysis-stage mistakes appear repeatedly in development sector evaluations.

Analyzing cluster sample data as simple random sample data: This is the most common statistical error in program evaluations. When you use cluster sampling in the field but run standard statistical tests assuming simple random sampling, you underestimate the standard errors on your estimates. Your confidence intervals are too narrow, meaning you are expressing more confidence in your results than the data structure supports. The correction requires specifying the cluster design in your statistical analysis: survey-weighted estimation is available in R (the survey package), Stata (the svy commands), Python, and SPSS (Complex Samples). This takes only a line or two of code if the analysis plan was written correctly before fieldwork. See how to choose sample size for design specifications that feed correctly into analysis.
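To see how much the correction matters in practice, the sketch below simulates clustered survey data and compares the naive standard error with a cluster-robust one using statsmodels; every parameter in the simulation is invented for illustration:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(42)
    # 20 clusters of 25 households, with a shared cluster-level shock
    clusters = np.repeat(np.arange(20), 25)
    cluster_effect = rng.normal(0, 0.15, 20)[clusters]
    outcome = (rng.random(500) < 0.6 + cluster_effect).astype(float)
    df = pd.DataFrame({"outcome": outcome, "cluster": clusters})

    # Intercept-only regression: the coefficient is the sample proportion
    naive = smf.ols("outcome ~ 1", df).fit()  # pretends the sample is SRS
    robust = smf.ols("outcome ~ 1", df).fit(
        cov_type="cluster", cov_kwds={"groups": df["cluster"]})

    print(f"naive SE:   {naive.bse.iloc[0]:.4f}")   # too small
    print(f"cluster SE: {robust.bse.iloc[0]:.4f}")  # wider, honest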

Overclaiming generalizability: Your sample represents the population it was drawn from, bounded by the geographic scope, time period, and sampling frame used. A survey of beneficiaries in 3 of 12 program districts does not produce findings that generalize to all 12 districts unless you can demonstrate the 3 are representative of the 12, which requires evidence beyond the survey itself. Scope your conclusions explicitly: "among beneficiaries in the three study districts during the monitoring period" rather than "among all program beneficiaries." Reviewers and external evaluators will push back on overclaiming, and the correction after report publication is painful.

Sector Examples

Health: Frame currency failure in East Africa

A district vaccination program ran a coverage survey using a household registry last updated 22 months earlier. Field teams found that 18% of selected households had relocated or dissolved since the registry was compiled, and substitutes were selected by proximity rather than by a documented protocol. The resulting coverage estimate of 71% could not be defended in the external evaluation: the combination of frame error and undocumented substitution introduced unmeasurable bias. The external evaluator recommended repeating the survey with a current frame and a written substitution protocol, at roughly three times the cost of the original survey.

WASH: Design effect miscalculation in West Africa

A WASH program designed a household survey with a calculated n of 280 interviews based on simple random sampling assumptions, sized to deliver roughly plus or minus 6 percentage points of precision. Data collection actually used cluster sampling across 14 villages, where the design effect in similar contexts is around 1.8. The effective sample size was therefore about 156 interviews (280 / 1.8), not 280, and the achieved margin of error was roughly plus or minus 8 percentage points at 95% confidence, short of the 6-point precision the program needed to detect its target improvement. The survey was underpowered and could not confirm whether the program's water point coverage target was met. A pre-fieldwork design effect calculation would have caught this before a single enumerator was hired.

Education: Purposive sample overclaim in South Asia

A school improvement program studied 8 purposively selected high-performing schools and 8 low-performing schools, then reported that "nationally, students in well-resourced schools have 40% higher textbook access." The purposive sample supports a within-study comparison between the 16 selected schools; it cannot support a national generalization. The finding was sound; the reported scope was not. A single sentence change ("among the 16 schools studied") would have preserved the finding and avoided the reviewer pushback.

Livelihoods: Analysis error in Southern Africa

A livelihoods program ran a beneficiary satisfaction survey using cluster sampling across 12 community groups. The analysis team used a standard chi-square test without specifying the clustered design. The resulting p-values were systematically too small, leading the team to report statistically significant differences between subgroups that were not significant once the cluster structure was accounted for. The re-analysis, conducted 3 weeks after the report was shared with the donor, reversed two of the five headline findings.

Common Mistakes

Mistake 1: Using an outdated or incomplete sampling frame. The frame is the list you sample from. If it is 18 months old or covers only a subset of your target population (facility attenders, registered members), your sample inherits those exclusions. The people left out are often the hardest to reach and the least likely to report positive outcomes. Update and verify the frame before drawing your sample.

Mistake 2: Ignoring design effect in cluster sampling. If you calculated your sample size for simple random sampling but are executing cluster sampling in the field, your confidence intervals are too wide and your sample is too small. Apply the design effect multiplier (typically 1.5-2.0) to your required n before fieldwork. Skipping this step is the most common reason household surveys cannot confirm whether program targets were met.

Mistake 3: No substitution protocol for absent households. When a selected household is unavailable, enumerators need written rules for what to do. Without a protocol, they substitute by convenience. This is not random; it is selection bias with a random label. Write the substitution rule before fieldwork begins, train enumerators on it explicitly, and document every substitution made during data collection.

Mistake 4: Analyzing cluster data as simple random sample data. Using standard statistical tests on cluster sample data produces confidence intervals that are too narrow. You are asserting more precision than the data structure supports. Specify the survey design in your analysis package before running any tests. This takes one line of code and prevents findings from being challenged on methodological grounds.

Mistake 5: Generalizing beyond the sample's scope. Your sample represents the population it was drawn from: the districts surveyed, the time period covered, the beneficiaries included in the frame. Reporting findings as applying to all program beneficiaries when you surveyed 3 of 12 districts overclaims what the data can support. Scope your conclusions explicitly in the report.

Pre-Fieldwork Sampling Checklist

Run through this before committing to your data collection budget and field schedule. Each item that you cannot check off represents a risk to your data quality.

Sampling frame:

  • Frame covers the full target population, not a subset (facility attenders, registered members, etc.)
  • Frame was updated within the past 12 months (or verified as current)
  • Unit of selection in the frame matches the unit of analysis in your indicator

Sample size:

  • Design effect applied if using cluster sampling (typical value 1.5-2.0; higher in heterogeneous populations)
  • Non-response buffer included (10-20% depending on context and population accessibility)
  • Subgroup sample sizes checked if subgroup comparisons are a reporting requirement

Field protocol:

  • Substitution protocol documented in writing (when to substitute, how to select the replacement)
  • Enumerators trained on sampling rules, not just questionnaire administration
  • GPS or location verification planned to audit coverage and detect convenience bias

Analysis readiness:

  • Analysis plan specifies survey-weighted estimation if cluster sampling is used
  • Margin of error and confidence level agreed with program management and stakeholders before fieldwork
  • Scope of generalizability defined in writing (which population, which geography, which time period)
  • External reviewer or M&E advisor has reviewed the sampling plan before fieldwork starts

For sample size calculation with design effect and non-response adjustment, run the Sampling Calculator. For the broader sampling method decision, start with probability vs. non-probability sampling. For how sampling fits into baseline design planning, see baseline design.
