What Is Design Effect?
Design effect (DEFF) is a number that tells you how much statistical precision you lose when you use a complex sampling design instead of simple random sampling. It is the correction factor that keeps your sample size honest when your field reality is not a perfect random draw from the target population.
The practical meaning: if your DEFF is 1.8, each cluster-sampled interview carries about 56% of the information a simple random interview would (1/1.8). To reach the same precision as a simple random sample of 300, you need 540 cluster interviews. Skip the correction and you have bought a survey that cannot answer the question you designed it to answer.
Design effect is not a penalty for cluster sampling. Cluster sampling is often the only practical choice: no usable household list, geographic dispersion too costly to traverse, population boundaries that match administrative clusters. The correction simply keeps the sample size calculation matched to the field design.
When Does It Apply?
Design effect applies whenever your sampling design groups respondents in ways that make within-group observations more similar than between-group observations. In M&E practice, that is most commonly cluster sampling, where you select villages or schools first and then households or students within them.
| Sampling design | DEFF applies? | Typical range |
|---|---|---|
| Simple random sampling | No | 1.0 |
| Stratified random sampling | No, may be slightly less than 1.0 | 0.9-1.0 |
| Cluster sampling (single stage) | Yes | 1.3-2.5 |
| Cluster sampling (multi-stage) | Yes, compounds across stages | 1.5-3.0 |
| Systematic sampling in homogeneous frames | Usually no | ~1.0 |
Stratified sampling, where you divide the population into subgroups before sampling, can actually produce a DEFF slightly below 1.0 because the subgroup structure reduces variance. Most M&E programs do not take advantage of this; they use cluster sampling to manage logistics and then fail to apply the corresponding DEFF adjustment. The result is a systematic tendency to undersample.
For definitions of the sampling approaches themselves, see cluster sampling, random sampling, and sampling methods.
The Formula and What It Means
The standard design effect formula is:
DEFF = 1 + (m - 1) x ICC
Three inputs:
- m is the average number of completed interviews per cluster (the cluster size)
- ICC is the intraclass correlation coefficient, a number between 0 and 1 that measures how similar observations within a cluster are relative to observations across clusters
- The 1 is the baseline DEFF for simple random sampling
An example. You plan to survey 20 households in each of 30 villages. Your ICC for household food security outcomes in similar contexts is 0.05. DEFF equals 1 + (20 - 1) x 0.05 = 1 + 0.95 = 1.95. Your required sample size under simple random sampling is 350. Multiplied by DEFF, your actual required sample is 683 interviews (682.5, rounded up), distributed across the 30 villages at roughly 23 per village, not 20.
What the formula tells you: bigger clusters mean more within-cluster similarity and a larger DEFF. Smaller clusters mean less similarity and a smaller DEFF. If ICC were zero (observations within clusters no more similar than across clusters), DEFF would be exactly 1.0 no matter the cluster size.
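The formula is simple enough to sketch directly. A minimal Python helper, reproducing the worked example above (function names are illustrative):

```python
import math

def design_effect(m, icc):
    """DEFF = 1 + (m - 1) * ICC, for average cluster size m."""
    return 1 + (m - 1) * icc

# Worked example from the text: 20 households per cluster, ICC = 0.05
deff = design_effect(20, 0.05)      # 1.95
required = math.ceil(350 * deff)    # 682.5 rounds up to 683
```

Note the `ceil`: required sample sizes are always rounded up, never to the nearest whole number, so the precision target is met rather than narrowly missed.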
Most M&E practitioners do not calculate ICC from scratch. They use empirical DEFF values from prior surveys in similar contexts. DHS surveys, MICS, and donor-commissioned evaluations in the same sector and geography often publish their DEFF values; these are a reasonable starting point.
Typical DEFF Values in M&E
| Survey type | Typical DEFF | Typical ICC |
|---|---|---|
| Household food security outcomes | 1.5-2.0 | 0.03-0.08 |
| WASH behaviors and access | 1.5-2.5 | 0.05-0.12 |
| Immunization and health service coverage | 2.0-3.0 | 0.10-0.15 |
| Education outcomes in cluster (school) surveys | 2.0-3.5 | 0.10-0.20 |
| Livelihoods and income | 1.3-1.8 | 0.03-0.06 |
| Gender-based violence (highly context-specific) | 2.0-4.0 | 0.10-0.25 |
The pattern: outcomes that are strongly shaped by the cluster itself (a school's teaching quality, a health facility's protocols, a village's water source) have higher ICCs and therefore higher DEFFs than outcomes that vary more by individual circumstance. When in doubt and without empirical values for your context, a DEFF of 2.0 is a defensible planning assumption for household cluster surveys. Under-adjusting is a larger risk than over-adjusting, because under-adjustment cannot be fixed after fieldwork.
Four Steps to Apply Design Effect
Design effect belongs in the sample size calculation before the field schedule is finalized. Four steps.
Step 1: Calculate the simple random sample size. Start with the standard sample size formula for your indicator type: a proportion (e.g., 40% of households practicing safe water storage), a mean (e.g., average income per month), or a change detection target (e.g., detect a 10 percentage point change in indicator X). Your margin of error, confidence level, and expected variability feed this calculation. This is your SRS baseline.
Step 2: Select a DEFF value. Choose from empirical prior-survey values in your sector and geography if available. If not, use 1.5-2.0 for household surveys, 2.0-2.5 for school or health facility surveys, 2.5-3.0 for highly heterogeneous or service-delivery-dependent outcomes. Document your choice and the reasoning; reviewers will ask.
Step 3: Multiply. Required cluster sample size = SRS sample size x DEFF. If SRS is 400 and DEFF is 1.8, cluster sample size is 720.
Step 4: Apply the non-response buffer on top of the DEFF-adjusted sample. If your expected non-response rate is 15%, divide the DEFF-adjusted n by 0.85. 720 / 0.85 = 847.1, rounded up to 848 selected households. This is your final field target. See common sampling mistakes for why skipping the non-response buffer compounds the design effect problem.
The Sampling Calculator performs all four steps automatically: input your population size, required precision, confidence level, DEFF, and non-response rate, and it returns your final field target along with a cluster allocation plan.
Where the Numbers Come From
The inputs to DEFF do not require you to run a complicated ICC calculation on every new survey. Most M&E programs reuse empirical values from prior surveys and standard references.
Prior surveys in the same context: The strongest evidence. If your program area has been surveyed previously (DHS, MICS, SMART, KAP, baseline studies), the DEFFs from those surveys are directly applicable. The methodology section of the report will usually cite them.
Published reference tables: DHS Sampling and Household Listing Manual, MICS survey design guidance, and WHO EPI cluster sampling manuals publish typical ICC and DEFF values by outcome domain. These are general-purpose defaults, less precise than a local prior-survey value but more robust than guessing.
Post-hoc calculation from pilot data: If you run a pilot of 3-5 clusters before scaling to the full survey, you can calculate an empirical ICC from the pilot and feed it back into the final sample size calculation. This is the most context-specific approach but adds 2-4 weeks to the timeline.
Default assumption when no data is available: DEFF = 2.0 for household cluster surveys, DEFF = 2.5 for school or facility cluster surveys, DEFF = 3.0 for highly heterogeneous service-delivery outcomes. These are conservative defaults that err toward oversampling rather than undersampling.
Document whichever source you use. "DEFF of 1.8 based on the 2022 DHS in [country] for comparable coverage indicators" is a defensible audit trail. "DEFF = 1.5 assumed" is not.
Sector Examples
Health: Vaccination coverage survey in East Africa
A district health team designed a vaccination coverage survey using a standard 30-by-7 cluster design (30 clusters, 7 interviews each). The team used simple random sampling formulas to calculate n = 210, then treated that number as the cluster sample size. Post-analysis ICC was 0.13, producing a DEFF of 1.78. The effective sample size was 118 interviews, not 210. Confidence intervals on the coverage estimate were plus or minus 9 percentage points. The program needed 5-point precision to detect whether a new outreach strategy was working. The survey could not confirm or refute the strategy's effect, and the program ran a second survey three months later with a properly sized cluster allocation at 1.6x the original cost.
WASH: Household water storage survey in West Africa
A WASH program designed a household survey to measure safe water storage practices across 40 villages, with a planned 15 interviews per village. SRS n was 380. The team applied a DEFF of 2.0 based on a published MICS survey in the same region, producing a target sample of 760. After the 15% non-response buffer, the field target was 894 households, or 22 per village. The final completed sample was 746. Confidence intervals on the safe storage estimate were plus or minus 3.8 points at 95% confidence, sufficient to detect the 7-point improvement the program had targeted. The program confirmed its outcome was met with defensible precision.
Education: Learning outcome survey in South Asia
A school-based learning assessment drew from 25 schools in a program's catchment, with 40 students per school. The education team used an ICC of 0.18 based on a prior cluster-randomized trial in a similar geography. DEFF was 1 + (39 x 0.18) = 8.02, a very high value reflecting the strong clustering of learning outcomes by school. The team's initial SRS calculation of n = 400 translated to a required cluster sample of roughly 3,200 students. The team recognized this was infeasible within the program budget and redesigned the study to 60 schools with 25 students each (n = 1,500); at 25 students per school, DEFF = 1 + (24 x 0.18) = 5.32, putting the effective sample at roughly 282 SRS-equivalent interviews. The redesign gave the program a workable study at a cost 2.5x the original budget but one that could actually answer the learning question.
Food security: Consumption survey in the Sahel
A food security program ran a quarterly consumption survey across 20 pastoralist communities, with 12 households per community. The ICC for food consumption scores in pastoralist contexts in prior surveys was 0.04, producing a DEFF of 1 + (11 x 0.04) = 1.44. The team applied the DEFF to its SRS baseline of 280 to reach a required sample of 403, then added a 20% non-response buffer (communities dispersed during dry season migration) to reach a field target of 504 households. The survey was completed at 472 households, meeting the precision requirement of plus or minus 4 points on the food consumption score.
Common Mistakes
Mistake 1: Using an SRS sample size for cluster fieldwork. Calculating the sample as if observations are independent when they are grouped in clusters is the most common DEFF error in M&E practice. The field team selects 30 villages with 15 households each, then reports on 450 interviews as if each one carried full SRS information. Confidence intervals come out too narrow, results appear more precise than they are, and reviewer pushback eventually forces a re-analysis. Apply the DEFF multiplier before the sample size is committed.
Mistake 2: Applying DEFF at the analysis stage instead of the design stage. Some teams recognize the clustering issue only in analysis and apply survey-weighted estimation. This produces correct confidence intervals but cannot add the interviews that were never conducted. The confidence intervals are wider, the precision is lower, and the program cannot confirm whether its target was met. Analysis-stage correction is better than no correction, but it cannot rescue a fundamentally underpowered study.
Mistake 3: Choosing a DEFF value without evidence. A DEFF of 1.2 sounds defensible but, without a prior-survey reference or published source, it may well be wrong. Most M&E household cluster surveys produce empirical DEFFs of 1.5-2.0. Under-specifying the DEFF produces an undersized sample; over-specifying wastes resources but is safer. When in doubt, use 2.0 and document the reasoning.
Mistake 4: Forgetting that DEFF compounds in multi-stage designs. A two-stage cluster sample (villages, then households within villages) has a compounded design effect, not the single-stage DEFF. The compounding depends on the ICCs at each stage, but a useful rule of thumb is that multi-stage DEFFs are 20-40% higher than comparable single-stage DEFFs. Factor this in when the survey design uses nested clusters (districts, then villages, then households).
Mistake 5: Not documenting the DEFF source. The DEFF value should appear in the survey methodology with a citation: the prior survey it came from, the published reference, or the assumption basis. External reviewers, future replications, and endline comparisons all need this trail. A survey that does not document its DEFF source is harder to defend and harder to compare against.
Design Effect Checklist
Run through this before committing to your cluster survey field plan.
Design stage:
- DEFF value selected with documented source (prior survey, published reference, or conservative default)
- DEFF multiplier applied to SRS sample size in the planning calculation
- Multi-stage clustering factored in if your design uses nested clusters
- Cluster allocation plan produced (how many clusters, how many interviews per cluster)
Fieldwork stage:
- Cluster assignments documented before fieldwork starts
- Enumerator training covers the cluster design, not just the questionnaire
- Actual cluster size per village tracked in field logs (m may drift from planned m)
Analysis stage:
- Survey-weighted estimation specified in the analysis plan before data collection
- Primary sampling unit and strata variables included in the dataset
- Confidence intervals reported using design-corrected standard errors, not naive SRS standard errors
For sample size calculation with design effect and non-response buffer, run the Sampling Calculator. For related sampling design decisions, see cluster vs. stratified sampling and probability vs. non-probability sampling. For the full pre-fieldwork sampling workflow, see common sampling mistakes.