M&E Studio
© 2026 Logic Lab LLC. All rights reserved.


How to Choose Sample Size for M&E

A practical guide to sample size for program evaluations, with rules of thumb, worked examples, and budget-statistics tradeoffs.

How Many People Do You Need to Survey?

"How many people do I need to survey?" is the most common technical question in M&E. It is also the question most often answered with a guess. Too few respondents and you cannot detect real change. Too many and you waste budget that could fund better training, longer fieldwork, or deeper qualitative work. Getting sample size right is not about precision for its own sake. It is about spending data collection money where it actually produces usable evidence.

The answer depends on five factors. Understanding them saves you from underpowered studies and bloated budgets alike.

Quick Reference: Rules of Thumb

  • 385 for a basic proportion estimate (95% confidence, 5% margin of error, large population).
  • Double it for cluster sampling.
  • Add 15-20% for non-response.
  • Multiply by the number of subgroups you need to compare (e.g., 2 genders x 3 regions = 6 subgroups).
  • Use the Sampling Calculator to get an exact number for your situation.

The Standard Formula

Before diving into the five factors, here is the basic formula for sample size when estimating a proportion:

n = Z² x p(1-p) / e²

Where:

  • Z = the Z-score for your confidence level (1.96 for 95%, 1.645 for 90%)
  • p = the expected proportion (use 0.5 if unknown, which gives the maximum sample)
  • e = the margin of error you can accept (0.05 for 5%)

Quick calculation: With 95% confidence (Z = 1.96), 50% proportion, and 5% margin of error:

n = 1.96² x 0.5 x 0.5 / 0.05² = 3.8416 x 0.25 / 0.0025 = 384.16, which rounds up to 385

That is where the "385" rule of thumb comes from. It assumes a large population, simple random sampling, and no clustering. In practice, you almost always need more. Or just use the Sampling Calculator.
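The arithmetic is easy to script. A minimal Python sketch of the formula above (the function name is ours, not from any standard library):

```python
import math

def sample_size_proportion(z=1.96, p=0.5, e=0.05):
    """Cochran's formula for estimating a proportion in a large population.

    z: Z-score for the confidence level, p: expected proportion,
    e: acceptable margin of error. Rounds up to a whole respondent.
    """
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)

print(sample_size_proportion())         # 95% confidence, 5% margin -> 385
print(sample_size_proportion(z=1.645))  # 90% confidence -> 271
```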

Finite Population Correction

If your total population is small (under 5,000), you need fewer respondents than the formula suggests. Apply the finite population correction:

n_adjusted = n / (1 + (n - 1) / N)

Where N is the total population size.

Example: Your program serves 800 households. The formula says 385. With the correction: 385 / (1 + 384/800) = 385 / 1.48 = 260 households. That is a meaningful reduction. Do not skip this step for small populations.
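The correction is a one-liner to apply after the basic formula. A sketch using the 800-household example (helper name is ours):

```python
def fpc_adjust(n, population):
    """Finite population correction: n / (1 + (n - 1) / N)."""
    return n / (1 + (n - 1) / population)

# Program serves 800 households; base formula says 385:
print(round(fpc_adjust(385, 800)))  # -> 260
```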

The Five Factors

1. What change do you want to detect?

This is the minimum detectable effect (MDE): the smallest change in your indicator that you want your survey to be able to pick up. If your program expects to increase handwashing from 30% to 50% (a 20-percentage-point change), that is a large effect and requires fewer respondents. If you expect a shift from 60% to 65% (5 points), that is a small effect and requires many more.

Rule of thumb: Most development programs expect changes of 10-20 percentage points for behavioral indicators. If your expected change is smaller than 5 percentage points, you need a very large sample or should reconsider whether a survey is the right tool for measuring that indicator. Consider whether routine monitoring data or administrative records can track small changes more efficiently than a sample survey.
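Estimating a single proportion is not quite the same as detecting a change between two survey rounds. For change detection, the standard tool is the two-sample proportions formula, which adds a power term. A sketch of the textbook version, assuming 80% power (a common default this guide does not itself specify):

```python
import math

def n_per_group(p1, p2, z_alpha=1.96, z_beta=0.8416):
    """Sample size per survey round (or per arm) to detect a change
    from p1 to p2 at 95% two-sided confidence and 80% power."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

print(n_per_group(0.30, 0.50))  # 20-point effect -> 91 per group
print(n_per_group(0.60, 0.65))  # 5-point effect -> 1468 per group
```

The contrast is the point: shrinking the detectable effect from 20 points to 5 points multiplies the required sample by roughly sixteen.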

2. How confident do you need to be?

The confidence level is the probability that your results are not due to chance. The standard is 95% (meaning that if you repeated the survey 100 times, about 95 would give results within your margin of error). Some programs use 90%, which reduces sample size by about 30%. For internal learning purposes, 90% is often adequate. For external evaluations or academic publications, stick with 95%.

The margin of error (also called precision) is the range around your estimate. A 5% margin of error means if you measure 50% adoption, the true value is between 45% and 55%. Tighter margins require larger samples. Moving from 5% to 3% margin of error nearly triples your sample size.

3. How variable is your population?

If everyone in your target population is similar, you need fewer respondents. If there is wide variation, you need more. When you do not know the variance (common when designing a baseline), using 50% as your proportion estimate gives the maximum sample size needed. This is the safest assumption and the one to use when you are planning blind.

If you have data from a previous survey or a similar program in the same area, use that proportion instead. A known baseline value of 20% or 80% requires a substantially smaller sample than the worst-case 50%.

4. Are you sampling clusters?

If you are sampling villages, schools, or health facilities first and then individuals within them, you need to adjust for the design effect. People in the same village tend to be more similar to each other than to people in other villages. Each additional person in the same cluster gives you less new information than a person from a new cluster would.

The design effect depends on the intra-cluster correlation (ICC) and the number of individuals per cluster:

Design Effect = 1 + (cluster size - 1) x ICC

Typical ICC values for common M&E indicators:

Indicator type | Typical ICC | Design effect (20 per cluster)
Immunization coverage | 0.02-0.08 | 1.4-2.5
Handwashing/hygiene behavior | 0.05-0.15 | 2.0-3.9
School attendance | 0.10-0.20 | 2.9-4.8
Crop yield/income | 0.05-0.15 | 2.0-3.9
Nutrition (stunting/wasting) | 0.05-0.15 | 2.0-3.9

A design effect of 2.0 means you need roughly double the sample you would need with simple random sampling. This is the single biggest factor that inflates sample size in real-world M&E, and the one most often ignored in initial planning.

Most sampling methods used in field evaluations involve some form of clustering. If your enumerators are visiting communities, you are almost certainly doing cluster sampling, whether you call it that or not.
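The design effect formula in code, using the school-attendance row of the table above (ICC 0.10, 20 per cluster) as an illustration:

```python
def design_effect(cluster_size, icc):
    """DEFF = 1 + (cluster size - 1) x ICC."""
    return 1 + (cluster_size - 1) * icc

deff = design_effect(20, 0.10)
print(round(deff, 1))     # -> 2.9
print(round(357 * deff))  # an SRS sample of 357 inflates to -> 1035
```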

5. How many subgroups do you need to compare?

If you need to disaggregate results by sex, age group, geography, or disability status, you need enough respondents in each subgroup to make meaningful comparisons. This is where sample sizes get large quickly.

Rule of thumb: You need at least 30-50 respondents per subgroup for basic descriptive statistics, and 100+ per subgroup for meaningful comparisons between groups. If your donor requires gender-disaggregated results and comparison across 3 regions, that is 6 subgroups at 100+ each, meaning your total sample may need to be 600+ before adjusting for design effect and non-response.

Plan your disaggregation requirements before you calculate sample size, not after. Too many teams set a sample size first, then discover at analysis time that they cannot split the data the way the donor expects.
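The subgroup arithmetic as a quick sketch (helper name is ours): size each subgroup first, then scale for design effect and non-response.

```python
import math

def total_sample(per_subgroup, n_subgroups, deff=1.0, response_rate=0.85):
    """Subgroup-first sizing: multiply out, then buffer for non-response."""
    return math.ceil(per_subgroup * n_subgroups * deff / response_rate)

# 2 sexes x 3 regions at 100 each, no clustering, 15% non-response:
print(total_sample(100, 6))            # -> 706
# Same disaggregation under cluster sampling with DEFF = 2.0:
print(total_sample(100, 6, deff=2.0))  # -> 1412
```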

Worked Examples

Example 1: Simple Baseline Survey

  • Population: 5,000 beneficiary households
  • Indicator: Proportion using improved water sources
  • Expected baseline value: Unknown (use 50%)
  • Confidence: 95%, Margin of error: 5%
  • Sampling: Simple random (no clustering)
  • Formula result: 385 households
  • Finite population correction: 385 / (1 + 384/5000) = 357 households
  • With 15% non-response buffer: 420 households

Example 2: Cluster-Sampled Baseline

  • Same as above, but sampling villages first, then 20 households per village
  • ICC: 0.10 (moderate clustering)
  • Design effect: 1 + (20-1) x 0.10 = 2.9
  • Adjusted sample: 357 x 2.9 = 1,035 households
  • With 15% non-response: 1,220 households
  • Actual plan: 61 villages x 20 households each = 1,220 households

Notice how clustering nearly triples the required sample. This is not unusual. If someone hands you a sample size of 400 for a cluster-sampled survey design across 20 villages, ask them to show their design effect calculation.
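Example 2 chained end-to-end in Python (a sketch; the parameter defaults mirror the example, and the function name is ours):

```python
import math

def cluster_sample_size(z=1.96, p=0.5, e=0.05, population=5000,
                        cluster_size=20, icc=0.10, response_rate=0.85):
    """Base formula -> finite population correction -> design effect
    -> non-response buffer, in that order."""
    n = math.ceil(z ** 2 * p * (1 - p) / e ** 2)  # 385
    n = n / (1 + (n - 1) / population)            # ~357 after FPC
    n = n * (1 + (cluster_size - 1) * icc)        # x 2.9 design effect
    return math.ceil(n / response_rate)           # 15% non-response

print(cluster_sample_size())  # -> 1220 households
```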

Example 3: Gender-Disaggregated Endline

  • Need: Compare men vs women on 3 key indicators
  • Minimum per subgroup: 100 (for meaningful comparison)
  • Design effect: 2.0 (cluster sampling)
  • Per subgroup after DEFF: 200
  • Total: 400 (200 men + 200 women)
  • With 15% non-response: 470

Qualitative Sample Sizes

Qualitative research does not use statistical formulas. Instead, it follows saturation logic: collect data until you stop hearing new information.

Method | Guideline | Saturation point
Key informant interviews (homogeneous group) | 12-20 | Usually 12-16
Key informant interviews (diverse population) | 20-40 | Usually 20-25
Focus group discussions | 4-6 groups per segment | 4 groups usually sufficient
FGD participants | 6-10 per group | 8 is the sweet spot
Case studies | 4-10 | Depends on purpose

These are guidelines, not rigid rules. The right number depends on your research question, the diversity of your population, and how much variation you encounter. But do not use "saturation" as a justification for interviewing 3 people and calling it qualitative research. If you have not heard a genuinely new perspective in 4-5 consecutive interviews, you have likely reached saturation. If every interview surprises you, keep going.

For mixed methods designs, your qualitative sample is typically a purposive subset of your quantitative sample. Select for diversity, not representativeness. The qualitative component explains the "why" behind the quantitative patterns, so choose participants who represent different experiences with the program. See How to Choose Evaluation Methodology for guidance on when mixed methods designs make sense.

Common Mistakes

1. Sample too small for disaggregation

You calculate 385 for the overall sample, then try to split by gender, age group, and region. Each cell ends up with 30-40 respondents, which is not enough for reliable comparisons. Fix: decide your disaggregation requirements first and size each subgroup independently.

2. Ignoring non-response

Not everyone you plan to interview will be available. Some refuse. Some are away. Some addresses are wrong. If you plan for exactly 385 and achieve a 15% non-response rate, you end up with 327 completed surveys and an underpowered study. Always add a 15-20% buffer.

3. Convenience sampling disguised as random

Surveying "whoever is at the health facility that day" or "the first 20 households near the road" is not random sampling, no matter what the methodology section says. Convenience samples cannot support statistical inference, regardless of size. A properly randomized sample of 300 is more credible than a convenience sample of 3,000.

4. No sampling frame

You cannot randomly sample without a list to sample from. If you do not have a beneficiary registry, a household listing, or a census, your first step is building one. Skipping this step means you are not doing random sampling, even if you think you are. Budget time and money for listing exercises before data collection.

5. Forgetting finite population correction

If your program serves 500 households and you calculate a sample of 385 (ignoring the correction), you are planning to survey 77% of the population. Apply the finite population correction and you need only about 218. For small populations, this correction makes a real difference in cost and effort.

6. Ignoring clustering in analysis

This is the most dangerous mistake. You sample 30 villages, 15 households per village, and analyze the data as if you had 450 independent observations. You did not. You have 30 clusters. Ignoring this inflates your statistical significance and produces false positive results. If you used cluster sampling in data collection, you must account for it in analysis. Use survey commands (svy in Stata, survey package in R) or multilevel models.
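One way to see the damage: convert the clustered sample into its effective sample size, the number of independent observations it is actually worth. A sketch assuming an ICC of 0.10 (illustrative; the scenario above does not state one):

```python
def effective_sample_size(n, cluster_size, icc):
    """Observations divided by the design effect."""
    return n / (1 + (cluster_size - 1) * icc)

# 30 villages x 15 households = 450 interviews, assumed ICC 0.10:
print(round(effective_sample_size(450, 15, 0.10)))  # about 187, not 450
```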

When Budget and Statistics Disagree

This is the most common real-world problem. Your statistics say 1,200 households; your budget says 400.

Legitimate ways to reduce sample size:

  1. Accept a larger margin of error. Moving from 5% to 7% reduces sample size by about 50%. For program management decisions (not academic research), 7% is often fine.
  2. Accept 90% confidence instead of 95%. This reduces sample size by about 30%. Appropriate for internal learning, less so for external evaluations.
  3. Reduce the number of subgroups. If you cannot afford gender AND age AND region disaggregation, choose the most important one. Be explicit about what you are giving up.
  4. Reduce cluster size. Sampling 15 households per village instead of 25 reduces the design effect, though you need more villages (and more travel costs). There is an optimal balance.
  5. Use stratified sampling. If you know some strata are more variable than others, you can allocate more sample to variable strata and less to homogeneous ones. This improves precision without increasing total sample size.
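Options 1 and 2 are easy to model before the budget meeting. A sketch that tabulates the base formula across confidence levels and margins of error (helper name is ours):

```python
import math

def n_for(z, e, p=0.5):
    """Base proportion-estimate sample size for Z-score z and margin e."""
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)

for label, z in (("95% confidence", 1.96), ("90% confidence", 1.645)):
    for e in (0.03, 0.05, 0.07):
        print(f"{label}, {e:.0%} margin: n = {n_for(z, e)}")
```

At 95% confidence this prints 1068, 385, and 196: moving from a 5% to a 7% margin roughly halves the sample, which is exactly the trade-off in option 1.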

What you should NOT do:

  • Survey fewer people and hope for the best (underpowered studies waste money; they cannot detect real changes)
  • Drop the comparison group (without a comparison, you cannot attribute change to your program)
  • Ignore clustering in your analysis (this inflates your statistical significance and produces false positive results)
  • Do a "post-hoc power analysis" to justify an underpowered study (this is statistically invalid and reviewers will flag it)

The honest move is to present the donor with options. Show what precision you can achieve at each budget level.

How to Talk to Your Donor

When the sample size conversation comes up, frame it around trade-offs:

"With 400 households, we can measure the overall change in [indicator] with 7% precision, and disaggregate by sex. We cannot reliably compare across all 5 districts. If district-level comparison is essential, we need 800 households, which requires an additional $12,000 for data collection."

Most donors are reasonable when you present the trade-offs clearly, with cost implications attached. What they do not accept is discovering at endline that the sample was too small to detect change, with no warning beforehand.

Three principles for the conversation:

  1. Lead with what they can learn, not what they cannot. "With this sample, you can confidently track overall program change and gender differences" is better than "this sample is too small for district comparisons."
  2. Attach dollar amounts to each option. Donors think in budgets. "An additional 400 households costs $12,000 and gives you district-level comparisons" is a concrete trade-off they can evaluate.
  3. Document the decision. Put the sample size rationale, the assumptions, and the agreed trade-offs in the baseline design report. When someone asks at endline why you only surveyed 400 households, you want a paper trail.

Use the Sampling Calculator to model different scenarios before the conversation. Walk in with 2-3 options at different budget levels so the donor can choose instead of argue.
