M&E Studio

Decision-Grade M&E, Responsibly Built

© 2026 Logic Lab LLC. All rights reserved.

Core Concept · Data Collection · 6 min read

Survey Design

The process of designing structured questionnaires and survey protocols to collect reliable, valid, and actionable data from a defined population.

When to Use

Use survey design when you need to collect structured, comparable data across a large number of respondents to answer specific quantitative questions about a population. Surveys are the primary instrument for baselines, midlines, and endlines. They are also used for needs assessments, coverage monitoring, and performance surveys. Use them when you need data that is systematic, replicable, and statistically representative.

Surveys are not the right tool when you need to understand why something is happening (use focus group discussions or key informant interviews), when the population is too small for statistical analysis (use qualitative methods), or when the question requires narrative or interpretive answers.

How It Works

Step 1: Define the evaluation questions the survey must answer

Every survey item should trace directly to an evaluation or monitoring question. Items without a clear question-to-indicator link should be removed. Unfocused surveys produce data that no one uses.
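The question-to-indicator link described above can be made explicit in a simple traceability matrix. A minimal sketch, with hypothetical item and indicator names, that flags items with no analytical purpose:

```python
# Hypothetical traceability matrix: each survey item maps to the
# indicator (and hence evaluation question) it serves. Items mapped
# to None have no question-to-indicator link and should be cut.
items = {
    "q01_hh_size": "IND-1.1 Average household size",
    "q02_water_source": "IND-2.3 % households using an improved water source",
    "q03_favourite_colour": None,  # no indicator -> remove before fielding
    "q04_latrine_type": "IND-2.4 % households with an improved latrine",
}

orphans = [item for item, indicator in items.items() if indicator is None]
print("Items with no question-to-indicator link:", orphans)
```

Running the check as part of instrument review makes "remove unlinked items" a mechanical step rather than a judgment call.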

Step 2: Draft the instrument

Write items using clear, simple language. Each item should measure one thing. Avoid double-barrelled questions ("Do you feel safe and supported?"), leading questions, and jargon. Use established, validated instruments wherever they exist (e.g., HDDS for dietary diversity, WDDS for women's dietary diversity, MDD-W for minimum dietary diversity).

Step 3: Design the question flow and skip logic

Organise items into logical sections. Use skip logic to route respondents past irrelevant sections. Begin with non-sensitive, rapport-building questions. Place sensitive items (income, violence) toward the end.
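Skip logic can be prototyped before it is configured in a data collection platform. The sketch below is a minimal model (hypothetical questions and routing rules, not any platform's actual syntax): each question carries an optional relevance predicate over the answers collected so far, and irrelevant questions are skipped.

```python
# Minimal skip-logic sketch: questions are asked in order; a question
# whose "relevant" predicate evaluates false for this respondent's
# earlier answers is skipped, reducing respondent burden.
questions = [
    ("owns_livestock", None),                                   # always asked
    ("livestock_count", lambda a: a.get("owns_livestock") == "yes"),
    ("livestock_vaccinated", lambda a: a.get("owns_livestock") == "yes"),
    ("hh_income_band", None),                                   # sensitive: placed last
]

def route(answers_provided):
    """Return the questions actually administered to one respondent."""
    answers, asked = {}, []
    for name, relevant in questions:
        if relevant is not None and not relevant(answers):
            continue  # skip logic: section not relevant to this respondent
        asked.append(name)
        answers[name] = answers_provided.get(name)
    return asked

# A respondent without livestock is routed past both follow-up items.
print(route({"owns_livestock": "no"}))
```

Platforms such as ODK and SurveyCTO express the same idea declaratively (a "relevance" condition per question), but prototyping the flow this way lets you walk through every routing path before deployment.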

Step 4: Pilot the instrument

Test the draft with a small sample (15-30 respondents) from the same population type as the study. Identify misunderstood items, translation issues, and skip logic errors. Revise based on findings. Do not skip piloting: it is the single highest-return investment in data quality.
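A quick pass over pilot data helps identify misunderstood items. One common heuristic, sketched below with hypothetical pilot responses, is to flag items where a large share of respondents answered "don't know" or refused:

```python
from collections import Counter

# Hypothetical pilot data: answers per item from 20 pilot interviews.
pilot_responses = {
    "q02_water_source": ["piped", "well", "piped", "dont_know", "well"] * 4,
    "q05_coping_strategy": ["dont_know", "refused", "dont_know", "sold_assets", "dont_know"] * 4,
}

PROBLEM_CODES = {"dont_know", "refused"}
THRESHOLD = 0.30  # flag items where >30% of answers are non-substantive

def flag_items(responses):
    """Return items whose share of 'don't know'/refused answers exceeds THRESHOLD."""
    flagged = []
    for item, answers in responses.items():
        counts = Counter(answers)
        problem_share = sum(counts[c] for c in PROBLEM_CODES) / len(answers)
        if problem_share > THRESHOLD:
            flagged.append(item)
    return flagged

print(flag_items(pilot_responses))
```

Flagged items are candidates for rewording, re-translation, or removal before the instrument is finalised.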

Step 5: Train enumerators

Enumerators must be trained on the instrument, interview protocols, consent procedures, and data entry. Run calibration exercises where pairs of enumerators interview the same respondent independently and compare results.
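Calibration results can be summarised as a per-item agreement rate between paired enumerators (Cohen's kappa is the more rigorous statistic, but raw agreement is usually enough to spot a problem item). A sketch with hypothetical paired interviews:

```python
# Hypothetical calibration exercise: two enumerators independently
# interview the same three respondents. Low per-item agreement signals
# an item being interpreted or administered inconsistently.
enum_a = [
    {"roof_material": "iron", "improved_latrine": "yes", "hh_size": 5},
    {"roof_material": "thatch", "improved_latrine": "no", "hh_size": 7},
    {"roof_material": "iron", "improved_latrine": "yes", "hh_size": 4},
]
enum_b = [
    {"roof_material": "iron", "improved_latrine": "no", "hh_size": 5},
    {"roof_material": "thatch", "improved_latrine": "no", "hh_size": 7},
    {"roof_material": "iron", "improved_latrine": "no", "hh_size": 4},
]

def agreement_by_item(a, b):
    """Share of paired interviews where both enumerators recorded the same value."""
    return {
        item: sum(x[item] == y[item] for x, y in zip(a, b)) / len(a)
        for item in a[0]
    }

rates = agreement_by_item(enum_a, enum_b)
print(rates)  # low agreement on "improved_latrine" -> retrain on that definition
```

In the hypothetical data, the pairs disagree only on "improved_latrine", mirroring the kind of definitional problem calibration is designed to surface.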

Step 6: Implement with quality controls

Use digital data collection (SurveyCTO, KoBoToolbox, ODK) to enforce skip logic, range checks, and required fields. Conduct field supervision with back-check surveys (re-interviewing a random 10% sample to verify enumerator data). Review daily data reports during data collection.
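Two of these quality controls are simple to sketch: drawing a reproducible random back-check sample, and applying range checks of the kind a digital platform enforces at entry. The example below uses hypothetical household IDs and hypothetical range rules:

```python
import random

# Hypothetical submitted interviews, keyed by household ID.
submitted = {
    f"HH{i:04d}": {"hh_size": s, "age_head": a}
    for i, (s, a) in enumerate([(5, 42), (3, 31), (70, 29), (4, 150)], start=1)
}

# Back-check sample: re-interview a reproducible random ~10% of households.
rng = random.Random(2024)  # fixed seed so a supervisor can reproduce the draw
back_check = rng.sample(sorted(submitted), max(1, len(submitted) // 10))

# Range checks (plausible bounds are programme-specific assumptions here).
RANGES = {"hh_size": (1, 30), "age_head": (15, 110)}

def range_violations(records, ranges):
    """List (household, field, value) triples that fall outside plausible bounds."""
    out = []
    for hh, rec in records.items():
        for field, (lo, hi) in ranges.items():
            if not lo <= rec[field] <= hi:
                out.append((hh, field, rec[field]))
    return out

print("Back-check sample:", back_check)
print("Range violations:", range_violations(submitted, RANGES))
```

On a live survey the violation report would feed the daily data review, and back-check responses would be compared against the original interview to verify enumerator data.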

Key Components

  • Coverage: which topics and indicators are included, and which are deliberately excluded
  • Question types: Likert scales, multiple choice, open-ended, ranking, observation-based
  • Response categories: exhaustive, mutually exclusive, and appropriate for the population's understanding
  • Skip logic: routing that prevents irrelevant questions and reduces respondent burden
  • Translation and back-translation: if conducted in a language other than English, translate forward, then independently back-translate to verify meaning
  • Piloting protocol: plan for who, where, and how the instrument will be tested before deployment
  • Data entry and validation rules: built-in range checks and required fields for digital data collection
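Response categories can also be audited against pilot data: every observed answer should map to a defined category, otherwise the set is not exhaustive. A minimal sketch with hypothetical categories and pilot answers:

```python
# Hypothetical exhaustiveness check: any answer observed in piloting
# that maps to no defined category means the category set needs an
# "other (specify)" option or a new category.
categories = {"piped", "well", "surface_water", "other"}
observed = ["piped", "well", "rainwater", "piped", "surface_water"]

uncategorised = sorted({ans for ans in observed if ans not in categories})
print("Answers with no category:", uncategorised)  # ['rainwater']
```

The complementary check, mutual exclusivity, is a design review question (can one true answer fall into two categories?) rather than something pilot data alone can settle.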

Best Practices

Use validated instruments. Reinventing widely used instruments (dietary diversity, food security, WASH) introduces comparability problems and quality risks. Use established tools with documented validity and reliability where they exist.

Collect outcome data, not just output data. Many surveys track what was delivered (outputs) rather than what changed (outcomes). Outcome indicators require outcome questions.

Collect baseline data before the programme starts. Without baseline data, change cannot be measured and impact cannot be assessed.

Match survey timing to measurement logic. Some outcomes need time to materialise. Collecting endline data three months after a two-year programme ends may be too early to detect genuine change.

Keep instruments short. Respondent fatigue produces lower quality data in the second half of long surveys. Aim for under 45 minutes for household surveys. Every item cut improves data quality on the items that remain.

Common Mistakes

Over-designing the instrument. Adding items "just in case" produces surveys that are too long, tire respondents, and generate data that is never analysed. Every item costs respondent time, enumerator time, and analysis effort.

Skipping the pilot. Pilots reveal translation problems, confusing items, and skip logic errors that are invisible on paper. Piloting with 20 respondents typically surfaces 80% of instrument problems.

Collecting data that cannot change the analysis. If you cannot afford to act on a negative finding, do not collect the data. Collecting data without intention to use it wastes respondent time and erodes community trust.

Failing to standardise across enumerators. If different enumerators interpret and administer items differently, the resulting data is not comparable. Calibration training and back-check protocols address this.

Examples

WASH baseline, East Africa. A UNICEF-funded WASH programme in Ethiopia used the WASH Conditions Assessment Tool as the basis for its baseline survey, adding 12 programme-specific items on hygiene behaviour. The 40-minute household survey was piloted in two villages outside the programme area before deployment. Calibration exercises between enumerator pairs identified a misunderstood definition of "improved latrine" that was corrected before field data collection. The final survey was administered to 1,800 households across three districts.

Food security survey, West Africa. A WFP-funded programme in Mali used the Household Food Insecurity Access Scale (HFIAS) and the Household Dietary Diversity Score (HDDS) as the core of its monitoring survey. These validated instruments enabled comparison with WFP's global database and with the programme's own baseline. Local language translation used forward-translation by bilingual programme staff followed by independent back-translation by a university linguist.

Compared To

Method                   | Data Type               | Sample Size            | Depth
Survey                   | Structured quantitative | Large (100-5,000+)     | Shallow-medium
Focus Group Discussions  | Qualitative             | Small (6-12 per group) | Deep
Key Informant Interviews | Qualitative             | Small (10-30)          | Very deep
Observation Methods      | Direct observation      | Variable               | Medium

Related Topics

  • Sampling Methods: how to select whom to survey
  • Baseline Design: designing the first data-collection point against which later surveys are compared
  • Data Quality Assurance: the processes for verifying survey data quality
  • Validity: whether the survey measures what it is intended to measure
  • Reliability: whether the survey produces consistent results

Further Reading

  • USAID (2012). Performance Monitoring and Evaluation TIPS: Conducting Key Informant Interviews. USAID PNAC. Also covers surveys.
  • Grosh, M. & Glewwe, P. (eds.) (2000). Designing Household Survey Questionnaires for Developing Countries. World Bank. Comprehensive design reference.
  • KoBoToolbox (2024). Free digital data collection platform with survey design support. kobo.humanitarianresponse.info

At a Glance

Designs structured questionnaires that collect valid, reliable data from a representative population to answer specific evaluation or monitoring questions.

Best For

  • Baseline, midline, and endline data collection
  • Measuring outcomes across large populations
  • Generating comparable data across time points or sites

Complexity

Medium

Timeframe

2-6 weeks for instrument design, piloting, and finalisation

Linked Indicators

34 indicators across 4 donor frameworks

USAID · DFID · WHO · UNICEF

Example Indicators

  • Percentage of survey items with confirmed face validity post-piloting
  • Interviewer consistency rate across enumerator pairs
  • Response rate for primary survey instrument

Related Topics

  • Sampling Methods (Core Concept): Systematic approaches for selecting a subset of a population to represent the whole, balancing statistical validity with practical constraints.
  • Baseline Design (Core Concept): A structured approach to collecting initial-condition data that directly informs project decisions, minimizes burden, and enables valid comparison with endline measurements.
  • Data Quality Assurance (Core Concept): A systematic process for verifying that collected data meets five quality dimensions (Validity, Integrity, Precision, Reliability, and Timeliness), ensuring data is fit for decision-making.
  • Key Informant Interviews (Core Concept): In-depth, semi-structured interviews with individuals selected for their specific knowledge, experience, or perspectives relevant to the evaluation questions.
  • Focus Group Discussions (Core Concept): A qualitative data collection method that brings together 6-10 participants to discuss a specific topic, generating rich insights through group interaction and shared experiences.
  • Validity (Internal & External) (Term): The degree to which an evaluation accurately demonstrates causal relationships (internal validity) and generalizes findings beyond the study context (external validity).
  • Reliability (Term): The consistency and repeatability of a measurement: whether the same tool produces stable results across repeated applications, different raters, or different time periods.
  • Bias (Term): Systematic error in data collection, analysis, or interpretation that distorts results and threatens the validity of M&E findings.