M&E How-to Guide

How to Design a Questionnaire for M&E

A good questionnaire is the difference between data you can defend and data you cannot. Here is how to design one: from the analysis plan backward, through the pretest, to the version your enumerators can actually run.

At a glance:

  • 30-45 min: target in-person survey length
  • 7 design steps
  • 10 common mistakes
Key Takeaway
Design from the analysis plan backward, not from the questionnaire forward
Most weak questionnaires fail the same way: someone drafted questions that sounded important, then tried to analyze whatever came back. Strong questionnaires are built backward from the analytical tables the program actually needs to produce. Define the output first, then work backward to questions and response options. Add pretesting, back-translation, and pilot as non-negotiable quality gates before fieldwork.

The Seven Design Steps

Build a questionnaire in this order. Each step depends on the ones before it. Jumping ahead is the root cause of most questionnaire failures.

  1. Define the analysis plan: specify the tables and comparisons the questionnaire must produce, with each indicator traced back to a planned analytical output
  2. Map questions to indicators: one or more questions per indicator, with response options designed for the planned analysis
  3. Draft questions in plain language: simple, direct, unambiguous wording; one concept per question; appropriate response scales
  4. Structure and sequence: logical grouping, sensitive topics late, skip logic planned
  5. Translate and back-translate: independent translator, comparison to English, discrepancy resolution
  6. Cognitive interview (5-10 respondents): respondent-level comprehension test, paraphrasing, surfacing ambiguity
  7. Field pilot (30-50 respondents): full protocol test including enumeration, timing, logistics

Only after step 7 should a questionnaire go to full fieldwork. Skipping steps 5-7 is how preventable errors reach data collection and become permanent.

For the broader concept, see the survey design method guide.

Design From the Analysis Plan Backward

The single largest questionnaire design mistake is starting with the questionnaire and hoping analysis works out. Reverse this.

Step 1 is to produce a sketch of the analytical outputs the program will need at baseline, midline, endline, and across donor reporting cycles. Each output table names:

  • The indicator being reported
  • Disaggregations (sex, age, location, vulnerability status, etc.)
  • The comparison (baseline vs endline, treatment vs control, urban vs rural)
  • The expected result shape (percentage, mean, count, scale score)

Once the analytical table is drafted, work backward: what question, asked in what form, with what response options, would produce exactly this cell in the table? Every question should trace to a specific analytical cell. Questions that do not trace back are candidates to cut.

This backward-design discipline does two things. It reduces questionnaire length by eliminating "it seemed relevant" questions that were never going to be analyzed. And it exposes indicator gaps at design time: if the analytical table has a row but no question produces that data, the gap is visible before fieldwork, not after.
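One way to make the traceability concrete is to hold the analysis plan as data and check it mechanically. A minimal sketch, with hypothetical indicator names and question IDs, that flags both orphan questions and indicator gaps:

```python
# Analysis plan held as data: each planned output cell names its indicator,
# disaggregation, comparison, result shape, and source question.
# All names here are hypothetical.
analysis_plan = [
    {"indicator": "pct_hh_improved_water", "disaggregation": "urban/rural",
     "comparison": "baseline vs endline", "shape": "percentage",
     "question_id": "W3"},
    {"indicator": "mean_meals_per_day", "disaggregation": "sex of household head",
     "comparison": "treatment vs control", "shape": "mean",
     "question_id": "F1"},
]

questionnaire = {
    "W3": "Main source of drinking water",
    "F1": "Number of meals eaten yesterday",
    "X9": "Opinion on local radio programming",  # traces to nothing
}

used = {row["question_id"] for row in analysis_plan}
orphans = set(questionnaire) - used   # questions with no analytical destination
gaps = used - set(questionnaire)      # plan rows with no question producing data

print("candidates to cut:", sorted(orphans))  # ['X9']
print("indicator gaps:", sorted(gaps))        # []
```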

Question Wording Rules

Once you know what questions to ask, write them clearly. Eight rules produce cleaner data.

One concept per question. "Have you attended a training on hygiene and are you practicing what you learned?" is two questions. Split into: "In the past 12 months, did you attend a hygiene training?" and "[if yes] In the past 7 days, did you practice handwashing at the recommended critical times?"

Plain language, no jargon. "How frequently do you engage in income-generating activities?" fails if respondents are not sure what counts. "In the past 7 days, did you do any work for payment or goods?" is concrete.

Specific time frames. "Do you regularly attend community meetings?" is vague. "In the past 3 months, how many community meetings have you attended?" is measurable.

Avoid double negatives. "Do you disagree that the program should not expand?" loses half the respondents. Rewrite positively: "Should the program expand?"

Avoid leading or loaded wording. "How satisfied are you with the valuable training you received?" presumes the answer. "How would you rate the training you received?" is neutral.

Concrete units. "How much land do you farm?" is ambiguous (hectares, acres, local unit?). "How much land do you farm, in hectares or acres (please specify)?" is collectible.

Response scales with a neutral midpoint when measuring attitudes. 5-point Likert scales (strongly disagree, disagree, neutral, agree, strongly agree) work better than 4-point forced-choice scales for attitude measurement in most cultures.

No implicit assumptions. "How many children do you have in school?" assumes the respondent has children and that the children are of school age. Lead with a screening question, or structure as "Do you have children of school age? [If yes] How many are currently enrolled?"
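Several of these rules can be turned into rough automated checks on a draft question list. The sketch below is a heuristic lint, not a substitute for cognitive interviewing; the patterns are illustrative and will over- and under-flag:

```python
import re

# Crude pattern checks for three of the wording rules. Flags are prompts
# for human review only.
VAGUE_FREQUENCY = re.compile(r"\b(regularly|often|frequently|usually)\b", re.I)
DOUBLE_BARRELED = re.compile(r"\b(and|or)\b.*\?", re.I)  # conjunction inside a question
DOUBLE_NEGATIVE = re.compile(r"\b(disagree|not|never)\b.*\b(not|never|no)\b", re.I)

def lint_question(text: str) -> list[str]:
    """Return heuristic warnings for one draft question."""
    warnings = []
    if VAGUE_FREQUENCY.search(text):
        warnings.append("vague frequency word: add a specific time frame")
    if DOUBLE_BARRELED.search(text):
        warnings.append("possible double-barreled question: one concept per question")
    if DOUBLE_NEGATIVE.search(text):
        warnings.append("possible double negative: rewrite positively")
    return warnings

for q in [
    "Do you regularly attend community meetings?",
    "Have you attended a training on hygiene and are you practicing what you learned?",
]:
    print(q, "->", lint_question(q))
```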

Response Options and Scales

Response options carry as much weight as question wording. Three guidelines.

Exhaustive and mutually exclusive categories. Every possible answer fits in one and only one category. Add "other (specify)" where uncertainty exists. Do not force respondents into the nearest approximation.

Match the scale to the analysis. Ordinal scales (ranked categories) support different analyses than interval or ratio scales. If you plan to compute means and run regressions, you need interval scales; if you plan to report percentages, ordinal works.

Numbers over labels for quantities. "How old are you?" with response options 18-29, 30-44, 45-59, 60+ loses precision. Record exact age and bin at analysis time if the analytical table needs bins. You can always bin precise data; you cannot unbin binned data.
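A minimal sketch of analysis-time binning, assuming age was collected as an exact integer (values and band cutoffs are illustrative):

```python
# Exact ages recorded in the field (illustrative values).
ages = [17, 22, 31, 45, 60, 74]

def age_band(age: int) -> str:
    """Bin an exact age into reporting categories at analysis time."""
    if age < 18:
        return "<18"
    if age <= 29:
        return "18-29"
    if age <= 44:
        return "30-44"
    if age <= 59:
        return "45-59"
    return "60+"

print([age_band(a) for a in ages])
# ['<18', '18-29', '30-44', '45-59', '60+', '60+']
# The reverse is impossible: from '30-44' alone you cannot recover 31.
```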

Questionnaire Structure and Flow

A well-structured questionnaire opens with low-risk questions, groups related content, and places sensitive items after rapport is established.

Opening (10-20% of length): Introduction, consent, demographic screening, low-risk contextual questions. Warm-up section that establishes the interview rhythm.

Core content blocks (60-70% of length): Grouped by theme, with smooth transitions between blocks. Each block should open with a transition statement ("Now I'd like to ask about your household's water access...").

Sensitive content (10-15% of length, placed late): Income, health status, GBV, legal status, illegal activities. Place after rapport is built and respondent is comfortable. Informed consent should have flagged that any question can be skipped.

Closing (5-10% of length): Thank-you, any final open-ended questions, contact for follow-up if needed, instructions for continuing interaction.

Within each block, move from general to specific, closed-ended to open-ended, and simple to complex. The opposite sequence causes respondent fatigue and dropout.
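As a worked example, the percentage bands above translate into minute budgets once total length is fixed. The sketch below assumes a 40-minute in-person instrument:

```python
# Convert the structural percentage bands into minute budgets
# for an assumed 40-minute in-person questionnaire.
total_minutes = 40
bands = {
    "opening":   (0.10, 0.20),
    "core":      (0.60, 0.70),
    "sensitive": (0.10, 0.15),
    "closing":   (0.05, 0.10),
}
for section, (lo, hi) in bands.items():
    print(f"{section:>9}: {lo * total_minutes:.0f}-{hi * total_minutes:.0f} min")
# opening: 4-8 min, core: 24-28 min, sensitive: 4-6 min, closing: 2-4 min
```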

Skip Logic and Routing

Skip logic (also called routing) moves respondents through relevant questions while skipping irrelevant ones. Poor skip logic is a major source of missing data and enumerator confusion.

Design rule: every skip should be visually obvious on the questionnaire, or programmatically enforced in digital instruments. Ambiguous skips ("If no, go to section C" without specifying where section C starts) produce inconsistent enumeration.

Testing skip logic is part of pretesting. Cognitive interviews surface whether respondents understand the transition; field pilot tests whether the full logic executes correctly in practice. Digital tools validate automatically; paper requires enumerator discipline.

Screen out respondents early when appropriate. If the questionnaire is about employed adults, ask about employment status first and terminate or redirect ineligible respondents. Dragging ineligible respondents through 30 minutes of irrelevant questions wastes field time and produces unusable data.
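In digital instruments the form engine enforces routing; on paper, routing can at least be audited after collection. A minimal sketch of a post-collection consistency check for one skip rule, with hypothetical variable names:

```python
# Audit one skip rule: employment follow-ups should be blank for
# respondents who screened out. Field names are hypothetical.
records = [
    {"id": 1, "employed": "yes", "hours_worked": 40},
    {"id": 2, "employed": "no",  "hours_worked": None},
    {"id": 3, "employed": "no",  "hours_worked": 25},  # skip violation
]

def skip_violations(rows):
    """Flag records where a skipped follow-up was answered anyway."""
    return [r["id"] for r in rows
            if r["employed"] == "no" and r["hours_worked"] is not None]

print(skip_violations(records))  # [3]
```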

Cognitive Interviewing

Cognitive interviewing is the question-level pretest: 5-10 respondents from the target population work through the draft questionnaire aloud while an interviewer probes comprehension and answer-formation.

Standard cognitive-interviewing probes:

  • "In your own words, what is this question asking?"
  • "How did you arrive at your answer?"
  • "Are there any words or phrases in the question that were unclear?"
  • "Would different people interpret this question differently?"

What it surfaces: ambiguous wording, cultural mismatch, answer-formation difficulty, question sequencing problems. What it does not surface: workflow problems (enumerator handling, timing, logistics). That is what the field pilot is for.

Run cognitive interviews in the target language with target-population speakers. English-language cognitive interviewing of a Swahili questionnaire misses most translation-related comprehension errors.

Translation and Back-Translation

Translation is where English-language questionnaire design meets local reality, and where many errors originate.

Process:

  1. Draft in English
  2. Translate to target language(s) by a qualified translator
  3. Independent back-translation from target language to English
  4. Compare original English to back-translated English; flag discrepancies
  5. Resolve discrepancies (original author + translator + back-translator)
  6. Cognitive interview in target language with target-population speakers

Common translation errors: equivalent words with different connotations; culturally specific concepts with no direct equivalent (household composition, kinship terms, privacy norms); idiomatic expressions that do not translate literally; technical terms that require coining or glossing in the target language.

Back-translation alone does not catch all of these; cognitive interviewing in the target language is what surfaces concept-level mismatches. Budget time and cost for both.
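Step 4 of the process, comparing the original and back-translated English, can be triaged with a rough similarity score before the human review. A sketch, assuming the two versions exist as parallel lists; a low score only flags an item for discussion, it does not judge translation quality:

```python
from difflib import SequenceMatcher

# Parallel original and back-translated question texts (illustrative).
original = [
    "In the past 7 days, did you do any work for payment or goods?",
    "How would you rate the training you received?",
]
back_translated = [
    "In the last week, did you work for money or goods?",
    "How much did you enjoy the valuable training?",
]

for src, back in zip(original, back_translated):
    # Character-level similarity: a cheap first-pass filter, not a verdict.
    score = SequenceMatcher(None, src.lower(), back.lower()).ratio()
    if score < 0.8:
        print(f"REVIEW ({score:.2f}): {src!r} vs {back!r}")
```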

Field Pilot

The field pilot is the full-protocol test: 30-50 respondents from the target population, using the real enumerators, real logistics, real timing, and real digital (or paper) tools. It is not cognitive interviewing; it is workflow testing.

What the pilot surfaces:

  • Total questionnaire time in practice (often longer than estimated)
  • Enumerator consistency in handling skip logic, probing, and recording
  • Instrument-level problems (digital form bugs, printing errors)
  • Respondent fatigue points (where responses start shortening)
  • Workflow bottlenecks (enumerator-supervisor review, data sync, quality checks)

Modifications after pilot: typical pilots produce 5-15 questionnaire revisions (wording, ordering, skip logic). Data from the pilot is usually not included in the main dataset unless the final instrument is identical.

Plan 1-2 weeks between pilot completion and full fieldwork start, enough time for revision, instrument re-compilation (if digital), and enumerator re-training on final wording.
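A minimal sketch of one pilot diagnostic, interview timing against the target window, assuming start and end timestamps were captured per interview (field layout hypothetical; real paradata would come from the digital form's audit log or a paper cover sheet):

```python
from datetime import datetime
from statistics import median

# Pilot interview start/end timestamps (illustrative values).
pilot = [
    ("2025-03-01 09:00", "2025-03-01 09:38"),
    ("2025-03-01 10:10", "2025-03-01 10:57"),
    ("2025-03-01 11:30", "2025-03-01 12:21"),
]

fmt = "%Y-%m-%d %H:%M"
durations = [
    (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds() / 60
    for start, end in pilot
]

print(f"median {median(durations):.0f} min, max {max(durations):.0f} min")
# Compare against the 30-45 min target before committing to full fieldwork.
```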

Sector Examples

Health: Perinatal care knowledge questionnaire, East Africa

A maternal health program designed a 28-minute questionnaire measuring perinatal care knowledge and practice among pregnant women. The analysis plan required disaggregation by age group (15-19, 20-24, 25+) and by geography (urban/rural). Cognitive interviewing with 8 respondents surfaced one critical finding: the term "perinatal" had no equivalent in the local language, and the initial translation used a word that meant "after birth" only. The team revised to separate questions on pregnancy, delivery, and postnatal periods, gaining measurement precision at the cost of 3 additional minutes. A field pilot with 42 respondents showed respondent fatigue in the final block in 18% of interviews; the team moved two low-priority questions from that block to a follow-up visit.

Education: School enrollment and attendance survey, South Asia

An education program's baseline survey asked parents about child school enrollment and attendance. The initial draft used "attends school" as a binary item. Cognitive interviewing showed respondents interpreted this differently: some counted any enrollment, some counted present-in-last-30-days, some counted regularly attending. The team revised to three separate questions (ever enrolled, currently enrolled, attended in the last 7 days), allowing the analysis to distinguish the three outcome concepts. Back-translation caught a mistranslation of "regular attendance" that had produced a word meaning "daily without exception" rather than "most days."

WASH: Household water practices survey, West Africa

A WASH program designed a household water and hygiene questionnaire. A pilot in 35 households revealed that the "improved water source" category definitions (JMP ladders) did not map cleanly onto local source types: "unprotected wells" and "boreholes with surface contamination risk" were being sorted inconsistently by enumerators. The team added a locally calibrated observation checklist for enumerators to use alongside respondent self-report, improving reliability. Total pilot time was 38 minutes, above the 30-minute target; the team moved three low-priority modules (aspirational hygiene practices) to a separate follow-up study rather than cutting them.

Food security: Seasonal pastoralist survey, Sahel

A food security program designing a survey for pastoralist communities found cognitive interviewing indispensable. Initial questions about "months of food insecurity" (standard HFIAS structure) did not align with pastoralist seasonal patterns (wet/dry/transhumance rather than calendar months). The team revised to use seasonal markers (pre-migration, migration, post-migration, and wet season) that respondents could reliably identify, maintaining HFIAS scale compatibility through structured mapping. Back-translation caught the original mistranslation of "food gap" as "hunger death", which had produced alarming non-response rates in the pilot.

Common Mistakes

Mistake 1: Starting with the questionnaire, not the analysis plan. Questions that cannot trace to an analytical output usually produce unusable data. Build backward from planned tables, not forward from "important topics."

Mistake 2: Skipping cognitive interviewing. A questionnaire that has never been pretested with target respondents will have question-level comprehension errors you cannot see from the desk. Budget 1-2 days of cognitive interviewing before any pilot.

Mistake 3: Pilot-free fieldwork. Cognitive interviewing tests questions; piloting tests workflow. Both are needed. Skipping pilot means enumerator consistency, timing, and logistics problems arrive in full fieldwork.

Mistake 4: Double-barreled questions. One question, one concept. "Have you attended training AND are you practicing?" is two questions pretending to be one.

Mistake 5: Leading or loaded wording. Questions that presume the answer ("how much did you enjoy...") produce biased data. Neutral phrasing is harder but necessary.

Mistake 6: Binning quantities at collection. Age, income, time, and count variables should be collected as numbers, not categories. Binning can happen at analysis; unbinning cannot.

Mistake 7: Placing sensitive questions early. Income, GBV, legal status, health status should come after rapport is built. Early placement produces refusals and non-response.

Mistake 8: Translation without back-translation. Forward-only translation misses systematic errors. Back-translation + cognitive interviewing in the target language are the two controls.

Mistake 9: Ignoring skip logic testing. Skip logic needs explicit testing in both cognitive interviews and the field pilot. Ambiguous skips produce missing data that cannot be recovered.

Mistake 10: No informed consent script aligned to the questionnaire. Consent should match what the questionnaire actually asks, including sensitive-question warnings and skip-option notices. See evaluation ethics checklist for consent-script requirements.

Pre-Fieldwork Questionnaire Checklist

Run through this before training enumerators.

Design:

  • Analysis plan drafted first, with indicator-to-question mapping
  • Every question traceable to a specific analytical output
  • Questionnaire length within the target window for the method (30-45 min in-person)
  • No double-barreled, leading, or ambiguous questions
  • Response options exhaustive and mutually exclusive

Translation:

  • Translated to all target languages by qualified translators
  • Back-translated independently and discrepancies resolved
  • Target-language cognitive interview conducted

Pretesting:

  • Cognitive interviewing complete with 5-10 target-population respondents per language
  • Field pilot conducted with 30-50 respondents using full protocol
  • Revision incorporated and instrument finalized

Protocol:

  • Skip logic tested in both cognitive interviews and pilot
  • Sensitive questions placed late with consent-script alignment
  • Enumerator training covers interpretive edge cases, not just the instrument
  • Inter-rater agreement plan specified

Logistics:

  • Paper printing or digital instrument deployment tested
  • Supervisor review workflow defined
  • Data sync or collection protocol confirmed
  • Ethics approval reflects the final instrument version

For related topics, see paper vs digital data collection, evaluation ethics checklist, the 5 data quality dimensions, and surveys vs interviews vs focus groups. For an AI-assisted step-by-step workflow, see the Survey Design playbook.
