AI for M&E

Generating Survey Questions with AI

Use AI tools to accelerate survey development from weeks to hours while maintaining quality through structured prompting and field expert validation.

Ben Playfair · 5 min read
ai · surveys · data collection · kobotoolbox · chatgpt

From Weeks to Hours

You have a baseline survey due in two weeks. Manually writing, testing, and refining 40 questions would normally take 3-4 weeks of iteration. Using AI tools with structured prompting, you can generate an initial question bank, test for bias, adapt for local context, and have a survey-ready questionnaire in 2-3 hours of focused work.

The key is combining AI's speed with field expert review to ensure questions are culturally appropriate, clearly worded, and measure what you actually need to know.

The Three-Step Process

1. Generate: Create Your Question Bank

Start by translating your evaluation framework into survey questions. For each research question, identify what behaviors, knowledge, or attitudes you need to measure.

Prompt template:

I'm designing a [BASELINE/ENDLINE/MONITORING] survey for a [SECTOR]
program in [COUNTRY]. The target respondents are [DESCRIPTION -
age, gender, literacy level, language].

Research question: [YOUR RESEARCH QUESTION]

Generate 8-10 survey questions that measure [SPECIFIC CONSTRUCT].
For each question, provide:
1. Question text (appropriate for [LITERACY LEVEL])
2. Question type (multiple choice, Likert, open-ended, ranking)
3. Response options (if applicable)
4. Rationale for why this question captures the construct

Requirements:
- Survey will be administered via [PHONE/TABLET/PAPER]
- Maximum question length: 25 words
- Response time per question: under 30 seconds
- Avoid double-barreled questions
- Avoid leading or loaded language

This generates a draft question bank. You'll select, modify, and refine from these options - not use them verbatim.
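If you are generating prompts for several research questions, the template above can be filled programmatically so every question bank uses the same structure. A minimal sketch in Python; the `build_generation_prompt` function, field names, and example values are illustrative, not a standard API:

```python
# Fill the generation prompt template for one research question.
# Field names and example values below are illustrative placeholders.
GENERATION_TEMPLATE = """\
I'm designing a {survey_type} survey for a {sector} program in {country}.
The target respondents are {respondents}.

Research question: {research_question}

Generate 8-10 survey questions that measure {construct}.
For each question, provide:
1. Question text (appropriate for {literacy_level})
2. Question type (multiple choice, Likert, open-ended, ranking)
3. Response options (if applicable)
4. Rationale for why this question captures the construct

Requirements:
- Survey will be administered via {mode}
- Maximum question length: 25 words
- Response time per question: under 30 seconds
- Avoid double-barreled questions
- Avoid leading or loaded language
"""

def build_generation_prompt(**fields: str) -> str:
    """Return the completed prompt; raises KeyError if a field is missing."""
    return GENERATION_TEMPLATE.format(**fields)

prompt = build_generation_prompt(
    survey_type="BASELINE",
    sector="agriculture",
    country="Kenya",
    respondents="smallholder farmers, mixed gender, low literacy, Swahili-speaking",
    research_question="Did training change post-harvest storage practices?",
    construct="adoption of improved storage practices",
    literacy_level="low literacy",
    mode="TABLET",
)
```

Keeping the template in one place means a wording fix propagates to every prompt you send.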

2. Detect: Screen for Bias

Every draft question should be checked for common survey design problems. AI is surprisingly good at identifying issues human reviewers miss.

Bias detection prompt:

Review these survey questions for a [PROGRAM] targeting
[POPULATION] in [CONTEXT]:

[PASTE YOUR DRAFT QUESTIONS]

Check each question for:
1. Leading language (words that push toward a particular answer)
2. Social desirability bias (questions people will answer to look good)
3. Double-barreled questions (asking two things at once)
4. Ambiguous wording (could be interpreted multiple ways)
5. Cultural sensitivity issues for [CONTEXT]
6. Literacy appropriateness for [EDUCATION LEVEL]
7. Translation challenges (phrases that don't work in [LANGUAGE])

For each issue found, explain the problem and suggest a revised version.
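Before sending drafts to an AI reviewer, a mechanical first pass can catch the most obvious problems. A rough heuristic screen in Python; the word lists and thresholds are illustrative assumptions, and this is no substitute for the prompt above or expert review:

```python
# Heuristic pre-screen for common survey question problems.
# Word lists and thresholds are illustrative; AI and human review still needed.
LEADING_PHRASES = ("obviously", "clearly", "surely", "don't you agree")

def screen_question(text: str, max_words: int = 25) -> list[str]:
    """Return a list of flags for one draft question (empty if none found)."""
    flags = []
    lower = text.lower()
    if len(text.split()) > max_words:
        flags.append("too long")
    if " and " in lower and lower.rstrip().endswith("?"):
        flags.append("possible double-barreled question")
    if any(phrase in lower for phrase in LEADING_PHRASES):
        flags.append("possible leading language")
    return flags

flags = screen_question(
    "Don't you agree that the training was useful and well organized?"
)
```

A question flagged here almost certainly needs rewording; a clean result only means the AI review step is still worth running.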

3. Adapt: Contextualize for Your Setting

Survey questions that work in one context often fail in another. Use AI to flag potential adaptation needs, then validate with local staff.

Cultural adaptation prompt:

These survey questions will be used with [TARGET POPULATION]
in [SPECIFIC LOCATION]. The survey will be administered in
[LANGUAGE] by [DATA COLLECTORS DESCRIPTION].

For each question, identify:
1. Concepts that may not translate directly
2. Response options that may not fit local categories
3. Sensitive topics that need careful framing
4. Time references that need localization
5. Examples or scenarios that should use local context

Suggest adapted versions where needed, but flag any questions
that require validation by someone with local expertise.
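Time references (point 4 above) are a common localization miss, and easy to flag automatically before handing drafts to local staff. A sketch; the reference list is an illustrative assumption, not exhaustive:

```python
# Flag draft questions whose time references need localization review
# (seasons, relative periods, holidays). The list is illustrative only.
TIME_REFERENCES = ("last week", "last month", "past year", "winter", "summer")

def needs_time_localization(question: str) -> bool:
    """True if the question contains a time reference to review locally."""
    lower = question.lower()
    return any(ref in lower for ref in TIME_REFERENCES)

questions = [
    "How many meals did your household eat last week?",
    "What is your main source of income?",
]
to_review = [q for q in questions if needs_time_localization(q)]
```

Flagged questions still need a local expert to decide the right recall period or seasonal anchor; the code only builds the review list.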

Question Type Selection

Choose question types based on what you need to learn and how you'll analyze the data:

| Question Type | Best For | Analysis Method | Limitation |
|---|---|---|---|
| Multiple choice | Knowledge, demographics, categorical data | Frequencies, cross-tabs | May miss important options |
| Likert scale | Attitudes, satisfaction, agreement | Mean scores, comparisons | Acquiescence bias |
| Open-ended | Explanations, unexpected findings, context | Thematic coding | Time-intensive to analyze |
| Ranking | Priorities, preferences | Priority scoring | Complex for respondents |
| Matrix | Multiple items on same scale | Efficient collection | Survey fatigue if too long |

Practical Guidelines

Question Length and Complexity

  • Phone surveys: Maximum 20 words per question, 15-20 questions total
  • Tablet/in-person: Up to 25 words, 25-35 questions
  • Self-administered: Up to 30 words, 30-40 questions with clear instructions
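These limits can be checked automatically before piloting. A minimal sketch whose limits table simply mirrors the bullets above:

```python
# Per-mode limits from the guidelines above: (max words/question, max questions).
LIMITS = {
    "phone": (20, 20),
    "tablet": (25, 35),
    "self-administered": (30, 40),
}

def check_survey(questions: list[str], mode: str) -> list[str]:
    """Return human-readable violations of the length guidelines."""
    max_words, max_questions = LIMITS[mode]
    problems = []
    if len(questions) > max_questions:
        problems.append(
            f"{len(questions)} questions exceeds limit of {max_questions}"
        )
    for i, q in enumerate(questions, 1):
        n = len(q.split())
        if n > max_words:
            problems.append(f"Q{i} has {n} words (limit {max_words})")
    return problems
```

Running this on every draft keeps length creep from surfacing only during the pilot.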

Response Options

  • Multiple choice: 4-6 options plus "Other" and "Don't know/Prefer not to answer"
  • Likert scales: Use 5 points for most purposes; 4 points to force a direction
  • Always include an opt-out option for sensitive questions
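A small helper can append the standard catch-all options consistently instead of retyping them per question. A sketch under the conventions above; the exact option labels are assumptions you should adapt to your codebook:

```python
# Append the standard catch-all options recommended above.
# Option labels are illustrative; match them to your own codebook.
def build_options(substantive: list[str], sensitive: bool = False) -> list[str]:
    """Return substantive choices plus "Other" and opt-out options."""
    opts = list(substantive) + ["Other (specify)", "Don't know"]
    if sensitive:
        opts.append("Prefer not to answer")
    return opts

opts = build_options(["Yes", "No"], sensitive=True)
```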

Survey Flow

  1. Start with easy, non-threatening questions (demographics)
  2. Move to the core measurement questions in the middle
  3. Place sensitive questions toward the end (once rapport is built)
  4. End with open-ended questions for additional context
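If questions are tagged by section, the recommended flow can be enforced with a stable sort. A sketch; the section names are illustrative labels, not a standard taxonomy:

```python
# Order sections per the flow above; section labels are illustrative.
SECTION_ORDER = {"demographics": 0, "core": 1, "sensitive": 2, "open": 3}

def order_survey(items: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Stable-sort (section, question) pairs into the recommended flow."""
    return sorted(items, key=lambda pair: SECTION_ORDER[pair[0]])

ordered = order_survey([
    ("open", "Anything else you would like to add?"),
    ("demographics", "How old are you?"),
    ("sensitive", "Have you ever skipped a meal due to lack of money?"),
    ("core", "How often did you use the storage practice this season?"),
])
```

Because the sort is stable, questions within a section keep the order you wrote them in.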

What AI Can't Do

  • Validate cultural appropriateness - only local experts can do this
  • Test question comprehension - requires cognitive interviews with real respondents
  • Ensure ethical compliance - IRB/ethics review is separate from survey design
  • Replace pilot testing - always test with 5-10 respondents before full deployment

Quality Checklist Before Deployment

  • [ ] Every question maps to a specific indicator or research question
  • [ ] No double-barreled questions
  • [ ] No leading or loaded language
  • [ ] Response options are mutually exclusive and exhaustive
  • [ ] Skip logic is documented and tested
  • [ ] Survey stays under the target completion time in pilot testing
  • [ ] Translations reviewed by native speakers
  • [ ] Data collectors trained on question intent, not just reading
  • [ ] Consent script included and approved