Code Qualitative Data with AI

A 5-step prompt workflow that takes raw interview or focus group transcripts through to a coded dataset with thematic analysis. Run all prompts in a single AI conversation.

45-60 min · 5 steps · Intermediate · Analysis

What you'll build

A codebook, coded dataset organized by theme, and a thematic analysis summary with illustrative quotes.

Before you start

  • Anonymized transcripts or notes (replace real names with codes like P01, P02)
  • Your evaluation questions or research questions
  • Any existing framework or theory the analysis should align with (optional)
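
Anonymization can be scripted before anything goes into an AI conversation. Below is a minimal Python sketch, assuming you already have a list of respondent names to redact; the `anonymize` helper and its behavior are illustrative, not part of this workflow's tooling.

```python
import re

def anonymize(text, names):
    """Replace each known respondent name with a stable code like P01, P02."""
    mapping = {}
    for i, name in enumerate(sorted(set(names)), start=1):
        mapping[name] = f"P{i:02d}"
    for name, code in mapping.items():
        # Whole-word, case-insensitive replacement so "Maria's" edge cases
        # are caught but unrelated substrings are not.
        text = re.sub(rf"\b{re.escape(name)}\b", code, text, flags=re.IGNORECASE)
    return text, mapping

cleaned, key = anonymize(
    "Maria said the training helped. Maria attended twice.", ["Maria"]
)
# Store the name-to-code key separately, with restricted access.
```

Keep the name-to-code key out of the AI conversation entirely; only the cleaned text should be pasted into prompts.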

1. Generate a Codebook

Start by generating an initial codebook from your evaluation questions and a sample of your data. The codebook defines what you are looking for before you start coding.

Step 1: Generate a Codebook

You are a senior M&E qualitative analyst. I need to develop a codebook for analyzing qualitative data from my program evaluation. Based on the following evaluation questions and data sample, generate a codebook with 8-15 codes organized into 3-5 themes.

For each code, provide:
- Code name (short label)
- Theme it belongs to
- Definition (one sentence, precise enough that two coders would agree)
- Inclusion criteria (what counts)
- Exclusion criteria (what does not count)
- Example quote (illustrative)

Include at least one theme for unexpected or emergent findings that do not fit the predetermined framework.

My evaluation questions are:
[Describe your evaluation questions or paste them here]

Here is a sample of my data (2-3 transcript excerpts):
[Paste 2-3 anonymized excerpts here]

Start with 8-15 codes. You can always split or merge codes later. Starting with too many codes (30+) makes consistent coding nearly impossible.
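Storing the AI-generated codebook as structured records makes the later validation and merge/split steps easier to track. A minimal sketch, assuming fields that mirror the prompt above; the `Code` dataclass and the sample entry are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Code:
    name: str        # short label
    theme: str       # theme it belongs to
    definition: str  # one sentence, precise enough that two coders would agree
    includes: str    # inclusion criteria: what counts
    excludes: str    # exclusion criteria: what does not count
    example: str     # illustrative quote

# Illustrative entry only; your codebook should have 8-15 of these.
codebook = [
    Code(
        name="barrier-transport",
        theme="Access Barriers",
        definition="Respondent cites transport cost or distance as a barrier to participation.",
        includes="Explicit mention of travel cost, distance, or lack of transport.",
        excludes="General cost complaints not tied to travel.",
        example="It takes me two hours to reach the center. (P03)",
    ),
]
```

Keeping the codebook in one file (CSV, spreadsheet, or a script like this) gives you a single source of truth to revise in Step 3.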

2. Extract Meaningful Segments

Now use the codebook to identify and extract the meaningful segments from your transcripts. Each segment is a passage that relates to one or more codes.

Step 2: Extract Meaningful Segments

Using the codebook you just created, review the following transcript and extract all meaningful segments.

For each segment:
- Quote the exact text (keep it verbatim)
- Assign the relevant code(s) from the codebook
- Note the respondent ID
- Flag any segments that do not fit existing codes (mark as "emergent")

Extract segments that are substantive (at least one full sentence). Skip filler, greetings, and off-topic conversation.

Transcript:
[Paste your anonymized transcript here]

Process one transcript at a time. If a transcript is longer than 3,000 words, split it into segments. After the first 3-5 transcripts, review whether new codes are emerging. If they are, update the codebook before continuing.
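Splitting long transcripts can be automated. A minimal sketch of a word-count chunker that breaks on paragraph boundaries; the 3,000-word default matches the guidance above, and the function name is illustrative.

```python
def split_transcript(text, max_words=3000):
    """Split a transcript into chunks of at most max_words,
    breaking only on paragraph boundaries (blank lines).
    A single paragraph longer than max_words stays one chunk."""
    chunks, current, count = [], [], 0
    for para in text.split("\n\n"):
        words = len(para.split())
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks

# Toy example: five 10-word paragraphs, 25-word limit.
chunks = split_transcript("\n\n".join(["word " * 10] * 5), max_words=25)
```

Paste one chunk per prompt, and remind the AI which transcript and respondent the chunk belongs to.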

3. Assign and Validate Codes

Review the extracted segments and check the code assignments for consistency. This is where you catch coding drift and refine the codebook if needed.

Step 3: Assign and Validate Codes

Review all the coded segments from the previous step. For each code in the codebook:

1. List all segments assigned to that code
2. Check whether each segment genuinely fits the code definition (flag any that are borderline or miscoded)
3. Identify codes that are being used inconsistently (the definition says one thing but the assigned segments suggest another)
4. Recommend any codebook changes:
   - Codes to split (too broad, capturing different concepts)
   - Codes to merge (too similar, coders would struggle to distinguish)
   - New codes needed for emergent segments
   - Codes to drop (no segments assigned)

Present this as a coding consistency report with recommended codebook revisions.

If you are the only person coding, consider independently re-coding 10% of segments after a break. Comparing your first-pass and second-pass codes is a simple inter-coder reliability check.
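The first-pass versus second-pass comparison can be quantified. A minimal sketch computing percent agreement and Cohen's kappa (chance-corrected agreement) over a re-coded sample; the code labels and sample lists are illustrative.

```python
from collections import Counter

def percent_agreement(first, second):
    """Share of segments given the same code on both passes."""
    return sum(a == b for a, b in zip(first, second)) / len(first)

def cohens_kappa(first, second):
    """Cohen's kappa: agreement corrected for chance, given each
    pass's marginal code frequencies."""
    n = len(first)
    po = percent_agreement(first, second)          # observed agreement
    c1, c2 = Counter(first), Counter(second)
    pe = sum((c1[k] / n) * (c2[k] / n)             # expected by chance
             for k in set(c1) | set(c2))
    return (po - pe) / (1 - pe)

# One code per segment in the 10% re-coded sample (illustrative labels).
first_pass  = ["barrier", "outcome", "barrier", "emergent", "outcome"]
second_pass = ["barrier", "outcome", "outcome", "emergent", "outcome"]
```

A common rule of thumb treats kappa above roughly 0.6 as acceptable agreement, but thresholds vary by field; low agreement usually means a code definition needs tightening.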

4. Identify Patterns and Themes

Move from coded segments to patterns. Look for relationships between codes, differences across respondent groups, and the overall story the data tells.

Step 4: Identify Patterns and Themes

Analyze the coded data and identify patterns across the themes. Produce:

1. **Theme summary** (one paragraph per theme): What does this theme tell us? How strong is the evidence (how many respondents, how consistent)?
2. **Cross-theme patterns**: Where do themes connect or contradict each other? Are there respondents whose experience cuts across multiple themes in interesting ways?
3. **Disaggregation findings**: Are there notable differences by respondent group? State what differs and for whom. If there are no meaningful differences, say so.
4. **Unexpected findings**: What emerged that was not anticipated by the evaluation questions? Why does it matter?
5. **Illustrative quotes**: For each theme, select the single most powerful quote that captures the essence of that theme. Include respondent ID.

Present findings in a way that directly addresses the original evaluation questions.

The best qualitative findings name the pattern and then show it with a quote. Lead with the pattern, not the quote.

5. Write the Analysis Summary

Synthesize everything into a structured analysis section ready for inclusion in an evaluation report.

Step 5: Write the Analysis Summary

Write a qualitative analysis section for an evaluation report based on the thematic analysis we just completed. Structure it as follows:

1. **Methodology note** (100-150 words): Briefly describe the data sources, number of respondents, coding approach, and any limitations of the qualitative component.
2. **Findings by evaluation question**: For each evaluation question, present the relevant findings with:
   - A clear statement of what the data shows
   - Supporting evidence (2-3 illustrative quotes with respondent IDs)
   - Disaggregated findings where differences exist
   - Strength of evidence assessment (strong, moderate, or limited, with reasoning)
3. **Cross-cutting findings** (200-300 words): Patterns that span multiple evaluation questions.
4. **Limitations of the qualitative analysis** (100-150 words): Be specific about what the data cannot tell us.

Write for a donor audience. Be direct, evidence-based, and honest about what the data does and does not support.

Every finding needs evidence strength. "All 30 respondents said X" is strong. "Two respondents mentioned Y" is limited. Name the strength so the reader can weigh it.
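Evidence strength per code can be tallied directly from your coded segments before you write it up. A minimal sketch counting distinct respondents per code; the strong/moderate/limited thresholds here are illustrative assumptions, not a standard, so adjust them to your sample.

```python
from collections import defaultdict

def evidence_strength(segments, total_respondents):
    """Classify each code's evidence by respondent coverage.
    segments: iterable of (respondent_id, code) pairs.
    Thresholds (>=50% strong, >=20% moderate) are illustrative only."""
    by_code = defaultdict(set)
    for rid, code in segments:
        by_code[code].add(rid)  # count respondents, not mentions
    out = {}
    for code, rids in by_code.items():
        share = len(rids) / total_respondents
        label = "strong" if share >= 0.5 else "moderate" if share >= 0.2 else "limited"
        out[code] = (len(rids), label)
    return out

# Illustrative coded segments from a 10-respondent sample.
segments = [("P01", "barrier"), ("P02", "barrier"),
            ("P03", "barrier"), ("P01", "emergent")]
strength = evidence_strength(segments, total_respondents=10)
```

Counting distinct respondents rather than raw mentions prevents one talkative respondent from inflating a theme's apparent strength.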
