PII Removal Before Sharing Data with AI

Why This Matters

M&E practitioners increasingly want to use AI tools - ChatGPT, Claude, Gemini - for qualitative analysis, report drafting, and data interpretation. But M&E data regularly contains personally identifiable information (PII): beneficiary names, locations, health status, contact details, and sensitive demographic combinations.

Pasting this data into cloud-based AI tools means sending it to external servers where you lose control over storage, access, and retention. This typically conflicts with organizational data protection policies and donor requirements, and may breach GDPR and local data protection laws.

The solution isn't to avoid AI tools - it's to remove PII before sharing data with them.

What Counts as PII in M&E Data

Direct Identifiers (Must Always Remove)

These identify a specific individual on their own:

Names & Personal Details:

Full names, first names, surnames, nicknames
Mother's or father's names
Community or clan names

ID Numbers:

National ID, passport, social security numbers
Employee, beneficiary, or participant IDs
Health insurance or medical record numbers

Contact Information:

Phone numbers (mobile or landline)
Email addresses
Physical addresses, GPS coordinates
Postal codes (if precise enough to identify location)

Dates:

Birth dates or exact ages (replace with age ranges)
Enrollment or registration dates
Dates of specific incidents (replace with year only or quarter)

Digital Identifiers:

IP addresses, device IDs
Social media handles
Biometric data, photos showing faces

Indirect Identifiers (Review Carefully)

These can identify individuals in combination:

Small geographic areas (specific village, school, clinic)
Age + gender + location combinations
Unique job titles or organizational roles
Rare medical conditions or disabilities
Open-ended text with personal narratives
Unique characteristics ("the only refugee from X country")
Small group sizes (fewer than 3 individuals)
Sensitive combinations (HIV status + village + age group)

The combination test: If combining two or more data points could identify a specific person, treat the combination as PII even if individual fields seem safe.
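
The combination test can be partly automated by counting how many records share each combination of indirect identifiers. The sketch below is illustrative only: the function name `risky_combinations`, the field names, and the sample records are assumptions, and the default threshold of 3 simply mirrors the "fewer than 3 individuals" rule of thumb above.

```python
from collections import Counter


def risky_combinations(records, quasi_identifiers, min_group_size=3):
    """Return value combinations shared by fewer than min_group_size records."""
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < min_group_size}


# Hypothetical records that have already been generalized.
records = [
    {"age_group": "25-34", "gender": "F", "village": "VILLAGE_A"},
    {"age_group": "25-34", "gender": "F", "village": "VILLAGE_A"},
    {"age_group": "25-34", "gender": "F", "village": "VILLAGE_A"},
    {"age_group": "35-44", "gender": "M", "village": "VILLAGE_B"},
]

flagged = risky_combinations(records, ["age_group", "gender", "village"])
print(flagged)  # {('35-44', 'M', 'VILLAGE_B'): 1}
```

Any combination that comes back flagged needs further generalization (for example, a wider age band or a larger geographic unit) before the data is shared.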

The Removal Workflow

Step 1: Audit Your Data

Before processing, catalog what PII exists:

For structured data (Excel/CSV):

Review all column headers for PII fields
Check merged cells or hidden columns
Remove file metadata (author, last modified by)
Look for PII in unexpected columns (comments, notes fields)
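
A first pass over column headers can be scripted. This is a rough sketch, not a complete detector: the keyword list and the function name `flag_pii_columns` are assumptions, and it only inspects header names, so hidden columns, merged cells, and file metadata still need a manual check.

```python
import csv
import io
import re

# Hypothetical keyword list; extend it for your own languages and naming habits.
PII_KEYWORDS = {
    "name", "surname", "phone", "mobile", "email", "address", "gps",
    "latitude", "longitude", "dob", "birth", "id", "notes", "comments",
}


def flag_pii_columns(csv_text):
    """Return the headers whose words match a PII keyword."""
    headers = next(csv.reader(io.StringIO(csv_text)))
    flagged = []
    for header in headers:
        # Split on anything that is not a letter or digit: "phone_number" -> phone, number.
        words = set(re.split(r"[^a-z0-9]+", header.strip().lower()))
        if words & PII_KEYWORDS:
            flagged.append(header)
    return flagged


sample = (
    "respondent_name,age_group,phone_number,district,interviewer_notes,score\n"
    "A,25-34,000,X,none,4\n"
)
print(flag_pii_columns(sample))
# ['respondent_name', 'phone_number', 'interviewer_notes']
```

Headers written in camelCase or in another language will not be split or matched by this sketch, so treat an empty result as a prompt for manual review rather than a clean bill of health.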

For qualitative data (transcripts, interviews):

Search for proper nouns (names, places, organizations)
Flag specific locations or landmarks mentioned
Identify unique personal stories that could identify someone
Check for accidentally included contact information
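
Contact details that slipped into a transcript tend to follow recognizable patterns, so a simple pattern search can surface them. The regular expressions below are deliberately loose assumptions, not validated formats, and the sample sentence is invented. Names and places do not follow a pattern and still need manual review or a named-entity recognition tool.

```python
import re

# Loose, illustrative patterns; tune them for the phone formats in your context.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_PATTERN = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def find_contact_details(text):
    """Return (label, matched_text, position) tuples in order of appearance."""
    findings = []
    for label, pattern in (("email", EMAIL_PATTERN), ("phone", PHONE_PATTERN)):
        for match in pattern.finditer(text):
            findings.append((label, match.group(), match.start()))
    return sorted(findings, key=lambda item: item[2])


transcript = (
    "You can reach me on +254 700 000000 or at respondent@example.org "
    "after the harvest in 2023."
)
for label, value, position in find_contact_details(transcript):
    print(label, value, position)
```

Short numbers such as the year in the sample are ignored because the phone pattern requires at least eight digits or separators in a row.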

For reports and documents:

Remove participant names from narratives
Delete acknowledgments with beneficiary names
Check headers/footers for file paths or author names
Review annexes for raw data tables

Step 2: Choose Your Approach

| Method | Best For | Effort | Reversible? |
|---|---|---|---|
| Manual redaction | Small datasets (fewer than 50 records), documents | High | No |
| Find-and-replace | Structured data with known PII columns | Medium | No |
| Automated detection | Large datasets, mixed content types | Low | Yes (with key) |
| PIIGuard | Any M&E data before AI analysis | Low | Yes (encrypted key) |

Step 3: Replace, Don't Just Delete

Deleting PII can make data unusable. Instead, replace with consistent placeholders:

| Original | Bad (deleted) | Good (replaced) |
|---|---|---|
| "Maria Gonzalez" | [BLANK] | "PARTICIPANT_047" |
| "Kibera, Nairobi" | [BLANK] | "URBAN_SETTLEMENT_A" |
| "Age: 34" | [BLANK] | "Age: 30-39" |
| "HIV positive" | [BLANK] | "HEALTH_CONDITION_X" |

Replacement rules:

Names → sequential codes (PARTICIPANT_001, PARTICIPANT_002)
Locations → generalized categories (RURAL_AREA_A, URBAN_CENTER_B)
Exact ages → age ranges (18-24, 25-34, 35-44)
Dates → year only or quarter
Sensitive conditions → generic codes
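
The replacement rules above can be sketched in a few small helpers. Everything here is an illustrative assumption: the class name `Pseudonymizer`, the function names, and the fallback labels for ages outside the listed bands are not part of any specific tool. The age bands are the ones given in the rules.

```python
from datetime import date


class Pseudonymizer:
    """Give every occurrence of the same value the same sequential code."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.codes = {}

    def code_for(self, value):
        key = value.strip().lower()  # treat "Maria " and "maria" as one person
        if key not in self.codes:
            self.codes[key] = f"{self.prefix}_{len(self.codes) + 1:03d}"
        return self.codes[key]


AGE_BANDS = [(18, 24), (25, 34), (35, 44)]


def age_range(age):
    """Map an exact age to a band; labels outside the bands are assumptions."""
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    return "under 18" if age < 18 else "45+"


def year_quarter(d):
    """Reduce a full date to year and quarter."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


names = Pseudonymizer("PARTICIPANT")
print(names.code_for("Maria Gonzalez"))   # PARTICIPANT_001
print(names.code_for("maria gonzalez "))  # PARTICIPANT_001
print(names.code_for("Another Person"))   # PARTICIPANT_002
print(age_range(34))                      # 25-34
print(year_quarter(date(2023, 5, 14)))    # 2023-Q2
```

The `codes` dictionary links every code back to a real name, so it is itself PII and should be stored as carefully as the original data.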

Step 4: Verify Before Sharing

Before pasting any data into an AI tool, run a final check:

[ ] No names remain anywhere in the data
[ ] No contact information (phone, email, address)
[ ] No ID numbers or unique identifiers
[ ] Locations generalized to appropriate level
[ ] Ages converted to ranges
[ ] Sensitive health/legal status coded
[ ] Open-ended text reviewed for embedded PII
[ ] Combination of remaining fields cannot re-identify anyone
[ ] File metadata cleaned
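
Part of this checklist can be backed by a script that searches the anonymized output for the original values you replaced. The sketch below assumes you kept a list of those values; the function name `leaked_values` and the sample text are hypothetical. It covers only known values, so the combination check and the read-through of open-ended text remain manual.

```python
import re


def leaked_values(anonymized_text, original_values):
    """Return original values, or single words from them, still present in the text."""
    leaks = []
    for value in original_values:
        # Check the full value and each word in it, so a stray first name is caught.
        candidates = sorted({value, *value.split()})
        for candidate in candidates:
            pattern = re.compile(
                r"(?<!\w)" + re.escape(candidate) + r"(?!\w)", re.IGNORECASE
            )
            if pattern.search(anonymized_text):
                leaks.append(candidate)
    return leaks


original_values = ["Maria Gonzalez", "Kibera"]
anonymized = (
    "PARTICIPANT_047 said that Maria's household moved from "
    "URBAN_SETTLEMENT_A last year."
)
print(leaked_values(anonymized, original_values))  # ['Maria']
```

Very short name parts can match ordinary words and produce false alarms, which is the safer direction to err in before sharing.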

Using PIIGuard

M&E Studio's PIIGuard tool automates this workflow with four steps:

Upload your data (text, CSV, or document)
Review automatically detected PII - names, locations, dates, IDs, and sensitive combinations flagged by NLP analysis
Anonymize with consistent replacement tokens and save an encrypted restoration key
Restore original values after AI analysis is complete using your encryption key

All processing happens in your browser - your data never leaves your machine. The restoration key uses AES-256 encryption so you can reverse the anonymization when needed.
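
To make the anonymize-then-restore round trip concrete, here is a minimal sketch of the idea. It is not PIIGuard's implementation: the function names are hypothetical, the restoration key is a plain dictionary, and encryption is left out entirely. In practice the key would need to be encrypted and stored separately from the shared text.

```python
def anonymize(text, replacements):
    """Swap each known PII value for its token; return the text and a restoration key."""
    key = {}
    # Replace longer values first so a full name is handled before any part of it.
    for value in sorted(replacements, key=len, reverse=True):
        token = replacements[value]
        if value in text:
            text = text.replace(value, token)
            key[token] = value
    return text, key


def restore(text, key):
    """Put the original values back wherever their tokens appear."""
    # Longer tokens first, so one token is never replaced inside another.
    for token in sorted(key, key=len, reverse=True):
        text = text.replace(token, key[token])
    return text


replacements = {
    "Maria Gonzalez": "PARTICIPANT_047",
    "Kibera, Nairobi": "URBAN_SETTLEMENT_A",
}
original = "Maria Gonzalez moved to Kibera, Nairobi in 2021."

safe_text, key = anonymize(original, replacements)
print(safe_text)  # PARTICIPANT_047 moved to URBAN_SETTLEMENT_A in 2021.

# Hypothetical AI output that reuses the tokens.
ai_output = "Summary: PARTICIPANT_047 relocated to URBAN_SETTLEMENT_A."
print(restore(ai_output, key))  # Summary: Maria Gonzalez relocated to Kibera, Nairobi.
```

Restoration only works where the AI tool returns the tokens unchanged, which is one reason to use distinctive, consistent tokens.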

Key Principles

Anonymize before uploading - never paste raw M&E data into cloud AI tools
Replace, don't delete - maintain data utility while removing identifiers
Check combinations - individual fields may be safe, combinations may not be
Document your process - keep a record of what was anonymized and how
Use the right tool - manual review for small datasets, automated detection for large ones
When in doubt, remove it - err on the side of privacy over convenience
