7: Privacy-Preserving AI
Privacy-preserving AI is about protecting sensitive data while still extracting valuable insights from it, so that models can be trained, deployed, and used without compromising individual privacy. A core principle here is data minimization (GDPR): only collect and process the minimum data required for a specific purpose.
🔹 Key Techniques
Basic Anonymization
- Remove unnecessary sensitive fields
- Mask or replace identifiers with placeholders/hashes
- Add random noise to numeric values (obfuscation)
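A minimal sketch of these basic steps in Python; the field names (ssn, email, salary), hash truncation, and noise scale are illustrative assumptions, not a prescribed scheme:

```python
import hashlib
import random

def anonymize(record, drop_fields=("ssn",), hash_fields=("email",),
              noisy_fields=("salary",), noise_scale=0.05):
    out = dict(record)
    # 1. Remove unnecessary sensitive fields outright
    for f in drop_fields:
        out.pop(f, None)
    # 2. Replace identifiers with a one-way hash (pseudonymization)
    for f in hash_fields:
        if f in out:
            out[f] = hashlib.sha256(str(out[f]).encode()).hexdigest()[:16]
    # 3. Obfuscate numeric values with small random multiplicative noise
    for f in noisy_fields:
        if f in out:
            out[f] = round(out[f] * (1 + random.uniform(-noise_scale, noise_scale)), 2)
    return out

# Illustrative record, not real data
print(anonymize({"name": "A. Lee", "ssn": "123-45-6789",
                 "email": "a@example.com", "salary": 58000}))
```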
Advanced Anonymization
- K-anonymity: Ensures each record is indistinguishable from at least k-1 others sharing the same quasi-identifiers
- Microaggregation: Replace values with group averages
- Spatial aggregation: Generalize locations into regions or partial postal codes
- Reverse geocoding: Convert coordinates into generalized or named labels (e.g., what3words)
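The sketch below shows a k-anonymity check and simple microaggregation with pandas; the quasi-identifier columns, group size, and sample data are assumptions made for the example:

```python
import pandas as pd

def is_k_anonymous(df, quasi_identifiers, k):
    # Every combination of quasi-identifier values must appear at least k times
    return df.groupby(list(quasi_identifiers)).size().min() >= k

def microaggregate(df, column, group_size=3):
    # Replace each value with the mean of its small sorted group
    out = df.sort_values(column).copy()
    groups = [i // group_size for i in range(len(out))]
    out[column] = out.groupby(groups)[column].transform("mean")
    return out.sort_index()

df = pd.DataFrame({
    "age": [23, 25, 24, 41, 43, 42],
    "zip": ["940**"] * 3 + ["941**"] * 3,   # already generalized postal codes
    "income": [52, 58, 61, 88, 95, 90],
})
print(is_k_anonymous(df, ["age", "zip"], k=2))   # False: exact ages are still unique
print(microaggregate(df, "income"))
```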
Rich Media Anonymization
- Blurring, pixelation, masking (OpenCV)
- Data perturbation: Transform pixels to obscure identity
- Face replacement or GAN-generated synthetic images
- Audio/Video: Voice alteration, added background noise, or speech-to-text followed by re-synthesis
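For still images, a pixelation pass with OpenCV might look like the following sketch; the Haar-cascade detector, block size, and file paths are illustrative choices:

```python
import cv2

def pixelate_faces(path, out_path, blocks=12):
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Bundled Haar cascade for frontal faces (ships with opencv-python)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    for (x, y, w, h) in cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
        face = img[y:y + h, x:x + w]
        # Downscale then upscale to pixelate (an irreversible perturbation)
        small = cv2.resize(face, (blocks, blocks), interpolation=cv2.INTER_LINEAR)
        img[y:y + h, x:x + w] = cv2.resize(small, (w, h), interpolation=cv2.INTER_NEAREST)
    cv2.imwrite(out_path, img)

pixelate_faces("input.jpg", "anonymized.jpg")  # illustrative paths
```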
Differential Privacy (DP)
- Ensures outputs remain "insensitive" to the presence or absence of any single record. Techniques include:
- Input perturbation: Noise added to raw data
- Objective perturbation: Noise in the optimization function
- Output perturbation: Noise in model outputs
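A toy example of output perturbation: release a mean with Laplace noise calibrated to the query's sensitivity so the result satisfies epsilon-DP. The clipping bounds, epsilon, and data are assumptions for illustration:

```python
import numpy as np

def dp_mean(values, lower, upper, epsilon):
    values = np.clip(values, lower, upper)
    n = len(values)
    # Changing one record shifts the clipped mean by at most this amount
    sensitivity = (upper - lower) / n
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return values.mean() + noise

ages = np.array([23, 35, 41, 29, 52, 38, 47])   # illustrative data
print(dp_mean(ages, lower=0, upper=100, epsilon=1.0))
```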
Federated Learning
- Data stays at the source
- Only model weights/updates are shared and aggregated
- Reduces data exposure, supports compliance
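A minimal federated-averaging (FedAvg) sketch for a linear model with synthetic NumPy data; it simulates three clients whose raw data never leaves them, and only weight vectors are averaged by the server:

```python
import numpy as np

def local_step(weights, X, y, lr=0.1, epochs=20):
    # Local training on the client's own data
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):                               # three clients, data stays local
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    clients.append((X, y))

global_w = np.zeros(2)
for _ in range(10):
    # Each client trains locally; only updated weights are shared
    updates = [local_step(global_w, X, y) for X, y in clients]
    global_w = np.mean(updates, axis=0)          # server aggregates by averaging

print(global_w)                                  # approaches [2, -1]
```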
Split Learning
- Neural network split between client and server
- Client processes data up to a "cut layer," sending only those outputs
- Raw data never leaves the device, which strengthens privacy
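A bare-bones sketch of the split-learning forward pass in NumPy; the layer sizes and random weights are arbitrary, and the backward pass (gradients flowing back to the client from the cut layer) is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)

class ClientHalf:
    def __init__(self, in_dim=8, cut_dim=4):
        self.W = rng.normal(size=(in_dim, cut_dim))
    def forward(self, x):
        # Raw input x stays on the device; only these activations are transmitted
        return np.maximum(x @ self.W, 0.0)

class ServerHalf:
    def __init__(self, cut_dim=4, out_dim=1):
        self.W = rng.normal(size=(cut_dim, out_dim))
    def forward(self, cut_activations):
        # Server finishes the forward pass from the cut layer onward
        return cut_activations @ self.W

client, server = ClientHalf(), ServerHalf()
x = rng.normal(size=(1, 8))          # sensitive record, never sent
smashed = client.forward(x)          # activations at the cut layer
prediction = server.forward(smashed)
print(prediction)
```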
✅ In short: Privacy-preserving AI enables organizations to balance innovation with compliance while safeguarding individuals' rights.