This project proposes the development of a programmable synthetic data generation framework to enable privacy-preserving data sharing.
By leveraging hybrid NLP models and generative AI, the framework aims to generate structured synthetic datasets that statistically resemble real-world datasets containing personally identifiable information (PII) while preserving privacy. The project primarily addresses fairness in AI model training.
Project Deliverables/Outcomes/Impact:
- A hybrid NLP module for accurate identification of PII in structured datasets.
- A GAN-based synthetic data generator integrated with differential privacy (DP) for enhanced privacy.
- A benchmarking tool to assess utility, fairness, and privacy risk of synthetic datasets.
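As a rough illustration of the third deliverable, the sketch below shows one simple proxy metric for each of the three axes the benchmarking tool is meant to assess. The metric choices (total variation distance for utility, a demographic parity gap for fairness, verbatim-copy rate for privacy risk), the function names, and the toy records are all assumptions made for illustration. The proposal does not specify which metrics the tool will use.

```python
from collections import Counter

def total_variation(real_values, synth_values):
    """Utility proxy: total variation distance between two categorical
    distributions (0.0 means identical, 1.0 means disjoint)."""
    real_counts, synth_counts = Counter(real_values), Counter(synth_values)
    categories = set(real_counts) | set(synth_counts)
    return 0.5 * sum(
        abs(real_counts[c] / len(real_values) - synth_counts[c] / len(synth_values))
        for c in categories
    )

def parity_gap(rows, group_key, outcome_key):
    """Fairness proxy: largest difference in positive-outcome rate between groups."""
    totals, positives = Counter(), Counter()
    for row in rows:
        totals[row[group_key]] += 1
        positives[row[group_key]] += 1 if row[outcome_key] else 0
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)

def exact_match_rate(real_rows, synth_rows):
    """Privacy-risk proxy: share of synthetic rows that copy a real row verbatim."""
    real_set = {tuple(sorted(r.items())) for r in real_rows}
    copies = sum(tuple(sorted(s.items())) in real_set for s in synth_rows)
    return copies / len(synth_rows)

# Hypothetical toy records; the column names are illustrative only.
real = [
    {"gender": "F", "approved": 1},
    {"gender": "F", "approved": 0},
    {"gender": "M", "approved": 1},
    {"gender": "M", "approved": 1},
]
synth = [
    {"gender": "F", "approved": 1},
    {"gender": "M", "approved": 1},
    {"gender": "M", "approved": 0},
    {"gender": "M", "approved": 1},
]

print("utility (TVD on gender):", total_variation(
    [r["gender"] for r in real], [s["gender"] for s in synth]))
print("fairness gap (real): ", parity_gap(real, "gender", "approved"))
print("fairness gap (synth):", parity_gap(synth, "gender", "approved"))
print("privacy (exact-match rate):", exact_match_rate(real, synth))
```

With only two low-cardinality columns, the exact-match rate is high by construction. On realistic data with many columns, a distance-to-closest-record measure would likely be more informative than verbatim matching.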