GAN-Based Privacy-Preserving Data Synthesis in Healthcare

Carson James

Authors

Carson James¹
¹Obafemi Awolowo University, Ile-Ife, Nigeria.

Abstract

Strict privacy laws make it difficult to access high-quality clinical datasets, posing a major challenge for the advancement of artificial intelligence in healthcare. Generative Adversarial Networks (GANs) offer a promising approach for synthesizing realistic healthcare data without exposing sensitive patient information. This paper investigates the use of GAN-based frameworks to generate synthetic medical datasets that preserve both statistical properties and clinical relevance while ensuring patient confidentiality. We review state-of-the-art GAN architectures applied in healthcare, evaluate their performance in terms of privacy protection and data utility, and propose a privacy-aware GAN synthesis framework. Experimental results demonstrate that the proposed approach significantly reduces privacy risks while maintaining data quality suitable for disease prediction and treatment outcome analysis. The findings support safer data sharing and collaborative AI research in the medical domain.

Keywords

Generative Adversarial Networks (GANs); Healthcare Data Synthesis; Privacy Preservation; Synthetic Data; Medical AI; Data Anonymization; Differential Privacy; EHR Data Generation; Deep Learning in Healthcare; Data Sharing in Medicine

How to Cite This Article

APA Style:
James, C. (2025). GAN-based privacy-preserving data synthesis in healthcare. International Journal of Engineering & Tech Development, 2(1), 1–7.
