Protect and Extend — Using GANs for Synthetic Data Generation of Time-Series Medical Records

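Abstract

Protecting private user data is paramount for achieving high Quality of Experience (QoE) and acceptance, particularly in services that handle sensitive data, such as IT-based health services. Anonymization techniques have been shown to be prone to re-identification, whereas synthetic data generation is comparatively less time- and resource-intensive and more robust to data leakage, so it is gradually replacing anonymization. Generative Adversarial Networks (GANs) have been used to generate synthetic datasets, especially GAN frameworks that adhere to differential privacy. This study compares state-of-the-art GAN-based models for synthetic data generation and uses them to generate synthetic time-series medical records of dementia patients that can be distributed without privacy concerns. The Quality of Generating (QoG) of the generated data is assessed through predictive modeling, autocorrelation, and distribution analysis. The privacy preservation of each model is assessed by applying membership inference attacks to determine potential data leakage risks. The experiments indicate that the privacy-preserving GAN (PPGAN) model outperforms the other models in privacy preservation while maintaining an acceptable level of QoG. The presented results can support better data protection for medical use cases in the future.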

Abstract (original)

Preservation of private user data is of paramount importance for high Quality of Experience (QoE) and acceptability, particularly with services treating sensitive data, such as IT-based health services. Whereas anonymization techniques were shown to be prone to data re-identification, synthetic data generation has gradually replaced anonymization since it is relatively less time and resource-consuming and more robust to data leakage. Generative Adversarial Networks (GANs) have been used for generating synthetic datasets, especially GAN frameworks adhering to the differential privacy phenomena. This research compares state-of-the-art GAN-based models for synthetic data generation to generate time-series synthetic medical records of dementia patients which can be distributed without privacy concerns. Predictive modeling, autocorrelation, and distribution analysis are used to assess the Quality of Generating (QoG) of the generated data. The privacy preservation of the respective models is assessed by applying membership inference attacks to determine potential data leakage risks. Our experiments indicate the superiority of the privacy-preserving GAN (PPGAN) model over other models regarding privacy preservation while maintaining an acceptable level of QoG. The presented results can support better data protection for medical use cases in the future.

arXiv information

Authors	Navid Ashrafi, Vera Schmitt, Robert P. Spang, Sebastian Möller, Jan-Niklas Voigt-Antons
Published	2024-03-01 11:46:26+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
