SynEHRgy: Synthesizing Mixed-Type Structured Electronic Health Records using Decoder-Only Transformers

Abstract

Generating synthetic Electronic Health Records (EHRs) offers significant potential for data augmentation, privacy-preserving data sharing, and improving machine learning model training. We propose a novel tokenization strategy tailored for structured EHR data, which encompasses diverse data types such as covariates, ICD codes, and irregularly sampled time series. Using a GPT-like decoder-only transformer model, we demonstrate the generation of high-quality synthetic EHRs. Our approach is evaluated using the MIMIC-III dataset, and we benchmark the fidelity, utility, and privacy of the generated data against state-of-the-art models.

arXiv information

Authors	Hojjat Karami, David Atienza, Anisoara Ionescu
Published	2024-11-20 16:11:20+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
