Publicly Shareable Clinical Large Language Model Built on Synthetic Clinical Notes

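Summary

The development of large language models tailored to handle patients' clinical notes is often hindered by the limited accessibility and usability of those notes under strict privacy regulations. To address this, we first create large-scale synthetic clinical notes from publicly available case reports extracted from the biomedical literature. We then use these synthetic notes to train Asclepius, a large language model specialized for the clinical domain. Although Asclepius is trained on synthetic data, we evaluate it on real clinical notes to assess its potential performance in real-world applications. We benchmark Asclepius against several other large language models, including GPT-3.5-turbo and other open-source alternatives. To further validate our synthetic-note approach, we also compare Asclepius with variants trained on real clinical notes. Our results convincingly show that synthetic clinical notes are a viable substitute for real ones when building high-performing clinical language models. This conclusion is supported by detailed evaluations by both GPT-4 and medical professionals. All resources used to develop Asclepius, including weights, code, and data, are publicly available for future research.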

Summary (original)

The development of large language models tailored for handling patients’ clinical notes is often hindered by the limited accessibility and usability of these notes due to strict privacy regulations. To address these challenges, we first create synthetic large-scale clinical notes using publicly available case reports extracted from biomedical literature. We then use these synthetic notes to train our specialized clinical large language model, Asclepius. While Asclepius is trained on synthetic data, we assess its potential performance in real-world applications by evaluating it using real clinical notes. We benchmark Asclepius against several other large language models, including GPT-3.5-turbo and other open-source alternatives. To further validate our approach using synthetic notes, we also compare Asclepius with its variants trained on real clinical notes. Our findings convincingly demonstrate that synthetic clinical notes can serve as viable substitutes for real ones when constructing high-performing clinical language models. This conclusion is supported by detailed evaluations conducted by both GPT-4 and medical professionals. All resources including weights, codes, and data used in the development of Asclepius are made publicly accessible for future research.

arXiv information

Authors	Sunjun Kweon, Junu Kim, Jiyoun Kim, Sujeong Im, Eunbyeol Cho, Seongsu Bae, Jungwoo Oh, Gyubok Lee, Jong Hak Moon, Seng Chan You, Seungjin Baek, Chang Hoon Han, Yoon Bin Jung, Yohan Jo, Edward Choi
Published	2023-09-01 04:01:20+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
