A Transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics

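Abstract

During the diagnostic process, clinicians draw on multimodal information such as chief complaints, medical images, and laboratory-test results.
Deep-learning models for aiding diagnosis have yet to meet this requirement.
Here we report a Transformer-based representation-learning model, intended as a clinical diagnostic aid, that processes multimodal input in a unified manner.
Rather than learning modality-specific features, the model uses embedding layers to convert images, unstructured text, and structured text into visual tokens and text tokens. It then uses bidirectional blocks with intramodal and intermodal attention to learn a holistic representation of radiographs, the unstructured chief complaint and clinical history, and structured clinical information such as laboratory-test results and patient demographic information.
The unified model outperformed an image-only model and non-unified multimodal diagnosis models in the identification of pulmonary diseases (by 12% and 9%, respectively) and in the prediction of adverse clinical outcomes in patients with COVID-19 (by 29% and 7%, respectively).
Leveraging unified multimodal Transformer-based models may help streamline patient triage and facilitate the clinical decision process.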

Abstract (original)

During the diagnostic process, clinicians leverage multimodal information, such as chief complaints, medical images, and laboratory-test results. Deep-learning models for aiding diagnosis have yet to meet this requirement. Here we report a Transformer-based representation-learning model as a clinical diagnostic aid that processes multimodal input in a unified manner. Rather than learning modality-specific features, the model uses embedding layers to convert images and unstructured and structured text into visual tokens and text tokens, and bidirectional blocks with intramodal and intermodal attention to learn a holistic representation of radiographs, the unstructured chief complaint and clinical history, structured clinical information such as laboratory-test results and patient demographic information. The unified model outperformed an image-only model and non-unified multimodal diagnosis models in the identification of pulmonary diseases (by 12% and 9%, respectively) and in the prediction of adverse clinical outcomes in patients with COVID-19 (by 29% and 7%, respectively). Leveraging unified multimodal Transformer-based models may help streamline triage of patients and facilitate the clinical decision process.
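
The unified token processing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify layer sizes, patching, tokenization, or how the bidirectional blocks are built, so every name and dimension below (D, PATCH_DIM, VOCAB, patch_proj, text_table, the token ids) is a hypothetical chosen for illustration. The sketch assumes that image patches are linearly projected into visual tokens, that text ids are looked up in an embedding table, and that a single unmasked self-attention pass over the concatenated sequence stands in for one bidirectional block. Learned query, key, and value projections are omitted to keep it short.

```python
import math
import random


def linear(vec, weight):
    """Project a vector with a weight matrix stored as d_in rows of d_out values."""
    return [sum(v * w for v, w in zip(vec, col)) for col in zip(*weight)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Unmasked (bidirectional) scaled dot-product attention.

    Simplification: the tokens serve directly as queries, keys and values.
    """
    d = len(tokens[0])
    weights = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights


random.seed(0)
D = 8           # hypothetical shared token width
PATCH_DIM = 16  # hypothetical flattened patch size
VOCAB = 50      # hypothetical text vocabulary size

# Hypothetical embedding layers: one projection for patches, one table for text.
patch_proj = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(PATCH_DIM)]
text_table = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(VOCAB)]

# Toy inputs: 4 radiograph patches and 3 text token ids
# (standing in for chief complaint and laboratory values).
patches = [[random.random() for _ in range(PATCH_DIM)] for _ in range(4)]
text_ids = [3, 17, 42]

visual_tokens = [linear(p, patch_proj) for p in patches]
text_tokens = [text_table[i] for i in text_ids]

# Unified processing: both modalities share one token sequence.
sequence = visual_tokens + text_tokens
fused, attn = self_attention(sequence)

# For the first visual token, split its attention mass by modality.
n_vis = len(visual_tokens)
intra = sum(attn[0][:n_vis])  # attention to other visual tokens (intramodal)
inter = sum(attn[0][n_vis:])  # attention to text tokens (intermodal)
print(f"intramodal mass: {intra:.3f}, intermodal mass: {inter:.3f}")
```

Because the attention is computed over the concatenated sequence, each token's weights cover both its own modality and the other one, which is one simple way to obtain intramodal and intermodal attention without separate modality-specific encoders.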

arXiv information

Authors	Hong-Yu Zhou, Yizhou Yu, Chengdi Wang, Shu Zhang, Yuanxu Gao, Jia Pan, Jun Shao, Guangming Lu, Kang Zhang, Weimin Li
Published	2023-06-01 16:23:47+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
