DeViDe: Faceted medical knowledge for improved medical vision-language pre-training

Abstract

Vision-language pre-training for chest X-rays has made significant strides, primarily by utilizing paired radiographs and radiology reports. However, existing approaches often face challenges in encoding medical knowledge effectively. While radiology reports provide insights into the current disease manifestation, medical definitions (as used by contemporary methods) tend to be overly abstract, creating a gap in knowledge. To address this, we propose DeViDe, a novel transformer-based method that leverages radiographic descriptions from the open web. These descriptions outline general visual characteristics of diseases in radiographs, and when combined with abstract definitions and radiology reports, provide a holistic snapshot of knowledge. DeViDe incorporates three key features for knowledge-augmented vision-language alignment: First, a large language model-based augmentation is employed to homogenise medical knowledge from diverse sources. Second, this knowledge is aligned with image information at various levels of granularity. Third, a novel projection layer is proposed to handle the complexity of aligning each image with multiple descriptions arising in a multi-label setting. In zero-shot settings, DeViDe performs comparably to fully supervised models on external datasets and achieves state-of-the-art results on three large-scale datasets. Additionally, fine-tuning DeViDe on four downstream tasks and six segmentation tasks showcases its superior performance across data from diverse distributions.

arXiv information

Authors	Haozhe Luo, Ziyu Zhou, Corentin Royer, Anjany Sekuboyina, Bjoern Menze
Published	2024-04-04 17:40:06+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
