Anatomical Structure-Guided Medical Vision-Language Pre-training

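Abstract

Learning medical visual representations through vision-language pre-training has made remarkable progress.
Despite promising performance, two challenges remain: local alignment lacks interpretability and clinical relevance, and representation learning within and across image-report pairs is insufficient.
To address these issues, we propose an Anatomical Structure-Guided (ASG) framework.
Specifically, we parse raw reports into triplets <anatomical region, finding, existence> and use each element as supervision to enhance representation learning.
For the anatomical region, we design an automatic anatomical region-sentence alignment paradigm in collaboration with radiologists, treating aligned regions and sentences as the minimum semantic units for exploring fine-grained local alignment.
For finding and existence, we treat them as image tags: an image-tag recognition decoder associates image features with their respective tags within each sample, and soft labels constructed for contrastive learning improve the semantic association between different image-report pairs.
We evaluate the proposed ASG framework on two downstream tasks covering five public benchmarks.
Experimental results show that our method outperforms state-of-the-art methods.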
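
The abstract does not say how the soft labels for contrastive learning are built. As a rough illustration only, the sketch below assumes they come from the overlap (Jaccard similarity) between the <finding, existence> tag sets of two samples, and that the loss is a cross-entropy between those soft targets and a softmax over image-report similarity logits. The function names `soft_labels` and `soft_contrastive_loss`, the Jaccard choice, and the example tags are all assumptions made here, not the authors' implementation.

```python
import math


def soft_labels(tag_sets):
    """Row-normalised tag overlap between every pair of samples (assumed: Jaccard)."""
    labels = []
    for i, tags_i in enumerate(tag_sets):
        row = []
        for j, tags_j in enumerate(tag_sets):
            union = tags_i | tags_j
            if i == j:
                row.append(1.0)  # a sample always matches itself
            elif union:
                row.append(len(tags_i & tags_j) / len(union))
            else:
                row.append(0.0)  # two untagged samples share no evidence
        total = sum(row)  # >= 1.0 because of the diagonal entry
        labels.append([value / total for value in row])
    return labels


def soft_contrastive_loss(logits, targets):
    """Mean cross-entropy between soft targets and the row-wise softmax of logits."""
    loss = 0.0
    for row_logits, row_targets in zip(logits, targets):
        peak = max(row_logits)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row_logits))
        loss -= sum(t * (x - log_z) for x, t in zip(row_logits, row_targets))
    return loss / len(logits)


# Hypothetical (finding, existence) tags for three image-report pairs.
tags = [
    {("effusion", True)},
    {("effusion", True), ("pneumothorax", False)},
    {("cardiomegaly", True)},
]
targets = soft_labels(tags)
uniform_logits = [[0.0, 0.0, 0.0] for _ in tags]
loss = soft_contrastive_loss(uniform_logits, targets)
```

With one-hot targets this reduces to the usual one-directional contrastive cross-entropy; the soft targets instead give partial credit to pairs whose reports share tags, which is the intuition the abstract describes.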

Abstract (original)

Learning medical visual representations through vision-language pre-training has reached remarkable progress. Despite the promising performance, it still faces challenges, i.e., local alignment lacks interpretability and clinical relevance, and the insufficient internal and external representation learning of image-report pairs. To address these issues, we propose an Anatomical Structure-Guided (ASG) framework. Specifically, we parse raw reports into triplets <anatomical region, finding, existence>, and fully utilize each element as supervision to enhance representation learning. For anatomical region, we design an automatic anatomical region-sentence alignment paradigm in collaboration with radiologists, considering them as the minimum semantic units to explore fine-grained local alignment. For finding and existence, we regard them as image tags, applying an image-tag recognition decoder to associate image features with their respective tags within each sample and constructing soft labels for contrastive learning to improve the semantic association of different image-report pairs. We evaluate the proposed ASG framework on two downstream tasks, including five public benchmarks. Experimental results demonstrate that our method outperforms the state-of-the-art methods.

arXiv information

Authors	Qingqiu Li, Xiaohan Yan, Jilan Xu, Runtian Yuan, Yuejie Zhang, Rui Feng, Quanli Shen, Xiaobo Zhang, Shujun Wang
Published	2024-03-14 11:29:47+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
