Cross-Modal Conditioned Reconstruction for Language-guided Medical Image Segmentation

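Abstract

Recent developments have highlighted the potential of textual information to strengthen learning models for a deeper understanding of medical visual semantics. However, language-guided medical image segmentation still faces a difficult problem. Previous studies adopt implicit and ambiguous architectures to embed textual information. As a result, the segmentation results are inconsistent with the semantics expressed by the language, and sometimes diverge from them significantly. To address this, we propose a novel cross-modal conditioned reconstruction method (RecLMIS) that explicitly captures cross-modal interactions. We introduce conditioned interaction to adaptively predict the patches and words of interest. These are then used as conditioning factors for mutual reconstruction, aligning them with the regions described in the medical notes. Extensive experiments demonstrate the superiority of RecLMIS: it surpasses LViT by 3.74% mIoU on the publicly available MosMedData+ dataset and achieves an average gain of 1.89% mIoU in cross-domain tests on our QATA-CoV19 dataset. At the same time, it reduces the parameter count by 20.2% and the computational load by 55.5%. The code will be available at https://github.com/ShashankHuang/RecLMIS.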

Abstract (original)

Recent developments underscore the potential of textual information in enhancing learning models for a deeper understanding of medical visual semantics. However, language-guided medical image segmentation still faces a challenging issue. Previous works employ implicit and ambiguous architectures to embed textual information. This leads to segmentation results that are inconsistent with the semantics represented by the language, sometimes even diverging significantly. To this end, we propose a novel cross-modal conditioned Reconstruction for Language-guided Medical Image Segmentation (RecLMIS) to explicitly capture cross-modal interactions, which assumes that well-aligned medical visual features and medical notes can effectively reconstruct each other. We introduce conditioned interaction to adaptively predict patches and words of interest. Subsequently, they are utilized as conditioning factors for mutual reconstruction to align with regions described in the medical notes. Extensive experiments demonstrate the superiority of our RecLMIS, surpassing LViT by 3.74% mIoU on the publicly available MosMedData+ dataset and achieving an average increase of 1.89% mIoU for cross-domain tests on our QATA-CoV19 dataset. Simultaneously, we achieve a relative reduction of 20.2% in parameter count and a 55.5% decrease in computational load. The code will be available at https://github.com/ShashankHuang/RecLMIS.
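The results in the abstract are reported in mIoU (mean intersection over union). As a minimal sketch of how this metric is commonly computed for a binary segmentation mask, the snippet below averages the IoU of the foreground and background classes. The function names `iou` and `mean_iou`, the nested-list mask format, and the foreground/background averaging are assumptions made for illustration; they are not taken from the paper or its repository, whose exact averaging protocol may differ.

```python
def iou(pred, target):
    """Intersection over union of two binary masks (nested lists of 0/1)."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def invert(mask):
    """Swap foreground and background in a binary mask."""
    return [[1 - v for v in row] for row in mask]


def mean_iou(pred, target):
    """Average of foreground IoU and background IoU (assumed definition)."""
    foreground = iou(pred, target)
    background = iou(invert(pred), invert(target))
    return (foreground + background) / 2


# Toy 2x2 masks: the prediction marks one extra pixel as foreground.
pred = [[1, 1], [0, 0]]
target = [[1, 0], [0, 0]]
score = mean_iou(pred, target)
```

Here the foreground IoU is 1/2 and the background IoU is 2/3, so `score` is 7/12. A real evaluation would average such scores over a whole test set, typically with array operations instead of Python loops.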

arXiv information

Authors	Xiaoshuang Huang, Hongxiang Li, Meng Cao, Long Chen, Chenyu You, Dong An
Published	2024-04-03 16:23:37+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
