LViT: Language meets Vision Transformer in Medical Image Segmentation

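Summary

Deep learning is widely used in medical image segmentation and related tasks.
However, the performance of existing medical image segmentation models has been limited by the high cost of data annotation, which makes it difficult to obtain a sufficient amount of high-quality data.
To overcome this limitation, we propose LViT (Language meets Vision Transformer), a new vision-language medical image segmentation model.
In our model, medical text annotations are introduced to compensate for quality deficiencies in the image data.
In addition, the text information can guide the generation of pseudo labels to a certain extent and further guarantee the quality of pseudo labels in semi-supervised learning.
We also propose the Exponential Pseudo label Iteration mechanism (EPI) to help extend the semi-supervised version of LViT, and the Pixel-Level Attention Module (PLAM) to preserve local image features.
In our model, the LV (Language-Vision) loss is designed to supervise training on unlabeled images directly from text information.
To validate the performance of LViT, we construct multimodal medical segmentation datasets (image + text) containing pathological images, X-rays, etc.
Experimental results show that the proposed LViT achieves better segmentation performance in both fully supervised and semi-supervised settings.
Code and datasets are available at https://github.com/HUANGLIZI/LViT.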
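
The summary names EPI but does not define it. As a rough illustration only, the sketch below assumes that "exponential" refers to an exponential moving average over successive pseudo-label predictions. The function names (`ema_update`, `binarize`), the `beta` momentum, the 0.5 threshold, and the sample values are all hypothetical and are not taken from the paper or its repository.

```python
from typing import List


def ema_update(prev: List[float], new: List[float], beta: float = 0.9) -> List[float]:
    """Blend earlier pseudo-label probabilities with the latest predictions.

    ASSUMPTION: an exponential moving average stands in for EPI here;
    the source abstract does not give the actual update rule.
    """
    if len(prev) != len(new):
        raise ValueError("prev and new must have the same length")
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    # A larger beta keeps more of the earlier pseudo label.
    return [beta * p + (1.0 - beta) * n for p, n in zip(prev, new)]


def binarize(probs: List[float], threshold: float = 0.5) -> List[int]:
    """Turn per-pixel foreground probabilities into a hard 0/1 mask."""
    return [1 if p >= threshold else 0 for p in probs]


# Hypothetical flattened per-pixel foreground probabilities for one image.
pseudo = [0.9, 0.2, 0.6]       # pseudo label carried over from the previous round
prediction = [0.7, 0.4, 0.1]   # model output in the current round

pseudo = ema_update(pseudo, prediction, beta=0.5)
mask = binarize(pseudo)
print(mask)
```

Under this assumption, a single noisy prediction cannot overwrite a pseudo label outright; it only shifts it gradually, which is one way an iterative scheme could keep pseudo labels stable.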

Abstract (original)

Deep learning has been widely used in medical image segmentation and other aspects. However, the performance of existing medical image segmentation models has been limited by the challenge of obtaining a sufficient number of high-quality data with the high cost of data annotation. To overcome the limitation, we propose a new vision-language medical image segmentation model LViT (Language meets Vision Transformer). In our model, medical text annotation is introduced to compensate for the quality deficiency in image data. In addition, the text information can guide the generation of pseudo labels to a certain extent and further guarantee the quality of pseudo labels in semi-supervised learning. We also propose the Exponential Pseudo label Iteration mechanism (EPI) to help extend the semi-supervised version of LViT and the Pixel-Level Attention Module (PLAM) to preserve local features of images. In our model, LV (Language-Vision) loss is designed to supervise the training of unlabeled images using text information directly. To validate the performance of LViT, we construct multimodal medical segmentation datasets (image + text) containing pathological images, X-rays, etc. Experimental results show that our proposed LViT has better segmentation performance in both fully and semi-supervised conditions. Code and datasets are available at https://github.com/HUANGLIZI/LViT.

arXiv information

Authors	Zihan Li, Yunxiang Li, Qingde Li, You Zhang, Puyang Wang, Dazhou Guo, Le Lu, Dakai Jin, Qingqi Hong
Published	2022-06-29 15:36:02+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
