TG-LMM: Enhancing Medical Image Segmentation Accuracy through Text-Guided Large Multi-Modal Model

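Summary

We propose TG-LMM (Text-Guided Large Multi-Modal Model), a new approach that uses textual descriptions of organs to improve segmentation accuracy in medical images.
Existing medical image segmentation methods face three challenges. First, current automatic medical segmentation models do not make effective use of prior knowledge such as descriptions of organ locations.
Second, earlier text-visual models focus on identifying the target rather than on improving segmentation accuracy.
Third, earlier models that do use prior knowledge to improve accuracy do not incorporate pre-trained models.
To address these issues, TG-LMM integrates prior knowledge, specifically expert descriptions of the spatial locations of organs, into the segmentation process.
Our model uses pre-trained image and text encoders to reduce the number of trainable parameters and speed up training.
In addition, we designed a comprehensive image-text information fusion structure to fully integrate the two data modalities.
We evaluated TG-LMM on three authoritative medical image datasets covering segmentation of various parts of the human body.
Our method outperformed existing approaches such as MedSAM, SAM, and nnUnet.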

Abstract (original)

We propose TG-LMM (Text-Guided Large Multi-Modal Model), a novel approach that leverages textual descriptions of organs to enhance segmentation accuracy in medical images. Existing medical image segmentation methods face several challenges: current medical automatic segmentation models do not effectively utilize prior knowledge, such as descriptions of organ locations; previous text-visual models focus on identifying the target rather than improving the segmentation accuracy; prior models attempt to use prior knowledge to enhance accuracy but do not incorporate pre-trained models. To address these issues, TG-LMM integrates prior knowledge, specifically expert descriptions of the spatial locations of organs, into the segmentation process. Our model utilizes pre-trained image and text encoders to reduce the number of training parameters and accelerate the training process. Additionally, we designed a comprehensive image-text information fusion structure to ensure thorough integration of the two modalities of data. We evaluated TG-LMM on three authoritative medical image datasets, encompassing the segmentation of various parts of the human body. Our method demonstrated superior performance compared to existing approaches, such as MedSAM, SAM and nnUnet.
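The abstract describes frozen pre-trained image and text encoders feeding an image-text fusion structure, but gives no architectural detail. The following is only a minimal PyTorch sketch of that general idea, not the authors' implementation. All class names, layer choices, and sizes are hypothetical: a small convolution and an embedding table stand in for the pre-trained encoders, and a single cross-attention layer stands in for the fusion structure.

```python
import torch
import torch.nn as nn


class FrozenEncoder(nn.Module):
    """Wraps a module and freezes its weights (stand-in for a pre-trained encoder)."""

    def __init__(self, module: nn.Module):
        super().__init__()
        self.module = module
        for p in self.module.parameters():
            p.requires_grad = False  # excluded from training

    def forward(self, x):
        with torch.no_grad():
            return self.module(x)


class TextGuidedSegmenter(nn.Module):
    """Hypothetical sketch: image patches attend to text tokens, then predict a mask."""

    def __init__(self, dim: int = 32, vocab_size: int = 100, patch: int = 8):
        super().__init__()
        self.patch = patch
        # Assumption: placeholders for real pre-trained image/text encoders.
        self.image_encoder = FrozenEncoder(
            nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        )
        self.text_encoder = FrozenEncoder(nn.Embedding(vocab_size, dim))
        # Trainable parts: cross-attention fusion and a per-patch mask head.
        self.fusion = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(dim, patch * patch)

    def forward(self, image: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
        feats = self.image_encoder(image)               # (B, D, H/p, W/p)
        b, _, h, w = feats.shape
        img_tokens = feats.flatten(2).transpose(1, 2)   # (B, h*w, D)
        txt_tokens = self.text_encoder(token_ids)       # (B, T, D)
        # Image tokens are the queries; text tokens supply keys and values.
        fused, _ = self.fusion(img_tokens, txt_tokens, txt_tokens)
        fused = fused + img_tokens                      # residual connection
        logits = self.head(fused)                       # (B, h*w, p*p)
        # Reassemble the per-patch predictions into a full-resolution mask.
        logits = logits.view(b, h, w, self.patch, self.patch)
        logits = logits.permute(0, 1, 3, 2, 4)
        return logits.reshape(b, 1, h * self.patch, w * self.patch)


def count_parameters(model: nn.Module):
    """Returns (trainable, total) parameter counts."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return trainable, total


if __name__ == "__main__":
    model = TextGuidedSegmenter()
    image = torch.randn(2, 1, 64, 64)          # two single-channel 64x64 slices
    tokens = torch.randint(0, 100, (2, 6))     # two 6-token organ descriptions
    mask_logits = model(image, tokens)
    print(mask_logits.shape, count_parameters(model))
```

In this sketch only the fusion layer and the mask head receive gradients, which is one way freezing the encoders reduces the number of trainable parameters. The real TG-LMM fusion structure is likely more elaborate than a single attention layer.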

arXiv information

Authors	Yihao Zhao, Enhao Zhong, Cuiyun Yuan, Yang Li, Man Zhao, Chunxia Li, Jun Hu, Chenbin Liu
Published	2024-09-05 11:01:48+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
