MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation

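Abstract

Large language models (LLMs) are known for their versatility with text data, and their potential to improve medical image segmentation, a task critical to accurate diagnostic imaging, is attracting growing research interest. This study explores strengthening Vision Transformers (ViTs) for medical image segmentation by integrating pre-trained LLM transformer blocks. Our approach inserts a frozen LLM transformer block into the encoder of a ViT-based model and yields substantial improvements in segmentation performance across a range of medical imaging modalities. We propose a Hybrid Attention Mechanism that combines global and local feature learning, together with a Multi-Scale Fusion Block that aggregates features across different scales. The enhanced model shows significant gains: the average Dice score rises from 0.74 to 0.79, and accuracy, precision, and the Jaccard Index also improve. These results demonstrate the effectiveness of LLM-based transformers for refining medical image segmentation and highlight their potential to substantially improve model accuracy and robustness. The source code and our implementation are available at: https://bit.ly/3zf2CVs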
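To make the central idea concrete, here is a minimal PyTorch sketch of a frozen transformer block placed inside a ViT-style encoder. It is an illustration under stated assumptions, not the authors' implementation: the class name `FrozenBlockEncoder`, the dimensions, the linear projections `proj_in` and `proj_out`, and the position of the frozen block after the ViT layers are all hypothetical, and `nn.TransformerEncoderLayer` merely stands in for a pre-trained LLM block whose weights would be loaded from a checkpoint. The Hybrid Attention Mechanism and Multi-Scale Fusion Block are not modeled here.

```python
import torch
import torch.nn as nn


class FrozenBlockEncoder(nn.Module):
    """ViT-style encoder followed by one frozen transformer block.

    Hypothetical sketch: the frozen block is a randomly initialized
    nn.TransformerEncoderLayer standing in for a pre-trained LLM block.
    """

    def __init__(self, vit_dim=64, llm_dim=128, depth=2, heads=4):
        super().__init__()
        # Trainable ViT encoder layers operating on patch tokens.
        self.vit_layers = nn.ModuleList([
            nn.TransformerEncoderLayer(
                vit_dim, heads, dim_feedforward=4 * vit_dim, batch_first=True
            )
            for _ in range(depth)
        ])
        # Trainable projections between the ViT width and the (assumed
        # larger) width of the frozen block.
        self.proj_in = nn.Linear(vit_dim, llm_dim)
        self.proj_out = nn.Linear(llm_dim, vit_dim)
        # Stand-in for the pre-trained LLM block; its weights never update.
        self.llm_block = nn.TransformerEncoderLayer(
            llm_dim, heads, dim_feedforward=4 * llm_dim, batch_first=True
        )
        for param in self.llm_block.parameters():
            param.requires_grad = False

    def forward(self, tokens):
        # tokens: (batch, num_patches, vit_dim)
        for layer in self.vit_layers:
            tokens = layer(tokens)
        # Project up, pass through the frozen block, project back down.
        return self.proj_out(self.llm_block(self.proj_in(tokens)))


if __name__ == "__main__":
    model = FrozenBlockEncoder()
    patches = torch.randn(2, 16, 64)  # 2 images, 16 patch tokens each
    print(model(patches).shape)
```

Gradients still flow through the frozen block to the layers before it, so the ViT layers and both projections train normally while the frozen block's parameters receive no updates.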

Abstract (original)

Large Language Models (LLMs), known for their versatility in textual data, are increasingly being explored for their potential to enhance medical image segmentation, a crucial task for accurate diagnostic imaging. This study explores enhancing Vision Transformers (ViTs) for medical image segmentation by integrating pre-trained LLM transformer blocks. Our approach, which incorporates a frozen LLM transformer block into the encoder of a ViT-based model, leads to substantial improvements in segmentation performance across various medical imaging modalities. We propose a Hybrid Attention Mechanism that combines global and local feature learning with a Multi-Scale Fusion Block for aggregating features across different scales. The enhanced model shows significant performance gains, including an average Dice score increase from 0.74 to 0.79 and improvements in accuracy, precision, and the Jaccard Index. These results demonstrate the effectiveness of LLM-based transformers in refining medical image segmentation, highlighting their potential to significantly boost model accuracy and robustness. The source code and our implementation are available at: https://bit.ly/3zf2CVs
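The abstract reports results as Dice scores and the Jaccard Index. As a reminder of what those two overlap metrics measure, the following is a small pure-Python sketch for binary masks. The function name `dice_and_jaccard` and the convention of scoring two empty masks as 1.0 are assumptions made for illustration; the paper's evaluation code may differ.

```python
def dice_and_jaccard(pred, target):
    """Return (Dice, Jaccard) for two flattened binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")

    # Count foreground pixels in each mask and in their overlap.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_sum = sum(1 for p in pred if p)
    target_sum = sum(1 for t in target if t)
    union = pred_sum + target_sum - intersection

    # Assumed convention: two empty masks count as a perfect match.
    if union == 0:
        return 1.0, 1.0

    dice = 2 * intersection / (pred_sum + target_sum)
    jaccard = intersection / union
    return dice, jaccard


if __name__ == "__main__":
    predicted = [1, 1, 1, 0, 0, 0, 1, 0]
    reference = [1, 1, 0, 0, 0, 0, 1, 1]
    print(dice_and_jaccard(predicted, reference))  # (0.75, 0.6)
```

For a single pair of masks the two metrics are tied by J = D / (2 - D), which is why 0.75 maps to 0.6 above. That identity does not carry over to scores averaged across a dataset, so the average Jaccard Index cannot be derived from the reported average Dice scores of 0.74 and 0.79.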

arXiv information

Authors	Gurucharan Marthi Krishna Kumar, Aman Chadha, Janine Mendola, Amir Shmuel
Published	2024-10-03 14:50:33+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
