MCP-MedSAM: A Powerful Lightweight Medical Segment Anything Model Trained with a Single GPU in Just One Day

Abstract

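Medical image segmentation partitions medical images into meaningful regions, with a focus on identifying anatomical structures and lesions.
It has broad applications in healthcare, and deep learning methods have enabled significant advances in automating this process.
Recently, the introduction of the Segment Anything Model (SAM), the first foundation model for segmentation tasks, has prompted researchers to adapt it to the medical domain to improve performance across various tasks.
However, SAM's large model size and high GPU requirements hinder its scalability and development in the medical domain.
In this work, we propose MCP-MedSAM, a powerful and lightweight medical SAM model designed to be trainable on a single A100 GPU with 40GB of memory within one day while delivering superior segmentation performance.
Recognizing the significant internal differences between modalities and the need for direct segmentation target information within bounding boxes, we introduce two kinds of prompts: the modality prompt and the content prompt.
After passing through the prompt encoder, their embedding representations can further improve segmentation performance by incorporating more relevant information without adding significant training overhead.
Additionally, we adopt an effective modality-based data sampling strategy to address data imbalance between modalities, ensuring more balanced performance across all modalities.
Our method was trained and evaluated on a large-scale challenge dataset. Compared with top-ranking methods on the challenge leaderboard, MCP-MedSAM achieved superior performance while requiring only one day of training on a single GPU.
The code is publicly available at https://github.com/dong845/MCP-MedSAM.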
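
The abstract does not spell out how the modality-based sampling works. As an illustrative sketch only, and not the authors' implementation, one simple way to counter modality imbalance is to pick a modality uniformly at random first and then pick a sample within it. The function name `modality_balanced_sample`, the toy modality labels, and the toy counts below are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def modality_balanced_sample(samples, num_draws, seed=0):
    """Draw `num_draws` items so that each modality is equally likely.

    `samples` is a list of (sample_id, modality) pairs. This is an assumed,
    simplified scheme for illustration, not the paper's exact strategy.
    """
    rng = random.Random(seed)

    # Group sample ids by their modality label.
    by_modality = defaultdict(list)
    for sample_id, modality in samples:
        by_modality[modality].append(sample_id)

    modalities = sorted(by_modality)
    drawn = []
    for _ in range(num_draws):
        # Step 1: choose a modality uniformly, ignoring how large it is.
        modality = rng.choice(modalities)
        # Step 2: choose a sample uniformly within that modality.
        drawn.append((rng.choice(by_modality[modality]), modality))
    return drawn


# Hypothetical, heavily imbalanced toy dataset (900 / 90 / 10 samples).
toy = (
    [(f"ct_{i}", "CT") for i in range(900)]
    + [(f"us_{i}", "Ultrasound") for i in range(90)]
    + [(f"xr_{i}", "X-ray") for i in range(10)]
)

batch = modality_balanced_sample(toy, num_draws=3000)
counts = Counter(modality for _, modality in batch)
print(counts)  # each modality appears roughly equally often
```

With this scheme, rare modalities are seen as often as common ones during training, at the cost of repeating their samples more frequently.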

Abstract (original)

Medical image segmentation involves partitioning medical images into meaningful regions, with a focus on identifying anatomical structures and lesions. It has broad applications in healthcare, and deep learning methods have enabled significant advancements in automating this process. Recently, the introduction of the Segment Anything Model (SAM), the first foundation model for segmentation tasks, has prompted researchers to adapt it for the medical domain to improve performance across various tasks. However, SAM’s large model size and high GPU requirements hinder its scalability and development in the medical domain. In this work, we propose MCP-MedSAM, a powerful and lightweight medical SAM model designed to be trainable on a single A100 GPU with 40GB of memory within one day while delivering superior segmentation performance. Recognizing the significant internal differences between modalities and the need for direct segmentation target information within bounding boxes, we introduce two kinds of prompts: the modality prompt and the content prompt. After passing through the prompt encoder, their embedding representations can further improve the segmentation performance by incorporating more relevant information without adding significant training overhead. Additionally, we adopt an effective modality-based data sampling strategy to address data imbalance between modalities, ensuring more balanced performance across all modalities. Our method was trained and evaluated using a large-scale challenge dataset. Compared to top-ranking methods on the challenge leaderboard, MCP-MedSAM achieved superior performance while requiring only one day of training on a single GPU. The code is publicly available at https://github.com/dong845/MCP-MedSAM.

arXiv information

Authors	Donghang Lyu, Ruochen Gao, Marius Staring
Published	2025-05-14 12:51:49+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
