MobileViM: A Light-weight and Dimension-independent Vision Mamba for 3D Medical Image Analysis

Summary

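Efficient evaluation of three-dimensional (3D) medical images is essential to diagnostic and therapeutic practice in healthcare.
In recent years, deep learning and computer vision have been widely adopted for analysing and interpreting medical images.
Conventional approaches such as convolutional neural networks (CNNs) and vision transformers (ViTs) face significant computational challenges, which motivates new architectures.
Recent work has introduced architectures such as the "Mamba" model as alternatives to conventional CNNs and ViTs.
The Mamba model excels at linear processing of one-dimensional data with low computational demands.
However, Mamba's potential for 3D medical image analysis remains underexplored, and it may face significant computational challenges as the dimension increases.
This manuscript presents MobileViM, a streamlined architecture for efficient segmentation of 3D medical images.
In the MobileViM network, the authors introduce a new dimension-independent mechanism and a dual-direction traversing approach, both incorporated into a vision-Mamba-based framework.
MobileViM also features a cross-scale bridging technique to improve efficiency and accuracy across medical imaging modalities.
With these enhancements, MobileViM achieves segmentation speeds exceeding 90 frames per second (FPS) on a single graphics processing unit (an NVIDIA RTX 4090).
This is more than 24 FPS faster than state-of-the-art deep learning models for 3D image processing on the same computational resources.
Experimental evaluations also show that MobileViM delivers superior performance, with Dice similarity scores reaching 92.72%, 86.69%, 80.46%, and 77.43% on the PENGWIN, BraTS2024, ATLAS, and Toothfairy2 datasets, respectively, significantly surpassing existing models.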
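
The summary describes Mamba as processing one-dimensional data in linear time and mentions a dual-direction traversing approach. The sketch below illustrates only the general idea under simplifying assumptions: a scalar linear recurrence with fixed, hypothetical coefficients `a` and `b`, run once forward and once over the reversed sequence, with the two outputs summed. It is not the MobileViM implementation, and real Mamba layers use learned, input-dependent parameters.

```python
def linear_scan(xs, a=0.9, b=1.0):
    """Run h_t = a * h_{t-1} + b * x_t over xs in a single pass (linear time)."""
    h = 0.0
    out = []
    for x in xs:
        h = a * h + b * x
        out.append(h)
    return out


def bidirectional_scan(xs, a=0.9, b=1.0):
    """Sum a forward scan and a backward scan so each position sees both sides."""
    forward = linear_scan(xs, a, b)
    # Scan the reversed sequence, then flip the result back into original order.
    backward = linear_scan(xs[::-1], a, b)[::-1]
    return [f + g for f, g in zip(forward, backward)]


if __name__ == "__main__":
    print(bidirectional_scan([1.0, 0.0, 0.0], a=0.5))
```

Each pass touches every element once, so the cost grows linearly with sequence length. How MobileViM flattens a 3D volume into sequences is not stated in the summary and is left out here.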

Summary (original)

Efficient evaluation of three-dimensional (3D) medical images is crucial for diagnostic and therapeutic practices in healthcare. Recent years have seen a substantial uptake in applying deep learning and computer vision to analyse and interpret medical images. Traditional approaches, such as convolutional neural networks (CNNs) and vision transformers (ViTs), face significant computational challenges, prompting the need for architectural advancements. Recent efforts have led to the introduction of novel architectures like the “Mamba” model as alternative solutions to traditional CNNs or ViTs. The Mamba model excels in the linear processing of one-dimensional data with low computational demands. However, Mamba’s potential for 3D medical image analysis remains underexplored and could face significant computational challenges as the dimension increases. This manuscript presents MobileViM, a streamlined architecture for efficient segmentation of 3D medical images. In the MobileViM network, we invent a new dimension-independent mechanism and a dual-direction traversing approach to incorporate with a vision-Mamba-based framework. MobileViM also features a cross-scale bridging technique to improve efficiency and accuracy across various medical imaging modalities. With these enhancements, MobileViM achieves segmentation speeds exceeding 90 frames per second (FPS) on a single graphics processing unit (i.e., NVIDIA RTX 4090). This performance is over 24 FPS faster than the state-of-the-art deep learning models for processing 3D images with the same computational resources. In addition, experimental evaluations demonstrate that MobileViM delivers superior performance, with Dice similarity scores reaching 92.72%, 86.69%, 80.46%, and 77.43% for PENGWIN, BraTS2024, ATLAS, and Toothfairy2 datasets, respectively, which significantly surpasses existing models.
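
The results above are reported as Dice similarity scores. As a reminder of what that metric measures, here is a minimal sketch of the standard Dice coefficient, 2|A ∩ B| / (|A| + |B|), for two binary masks given as flat sequences of 0/1 voxel labels. The function name and the convention of returning 1.0 for two empty masks are assumptions made for illustration; the paper's exact evaluation protocol (per class, per case, averaging) is not described in the abstract.

```python
def dice_score(pred, target):
    """Dice similarity coefficient between two binary masks (flat 0/1 sequences)."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    prediction = [1, 1, 0, 0, 1, 0]
    ground_truth = [1, 0, 0, 0, 1, 1]
    print(f"Dice: {dice_score(prediction, ground_truth):.4f}")
```

A score of 1.0 means the predicted and reference masks overlap exactly, and 0.0 means they share no foreground voxels.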

arXiv information

Authors	Wei Dai, Jun Liu
Published	2025-03-06 14:27:12+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
