Vivim: a Video Vision Mamba for Medical Video Object Segmentation

Summary

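Traditional convolutional neural networks have a limited receptive field, while transformer-based networks are, because of their computational complexity, poorly suited to modeling long-term dependencies.
This bottleneck poses a significant challenge when processing long sequences in video analysis tasks.
Very recently, state space models (SSMs) with efficient hardware-aware designs, popularized by Mamba, have achieved impressive results in long-sequence modeling, which has facilitated the development of deep neural networks for many vision tasks.
To better capture the dynamic cues available in video frames, this paper presents a generic Video Vision Mamba-based framework, dubbed Vivim, for medical video object segmentation tasks.
Vivim can effectively compress the long-term spatiotemporal representation into sequences at varying scales with our Temporal Mamba Block.
We also introduce a boundary-aware constraint to enhance Vivim's discriminative ability on ambiguous lesions in medical images.
Extensive experiments on thyroid segmentation in ultrasound videos and polyp segmentation in colonoscopy videos demonstrate that Vivim is more effective and efficient than existing methods.
The code is available at https://github.com/scott-yjyang/Vivim.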
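
The complexity bottleneck described in the summary can be made concrete with a small sketch. It is not taken from the Vivim code base: the clip size, the function names, and the fixed scalars a, b, c are all assumptions chosen for illustration. It counts the tokens in a short clip, counts the token pairs that self-attention would compare, and runs a plain linear state space recurrence that visits each token once.

```python
def sequence_length(frames, height, width):
    """Number of tokens when every spatial position of every frame is one token."""
    return frames * height * width


def attention_pair_count(length):
    """Self-attention compares every token with every token: length**2 pairs."""
    return length * length


def ssm_scan(inputs, a=0.9, b=0.1, c=1.0):
    """Scalar linear state space recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.

    One pass over the sequence, so the cost grows linearly with its length.
    The fixed scalars a, b, c are illustrative assumptions; Mamba makes its
    parameters input-dependent, which this sketch does not attempt.
    """
    h = 0.0
    outputs = []
    for x in inputs:
        h = a * h + b * x      # update the hidden state with the new input
        outputs.append(c * h)  # read out the current state
    return outputs


if __name__ == "__main__":
    # Hypothetical clip: 8 frames of 16 x 16 feature positions.
    length = sequence_length(frames=8, height=16, width=16)
    print("tokens:", length)
    print("attention pairs:", attention_pair_count(length))
    print("scan steps:", len(ssm_scan([0.0] * length)))
```

Doubling the number of frames doubles the scan steps but quadruples the attention pairs, which is the gap the summary refers to when it contrasts transformers with SSMs on long video sequences.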

Summary (original)

Traditional convolutional neural networks have a limited receptive field while transformer-based networks are mediocre in constructing long-term dependency from the perspective of computational complexity. Such the bottleneck poses a significant challenge when processing long sequences in video analysis tasks. Very recently, the state space models (SSMs) with efficient hardware-aware designs, famous by Mamba, have exhibited impressive achievements in long sequence modeling, which facilitates the development of deep neural networks on many vision tasks. To better capture available dynamic cues in video frames, this paper presents a generic Video Vision Mamba-based framework, dubbed as \textbf{Vivim}, for medical video object segmentation tasks. Our Vivim can effectively compress the long-term spatiotemporal representation into sequences at varying scales by our designed Temporal Mamba Block. We also introduce a boundary-aware constraint to enhance the discriminative ability of Vivim on ambiguous lesions in medical images. Extensive experiments on thyroid segmentation in ultrasound videos and polyp segmentation in colonoscopy videos demonstrate the effectiveness and efficiency of our Vivim, superior to existing methods. The code is available at: https://github.com/scott-yjyang/Vivim.

arXiv information

Authors	Yijun Yang, Zhaohu Xing, Chunwang Huang, Lei Zhu
Published	2024-03-12 14:45:49+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
