Topo-VM-UNetV2: Encoding Topology into Vision Mamba UNet for Polyp Segmentation

Summary

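Convolutional neural networks (CNNs) and Transformer-based architectures are the two dominant deep learning models for polyp segmentation.
However, CNNs have limited ability to model long-range dependencies, while Transformers incur quadratic computational complexity.
Recently, state space models such as Mamba have been recognized as a promising approach for polyp segmentation because they model long-range interactions effectively while maintaining linear computational complexity.
However, Mamba-based architectures struggle to capture topological features (such as connected components, loops, and voids), which leads to inaccurate boundary delineation and polyp segmentation.
To address these limitations, we propose a new approach called Topo-VM-UNetV2, which encodes topological features into VM-UNetV2, a state-of-the-art Mamba-based polyp segmentation model.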
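Our method consists of two stages. Stage 1: VM-UNetV2 is used to generate probability maps (PMs) for the training and test images, which are then used to compute topology attention maps.
Specifically, we first compute persistence diagrams of the PMs, then generate persistence score maps by assigning the persistence value of each topological feature (i.e., the difference between its death time and birth time) to its birth location. Finally, we convert the persistence scores into attention weights using the sigmoid function.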
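The Stage 1 persistence-score step can be illustrated with a small NumPy sketch. It is not the authors' implementation and rests on several assumptions: only 0-dimensional features (connected components, not loops or voids) are tracked; the filtration is the sublevel-set filtration of 1 - PM with 4-connectivity, so confident foreground pixels enter first; and the one component that never dies is capped at the maximum filtration value. Under this filtration a component is born at a local probability peak and dies when it merges into an older component, so the list of (birth, death) pairs is the 0-dimensional persistence diagram. The function names `persistence_pairs_0d`, `persistence_score_map`, and `topology_attention_map` are hypothetical.

```python
import numpy as np


def persistence_pairs_0d(f: np.ndarray):
    """0-dimensional persistence pairs of the sublevel-set filtration of a 2D
    array f (4-connectivity), computed with union-find and the elder rule.

    Returns (birth, death, birth_index) tuples with death > birth; the
    component that never dies is capped at death = f.max() (an assumption).
    """
    h, w = f.shape
    flat = f.ravel()
    order = np.argsort(flat, kind="stable")          # pixels enter in increasing f
    parent = np.full(flat.size, -1, dtype=np.int64)  # -1 = not yet in filtration

    def find(i):
        # Root lookup with path halving; a root is always its component's birth pixel.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for idx in order:
        parent[idx] = idx                            # a new component is born at f[idx]
        r, c = divmod(int(idx), w)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if not (0 <= rr < h and 0 <= cc < w):
                continue
            nb = rr * w + cc
            if parent[nb] == -1:
                continue                             # neighbour has not entered yet
            ra, rb = find(idx), find(nb)
            if ra == rb:
                continue
            # Elder rule: the component born later (larger birth value) dies now.
            if (flat[ra], ra) < (flat[rb], rb):
                elder, younger = ra, rb
            else:
                elder, younger = rb, ra
            if flat[idx] > flat[younger]:            # skip zero-persistence pairs
                pairs.append((flat[younger], flat[idx], younger))
            parent[younger] = elder
    root = find(order[0])                            # the essential (never-dying) component
    if flat.max() > flat[root]:
        pairs.append((flat[root], flat.max(), root))
    return pairs


def persistence_score_map(prob_map: np.ndarray) -> np.ndarray:
    """Assign each feature's persistence (death - birth) to its birth pixel."""
    f = 1.0 - prob_map  # assumption: high-probability pixels enter the filtration first
    score = np.zeros(f.size)
    for birth, death, loc in persistence_pairs_0d(f):
        score[loc] = max(score[loc], death - birth)
    return score.reshape(f.shape)


def topology_attention_map(prob_map: np.ndarray) -> np.ndarray:
    """Turn persistence scores into attention weights with a sigmoid."""
    return 1.0 / (1.0 + np.exp(-persistence_score_map(prob_map)))


if __name__ == "__main__":
    pm = np.array([[0.9, 0.1, 0.8],
                   [0.1, 0.1, 0.1],
                   [0.1, 0.1, 0.1]])
    print(persistence_score_map(pm))   # nonzero only at (0, 0) ~ 0.8 and (0, 2) ~ 0.7
    print(topology_attention_map(pm))  # 0.5 everywhere except those two pixels
```

In the toy map, the weaker peak at (0, 2) dies when it merges with the stronger peak at (0, 0), so both birth pixels receive nonzero scores and every other pixel keeps the sigmoid baseline of 0.5. The pure-Python union-find loop is fine for illustration but slow on full-resolution images; a cubical-complex library such as GUDHI is the more practical route for real data and for higher-dimensional features such as loops.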
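Stage 2: These topology attention maps are integrated into the semantics and detail infusion (SDI) module of VM-UNetV2, forming a topology-guided semantics and detail infusion (Topo-SDI) module that enhances the segmentation results.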
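Exactly how the attention map enters the SDI module is not specified in this summary. The NumPy sketch below is a hypothetical, heavily simplified stand-in: it omits the learned convolutions and channel/spatial attention of a real SDI module and assumes the multi-level features already share a channel count. It resizes every level, and the topology attention map, to the target level's resolution with nearest-neighbour sampling, combines the features with an element-wise (Hadamard) product in the style of the original SDI design, and then reweights the result by the attention map. `resize_nearest` and `topo_guided_fusion` are illustrative names, and the multiplicative reweighting is an assumption, not the paper's formulation.

```python
import numpy as np


def resize_nearest(x: np.ndarray, out_hw) -> np.ndarray:
    """Nearest-neighbour resize of the last two axes of x to out_hw."""
    h, w = x.shape[-2:]
    oh, ow = out_hw
    rows = np.arange(oh) * h // oh
    cols = np.arange(ow) * w // ow
    return x[..., rows[:, None], cols[None, :]]


def topo_guided_fusion(features, attention, level):
    """Hypothetical Topo-SDI-style fusion for one target level.

    features:  list of (C, H_i, W_i) arrays with a shared channel count C
    attention: (H, W) topology attention map with values in (0, 1)
    level:     index of the level whose resolution the output uses
    """
    c, th, tw = features[level].shape
    fused = np.ones((c, th, tw))
    for feat in features:
        # SDI-style: resize every level to the target size, then Hadamard product.
        fused = fused * resize_nearest(feat, (th, tw))
    # Assumed topology guidance: reweight locations by the attention map,
    # broadcast over channels.
    att = resize_nearest(attention, (th, tw))[None, :, :]
    return fused * att


if __name__ == "__main__":
    feats = [np.ones((2, 4, 4)), 2.0 * np.ones((2, 2, 2))]
    att_map = np.full((8, 8), 0.5)
    out = topo_guided_fusion(feats, att_map, level=0)
    print(out.shape)  # (2, 4, 4)
```

Because the sigmoid output is at least 0.5, plain multiplication scales every location; a residual form such as `fused * (1 + att)` would be an equally plausible design choice.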
Extensive experiments on five public polyp segmentation datasets demonstrate the effectiveness of the proposed method.
The code will be made publicly available.

Summary (original)

Convolutional neural network (CNN) and Transformer-based architectures are two dominant deep learning models for polyp segmentation. However, CNNs have limited capability for modeling long-range dependencies, while Transformers incur quadratic computational complexity. Recently, State Space Models such as Mamba have been recognized as a promising approach for polyp segmentation because they not only model long-range interactions effectively but also maintain linear computational complexity. However, Mamba-based architectures still struggle to capture topological features (e.g., connected components, loops, voids), leading to inaccurate boundary delineation and polyp segmentation. To address these limitations, we propose a new approach called Topo-VM-UNetV2, which encodes topological features into the Mamba-based state-of-the-art polyp segmentation model, VM-UNetV2. Our method consists of two stages. Stage 1: VM-UNetV2 is used to generate probability maps (PMs) for the training and test images, which are then used to compute topology attention maps. Specifically, we first compute persistence diagrams of the PMs, then we generate persistence score maps by assigning persistence values (i.e., the difference between death and birth times) of each topological feature to its birth location; finally, we transform persistence scores into attention weights using the sigmoid function. Stage 2: These topology attention maps are integrated into the semantics and detail infusion (SDI) module of VM-UNetV2 to form a topology-guided semantics and detail infusion (Topo-SDI) module for enhancing the segmentation results. Extensive experiments on five public polyp segmentation datasets demonstrate the effectiveness of our proposed method. The code will be made publicly available.

arXiv information

Authors	Diego Adame, Jose A. Nunez, Fabian Vazquez, Nayeli Gurrola, Huimin Li, Haoteng Tang, Bin Fu, Pengfei Gu
Published	2025-05-09 17:41:13+00:00
arXiv site	arxiv_id(pdf)

Provider, services used

arxiv.jp, Google
