Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation

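Abstract

Multi-modal semantic segmentation substantially improves an AI agent's perception and scene understanding, particularly under adverse conditions such as low-light or overexposed environments.
Using additional modalities (X-modality) such as thermal and depth alongside conventional RGB supplies complementary information and makes segmentation more robust and reliable.
This work introduces Sigma, a Siamese Mamba network for multi-modal semantic segmentation built on Mamba, a selective structured state space model.
Unlike earlier methods that rely on CNNs, whose local receptive fields are limited, or on Vision Transformers (ViTs), which provide global receptive fields at the cost of quadratic complexity, our model covers a global receptive field with linear complexity.
By adopting a Siamese encoder and devising a new Mamba-based fusion mechanism, we effectively select the essential information from the different modalities.
A decoder is then developed to strengthen the model's channel-wise modeling ability.
Our method, Sigma, is rigorously evaluated on both RGB-Thermal and RGB-Depth segmentation tasks, demonstrating its superiority and marking the first successful application of state space models (SSMs) to multi-modal perception tasks.
Code is available at https://github.com/zifuwan/Sigma.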
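
To make the linear-complexity claim concrete, here is a minimal sketch of a selective state-space recurrence on a scalar sequence. It is not Sigma's code and not Mamba's optimized scan kernel: the function name `selective_scan`, the scalar weights `w_delta`, `w_b`, `w_c`, and the decay parameter `a` are hypothetical names chosen for illustration. It only shows the general idea that the step size, input gate, and readout depend on the current input, and that the sequence is processed in a single pass.

```python
import math


def selective_scan(xs, w_delta=1.0, w_b=1.0, w_c=1.0, a=-1.0):
    """Toy scalar selective scan (illustrative assumption, not the paper's code).

    Visits each input once and carries a single state value, so the cost
    grows linearly with the sequence length.
    """
    h = 0.0  # hidden state carried across the sequence
    ys = []
    for x in xs:
        # Input-dependent step size; softplus keeps it positive.
        delta = math.log1p(math.exp(w_delta * x))
        # Discretized decay; lies in (0, 1) because a is negative.
        a_bar = math.exp(delta * a)
        # Input-dependent input gate (simplified discretization: delta * B).
        b_bar = delta * (w_b * x)
        # State update: keep part of the old state, add the gated input.
        h = a_bar * h + b_bar * x
        # Input-dependent readout of the state.
        ys.append((w_c * x) * h)
    return ys


if __name__ == "__main__":
    outputs = selective_scan([0.2, -0.1, 0.7, 0.3])
    print(len(outputs), "outputs from a single pass over 4 inputs")
```

The loop body runs once per element, whereas self-attention compares every position with every other position, which is where the quadratic cost mentioned above comes from.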

Abstract (original)

Multi-modal semantic segmentation significantly enhances AI agents’ perception and scene understanding, especially under adverse conditions like low-light or overexposed environments. Leveraging additional modalities (X-modality) like thermal and depth alongside traditional RGB provides complementary information, enabling more robust and reliable segmentation. In this work, we introduce Sigma, a Siamese Mamba network for multi-modal semantic segmentation, utilizing the Selective Structured State Space Model, Mamba. Unlike conventional methods that rely on CNNs, with their limited local receptive fields, or Vision Transformers (ViTs), which offer global receptive fields at the cost of quadratic complexity, our model achieves global receptive fields coverage with linear complexity. By employing a Siamese encoder and innovating a Mamba fusion mechanism, we effectively select essential information from different modalities. A decoder is then developed to enhance the channel-wise modeling ability of the model. Our method, Sigma, is rigorously evaluated on both RGB-Thermal and RGB-Depth segmentation tasks, demonstrating its superiority and marking the first successful application of State Space Models (SSMs) in multi-modal perception tasks. Code is available at https://github.com/zifuwan/Sigma.

arXiv information

Authors	Zifu Wan, Yuhao Wang, Silong Yong, Pingping Zhang, Simon Stepputtis, Katia Sycara, Yaqi Xie
Published	2024-04-05 17:59:44+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
