StableMamba: Distillation-free Scaling of Large SSMs for Images and Videos

Abstract

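State-space models (SSMs), exemplified by S4, introduced a new approach to context modeling by integrating state-space techniques into deep learning.
However, because their matrices are data-independent, they struggle with global context modeling.
The Mamba model addressed this with data-dependent variants via the S6 selective-scan algorithm, improving context modeling, especially for long sequences.
However, Mamba-based architectures are difficult to scale in the number of parameters, which is a major limitation for vision applications.
This paper addresses the scalability problem of large SSMs for image classification and action recognition, without requiring additional techniques such as knowledge distillation.
We analyze the distinct characteristics of Mamba-based and attention-based models and propose a Mamba-Attention interleaved architecture that improves scalability, robustness, and performance.
We demonstrate that this stable and efficient interleaved architecture resolves the scalability problem of Mamba-based architectures for images and videos and increases robustness to common artifacts such as JPEG compression.
A thorough evaluation on the ImageNet-1K, Kinetics-400, and Something-Something-v2 benchmarks shows that the approach improves the accuracy of state-of-the-art Mamba-based architectures by up to $+1.7$.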

Abstract (original)

State-space models (SSMs), exemplified by S4, have introduced a novel context modeling method by integrating state-space techniques into deep learning. However, they struggle with global context modeling due to their data-independent matrices. The Mamba model addressed this with data-dependent variants via the S6 selective-scan algorithm, enhancing context modeling, especially for long sequences. However, Mamba-based architectures are difficult to scale with respect to the number of parameters, which is a major limitation for vision applications. This paper addresses the scalability issue of large SSMs for image classification and action recognition without requiring additional techniques like knowledge distillation. We analyze the distinct characteristics of Mamba-based and Attention-based models, proposing a Mamba-Attention interleaved architecture that enhances scalability, robustness, and performance. We demonstrate that the stable and efficient interleaved architecture resolves the scalability issue of Mamba-based architectures for images and videos and increases robustness to common artifacts like JPEG compression. Our thorough evaluation on the ImageNet-1K, Kinetics-400 and Something-Something-v2 benchmarks demonstrates that our approach improves the accuracy of state-of-the-art Mamba-based architectures by up to $+1.7$.

arXiv information

Authors	Hamid Suleman, Syed Talal Wasim, Muzammal Naseer, Juergen Gall
Published	2025-03-27 16:45:04+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
