SPMamba: State-space model is all you need in speech separation

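Summary

In speech separation, both CNN-based and Transformer-based models have demonstrated robust separation capabilities and attracted significant attention in the research community.
However, CNN-based methods have limited capacity to model long audio sequences, which leads to suboptimal separation performance.
Transformer-based methods, conversely, are limited in practical applications by their high computational complexity.
In computer vision, Mamba-based methods have been noted for combining strong performance with reduced computational requirements.
This paper proposes SPMamba, a network architecture for speech separation built on a state-space model.
The authors adopt the TF-GridNet model as the foundational framework and replace its Transformer component with a bidirectional Mamba module, aiming to capture a broader range of contextual information.
The experimental results indicate that the Mamba-based module plays an important role in the model's performance.
On a dataset built on Librispeech, SPMamba outperforms existing separation models by a significant margin.
In particular, SPMamba improves SI-SNRi by 2.42 dB compared with TF-GridNet.
The source code for SPMamba is publicly available at https://github.com/JusperLee/SPMamba .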
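
The idea of a bidirectional state-space module can be illustrated with a toy example. The sketch below is not the SPMamba implementation and not a Mamba layer: it uses a fixed diagonal linear recurrence without Mamba's input-dependent selection mechanism, and the names `ssm_scan` and `bidirectional_scan` are hypothetical. It also assumes that both directions share parameters and that their outputs are stacked, neither of which is stated in the summary. It only shows why scanning in both directions gives every time step access to past and future context.

```python
import numpy as np


def ssm_scan(x, decay, in_gain, out_gain):
    """Toy linear recurrence: h[t] = decay * h[t-1] + in_gain * x[t], y[t] = out_gain . h[t]."""
    state = np.zeros_like(decay)
    y = np.empty(len(x))
    for t, x_t in enumerate(x):
        state = decay * state + in_gain * x_t
        y[t] = out_gain @ state
    return y


def bidirectional_scan(x, decay, in_gain, out_gain):
    """Scan the sequence left-to-right and right-to-left, then stack both outputs."""
    forward = ssm_scan(x, decay, in_gain, out_gain)
    # Reverse the input, scan, and reverse the result so it aligns with the forward pass.
    backward = ssm_scan(x[::-1], decay, in_gain, out_gain)[::-1]
    return np.stack([forward, backward], axis=-1)


# Illustrative parameters (assumptions, chosen only for the demonstration).
decay = np.array([0.9, 0.5])
in_gain = np.array([1.0, 1.0])
out_gain = np.array([0.5, 0.5])

# A single impulse in the middle of a short sequence.
x = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
features = bidirectional_scan(x, decay, in_gain, out_gain)
print(features)
```

In the printed array, the forward column is zero before the impulse and the backward column is zero after it, so only the combination of the two carries information about the impulse at every position.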

Abstract (original)

In speech separation, both CNN- and Transformer-based models have demonstrated robust separation capabilities, garnering significant attention within the research community. However, CNN-based methods have limited modelling capability for long-sequence audio, leading to suboptimal separation performance. Conversely, Transformer-based methods are limited in practical applications due to their high computational complexity. Notably, within computer vision, Mamba-based methods have been celebrated for their formidable performance and reduced computational requirements. In this paper, we propose a network architecture for speech separation using a state-space model, namely SPMamba. We adopt the TF-GridNet model as the foundational framework and substitute its Transformer component with a bidirectional Mamba module, aiming to capture a broader range of contextual information. Our experimental results reveal an important role in the performance aspects of Mamba-based models. SPMamba demonstrates superior performance with a significant advantage over existing separation models in a dataset built on Librispeech. Notably, SPMamba achieves a substantial improvement in separation quality, with a 2.42 dB enhancement in SI-SNRi compared to the TF-GridNet. The source code for SPMamba is publicly accessible at https://github.com/JusperLee/SPMamba .
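
The abstract reports its result as SI-SNRi, the improvement in scale-invariant signal-to-noise ratio of the separated signal over the unprocessed mixture. The following is a minimal NumPy sketch of the standard definition, not code from the SPMamba repository. The function names and the synthetic signals are assumptions made for illustration.

```python
import numpy as np


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between a 1-D estimate and a 1-D reference."""
    # Remove the mean so that a constant offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to obtain the target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    return 10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))


def si_snr_improvement(estimate, mixture, reference):
    """SI-SNRi: SI-SNR of the estimate minus SI-SNR of the unprocessed mixture."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)


# Synthetic stand-ins for speech signals (assumption: white noise, 16000 samples).
rng = np.random.default_rng(0)
reference = rng.standard_normal(16000)
interferer = rng.standard_normal(16000)
mixture = reference + interferer
# A hypothetical separator output that suppresses the interferer by a factor of 10.
estimate = reference + 0.1 * interferer

improvement = si_snr_improvement(estimate, mixture, reference)
print(f"SI-SNRi: {improvement:.2f} dB")
```

Because the target is obtained by projection, rescaling the estimate leaves the score essentially unchanged, which is why the metric is called scale-invariant.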

arXiv information

Authors	Kai Li, Guo Chen
Published	2024-04-02 16:04:31+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
