Sparse Modular Activation for Efficient Sequence Modeling

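Summary

Linear state space models (SSMs) have shown strong performance on a variety of sequence modeling tasks thanks to their efficient encoding of recurrent structure.
However, on more comprehensive tasks such as language modeling and machine translation, self-attention-based models still outperform SSMs.
Hybrid models that combine SSMs and self-attention generally show promising performance, but current approaches apply the attention modules statically and uniformly to every element of the input sequence, which leads to a sub-optimal quality-efficiency trade-off.
In this work, we introduce Sparse Modular Activation (SMA), a general mechanism that lets a neural network sparsely and dynamically activate sub-modules for sequence elements in a differentiable manner.
By allowing each element to skip non-activated sub-modules, SMA reduces computation and memory consumption in both the training and inference stages of sequence modeling.
As a concrete instantiation of SMA, we design SeqBoat, a new neural architecture that uses SMA to sparsely activate a Gated Attention Unit (GAU) based on the state representations learned by an SSM.
By constraining the GAU to perform only local attention over the activated inputs, SeqBoat achieves linear inference complexity with a theoretically infinite attention span and offers a substantially better quality-efficiency trade-off than chunking-based models.
In experiments on a wide range of tasks, including language modeling, speech classification, and Long Range Arena, SeqBoat sets new state-of-the-art results among hybrid models with linear complexity, and its learned sparse activation patterns reveal how much attention each task needs.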

Summary (original)

Linear State Space Models (SSMs) have demonstrated strong performance in a variety of sequence modeling tasks due to their efficient encoding of the recurrent structure. However, in more comprehensive tasks like language modeling and machine translation, self-attention-based models still outperform SSMs. Hybrid models employing both SSM and self-attention generally show promising performance, but current approaches apply attention modules statically and uniformly to all elements in the input sequences, leading to sub-optimal quality-efficiency trade-offs. In this work, we introduce Sparse Modular Activation (SMA), a general mechanism enabling neural networks to sparsely and dynamically activate sub-modules for sequence elements in a differentiable manner. Through allowing each element to skip non-activated sub-modules, SMA reduces computation and memory consumption at both training and inference stages of sequence modeling. As a specific instantiation of SMA, we design a novel neural architecture, SeqBoat, which employs SMA to sparsely activate a Gated Attention Unit (GAU) based on the state representations learned from an SSM. By constraining the GAU to only conduct local attention on the activated inputs, SeqBoat can achieve linear inference complexity with theoretically infinite attention span, and provide substantially better quality-efficiency trade-off than the chunking-based models. With experiments on a wide range of tasks, including language modeling, speech classification and long-range arena, SeqBoat brings new state-of-the-art results among hybrid models with linear complexity and reveals the amount of attention needed for each task through the learned sparse activation patterns.
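
As a rough illustration of the skip-and-activate idea described in the abstract, the following Python sketch gathers only the activated elements, runs a sub-module on that shorter sequence, and scatters the results back. It is not the authors' implementation: the names `sparse_modular_activation` and `local_mean`, the hard threshold gate, the pass-through of skipped elements, and the toy windowed mean standing in for local attention are all assumptions made for illustration. In particular, a hard threshold is not differentiable, whereas the abstract describes SMA as differentiable.

```python
from typing import Callable, List, Sequence


def local_mean(xs: List[float], window: int = 2) -> List[float]:
    """Toy stand-in for local attention: mean over a causal sliding window."""
    out = []
    for t in range(len(xs)):
        ctx = xs[max(0, t - window + 1): t + 1]
        out.append(sum(ctx) / len(ctx))
    return out


def sparse_modular_activation(
    states: Sequence[float],
    gate: Callable[[float], bool],
    submodule: Callable[[List[float]], List[float]],
) -> List[float]:
    """Apply `submodule` only to the elements that `gate` activates."""
    # Positions whose gate decision is "activate" (hypothetical hard gate).
    active_idx = [i for i, s in enumerate(states) if gate(s)]
    # Gather the activated elements into a shorter, compressed sequence.
    compressed = [states[i] for i in active_idx]
    # The sub-module only ever sees the compressed sequence.
    processed = submodule(compressed) if compressed else []
    # Scatter results back; skipped elements pass through unchanged
    # (an assumption of this sketch).
    out = list(states)
    for i, y in zip(active_idx, processed):
        out[i] = y
    return out


states = [0.0, 4.0, 1.0, 2.0, 6.0, 0.5]
result = sparse_modular_activation(states, lambda s: s >= 2.0, local_mean)
print(result)
```

In this toy run only three of the six elements reach the sub-module. Because the window slides over the compressed sequence, the element at position 3 is mixed with the one at position 1 even though a skipped element sits between them, which hints at why local attention over activated inputs is not limited to a fixed span in the original sequence.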

arXiv information

Authors	Liliang Ren, Yang Liu, Shuohang Wang, Yichong Xu, Chenguang Zhu, ChengXiang Zhai
Published	2023-06-19 23:10:02+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
