Gated Slot Attention for Efficient Linear-Time Sequence Modeling

Summary

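Linear attention Transformers and their gated variants are known for enabling parallel training and efficient recurrent inference, but they still lag behind traditional Transformers on recall-intensive tasks and require substantial resources to train from scratch.
This paper introduces Gated Slot Attention (GSA), which enhances Attention with Bounded-memory-Control (ABC) by incorporating a gating mechanism inspired by Gated Linear Attention (GLA).
In essence, GSA consists of a two-layer GLA linked via $\operatorname{softmax}$, using context-aware memory reading and adaptive forgetting to improve memory capacity while keeping the recurrent state size compact.
This design greatly improves both training and inference efficiency through GLA's hardware-efficient training algorithm and the reduced state size.
In addition, retaining the $\operatorname{softmax}$ operation is particularly beneficial in the "finetuning pretrained Transformers to RNNs" (T2R) setting, reducing the need for extensive training from scratch.
Extensive experiments confirm GSA's superior performance in scenarios that require in-context recall and in T2R settings.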

Summary (original)

Linear attention Transformers and their gated variants, celebrated for enabling parallel training and efficient recurrent inference, still fall short in recall-intensive tasks compared to traditional Transformers and demand significant resources for training from scratch. This paper introduces Gated Slot Attention (GSA), which enhances Attention with Bounded-memory-Control (ABC) by incorporating a gating mechanism inspired by Gated Linear Attention (GLA). Essentially, GSA comprises a two-layer GLA linked via $\operatorname{softmax}$, utilizing context-aware memory reading and adaptive forgetting to improve memory capacity while maintaining compact recurrent state size. This design greatly enhances both training and inference efficiency through GLA’s hardware-efficient training algorithm and reduced state size. Additionally, retaining the $\operatorname{softmax}$ operation is particularly beneficial in ‘finetuning pretrained Transformers to RNNs’ (T2R) settings, reducing the need for extensive training from scratch. Extensive experiments confirm GSA’s superior performance in scenarios requiring in-context recall and in T2R settings.
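The abstract does not spell out the recurrence, so the following is only a minimal sketch of one plausible reading of "a two-layer GLA linked via softmax". It assumes a fixed number of memory slots, a per-slot forget gate `alpha` in [0, 1] that decays old content and writes the new token with weight `1 - alpha`, and a softmax over slot scores that turns the first pass (keys) into read weights for the second pass (values). The function names `gsa_step` and `gsa_recurrent`, the parameter `num_slots`, and the exact gating form are assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gsa_step(K, V, q, k, v, alpha):
    """One recurrent step over a fixed set of memory slots (assumed form).

    K, V  : lists of slot vectors (key slots and value slots)
    q, k, v: query, key, and value vectors of the current token
    alpha : one forget gate per slot, each in [0, 1]
    """
    num_slots = len(K)
    # Adaptive forgetting: slot i keeps alpha[i] of its old content
    # and writes (1 - alpha[i]) of the current token.
    K = [[alpha[i] * K[i][j] + (1.0 - alpha[i]) * k[j] for j in range(len(k))]
         for i in range(num_slots)]
    V = [[alpha[i] * V[i][j] + (1.0 - alpha[i]) * v[j] for j in range(len(v))]
         for i in range(num_slots)]
    # First pass: score every key slot against the query.
    scores = [sum(K[i][j] * q[j] for j in range(len(q))) for i in range(num_slots)]
    # Softmax links the two passes and makes the read context-aware.
    weights = softmax(scores)
    # Second pass: read the value slots with the softmax weights.
    out = [sum(weights[i] * V[i][j] for i in range(num_slots)) for j in range(len(v))]
    return K, V, out


def gsa_recurrent(qs, ks, vs, alphas, num_slots):
    """Run the sketch over a sequence; the state stays num_slots rows long."""
    K = [[0.0] * len(ks[0]) for _ in range(num_slots)]
    V = [[0.0] * len(vs[0]) for _ in range(num_slots)]
    outputs = []
    for q, k, v, alpha in zip(qs, ks, vs, alphas):
        K, V, out = gsa_step(K, V, q, k, v, alpha)
        outputs.append(out)
    return outputs, K, V
```

In this sketch the recurrent state is two tables of `num_slots` rows whose size does not grow with sequence length, which is the sense in which the state stays compact. A real training implementation would process tokens in parallel chunks rather than one step at a time.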

arXiv information

Authors	Yu Zhang, Songlin Yang, Ruijie Zhu, Yue Zhang, Leyang Cui, Yiqiao Wang, Bolun Wang, Freda Shi, Bailin Wang, Wei Bi, Peng Zhou, Guohong Fu
Published	2024-10-31 13:54:35+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
