RWKV: Reinventing RNNs for the Transformer Era

Summary

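Transformers have revolutionized almost every natural language processing (NLP) task, but their memory and computational complexity scale quadratically with sequence length.
In contrast, recurrent neural networks (RNNs) scale linearly in memory and computation, but limited parallelization and scalability make it hard for them to match Transformer performance.
We propose Receptance Weighted Key Value (RWKV), a new model architecture that combines the efficient, parallelizable training of Transformers with the efficient inference of RNNs.
Our approach uses a linear attention mechanism and lets the model be formulated as either a Transformer or an RNN, so computation can be parallelized during training while computational and memory complexity stay constant during inference.
We scale our models up to 14 billion parameters, by far the largest dense RNN trained to date, and find that RWKV performs on par with similarly sized Transformers, which suggests that future work can use this architecture to build more efficient models.
This work is a significant step toward reconciling the trade-off between computational efficiency and model performance in sequence processing tasks.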
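
The dual formulation described in the summary (parallel over the whole sequence during training, recurrent with constant state during inference) can be illustrated with a toy example. The sketch below is not the RWKV formulation from the paper; it is a deliberately simplified, single-channel weighted average with an exponential decay, and the function names `wkv_parallel` and `wkv_recurrent`, the scalar `decay` parameter, and the sample inputs are all assumptions made for illustration. It only shows that the same outputs can be computed either from the full sequence at once or from two running sums.

```python
import math

def wkv_parallel(keys, values, decay):
    """Sequence-wide view (hypothetical helper, not the paper's kernel).

    Output t is a weighted average of values[0..t], where position i gets
    weight exp(keys[i] - decay * (t - i)). Each output depends only on the
    inputs, so all outputs could be computed independently of one another.
    """
    outputs = []
    for t in range(len(keys)):
        weights = [math.exp(keys[i] - decay * (t - i)) for i in range(t + 1)]
        numerator = sum(w * v for w, v in zip(weights, values))
        outputs.append(numerator / sum(weights))
    return outputs

def wkv_recurrent(keys, values, decay):
    """Step-by-step view (hypothetical helper) producing the same outputs.

    Only two running sums are carried between steps, so the state size and
    the cost of each step do not grow with the sequence length.
    """
    numerator, denominator = 0.0, 0.0
    shrink = math.exp(-decay)  # how much older positions fade per step
    outputs = []
    for k, v in zip(keys, values):
        numerator = shrink * numerator + math.exp(k) * v
        denominator = shrink * denominator + math.exp(k)
        outputs.append(numerator / denominator)
    return outputs

# Made-up sample inputs, chosen only to exercise both code paths.
keys = [0.1, -0.3, 0.5, 0.0]
values = [1.0, 2.0, -1.0, 0.5]
parallel_out = wkv_parallel(keys, values, decay=0.5)
recurrent_out = wkv_recurrent(keys, values, decay=0.5)
```

In this toy setting both functions return the same list up to floating-point rounding. The actual architecture uses learned, per-channel parameters and additional components that are not modeled here.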

Abstract (original)

Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit linear scaling in memory and computational requirements but struggle to match the same performance as Transformers due to limitations in parallelization and scalability. We propose a novel model architecture, Receptance Weighted Key Value (RWKV), that combines the efficient parallelizable training of transformers with the efficient inference of RNNs. Our approach leverages a linear attention mechanism and allows us to formulate the model as either a Transformer or an RNN, thus parallelizing computations during training and maintains constant computational and memory complexity during inference. We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers, suggesting future work can leverage this architecture to create more efficient models. This work presents a significant step towards reconciling trade-offs between computational efficiency and model performance in sequence processing tasks.

arXiv information

Authors	Bo Peng,Eric Alcaide,Quentin Anthony,Alon Albalak,Samuel Arcadinho,Stella Biderman,Huanqi Cao,Xin Cheng,Michael Chung,Matteo Grella,Kranthi Kiran GV,Xuzheng He,Haowen Hou,Jiaju Lin,Przemyslaw Kazienko,Jan Kocon,Jiaming Kong,Bartlomiej Koptyra,Hayden Lau,Krishna Sri Ipsit Mantri,Ferdinand Mom,Atsushi Saito,Guangyu Song,Xiangru Tang,Bolun Wang,Johan S. Wind,Stanislaw Wozniak,Ruichong Zhang,Zhenyuan Zhang,Qihang Zhao,Peng Zhou,Qinghua Zhou,Jian Zhu,Rui-Jie Zhu
Published	2023-12-11 03:58:56+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
