Loop Neural Networks for Parameter Sharing

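Abstract

The success of large language models such as GPT is attributed to their ability to efficiently predict the next token in a sequence.
However, these models spend a fixed amount of computation on every token, regardless of how hard that token is to predict, and they lack any capacity for iterative refinement.
This paper introduces a novel Loop Neural Network that achieves better performance by using more computation time without increasing the model size.
The approach revisits the input multiple times and refines the prediction by repeatedly looping over a subset of the model with residual connections.
Experiments comparing versions of GPT-2 with the loop models demonstrate the method's effectiveness, showing improved performance on language modeling tasks at similar parameter counts.
Importantly, these improvements are achieved without additional training data.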

Abstract (original)

The success of large-scale language models like GPT can be attributed to their ability to efficiently predict the next token in a sequence. However, these models rely on constant computational effort regardless of the complexity of the token they are predicting, lacking the capacity for iterative refinement. In this paper, we introduce a novel Loop Neural Network, which achieves better performance by utilizing longer computational time without increasing the model size. Our approach revisits the input multiple times, refining the prediction by iteratively looping over a subset of the model with residual connections. We demonstrate the effectiveness of this method through experiments comparing versions of GPT-2 with our loop models, showing improved performance in language modeling tasks while maintaining similar parameter counts. Importantly, these improvements are achieved without the need for extra training data.
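
The looping idea in the abstract can be illustrated with a small sketch. This is a toy example under stated assumptions, not the authors' architecture: the abstract does not say which subset of the model is looped or how many loops are used, so the single `tanh` block, the dimension, and the names `make_block`, `block_forward`, `loop_forward`, and `count_params` are all hypothetical. It only shows the general pattern of reusing one set of weights several times with a residual connection, so that compute grows with the loop count while the parameter count stays fixed.

```python
import math
import random


def make_block(dim, rng):
    """Create one toy block: a dim x dim weight matrix plus a bias vector.

    Hypothetical stand-in for the looped subset of the model.
    """
    scale = 1.0 / math.sqrt(dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(dim)]
    bias = [0.0] * dim
    return {"weights": weights, "bias": bias}


def block_forward(block, x):
    """Apply the toy block: tanh(W x + b)."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(block["weights"], block["bias"])
    ]


def loop_forward(block, x, n_loops):
    """Reuse the SAME block n_loops times, adding its output back to the input.

    The residual update x <- x + f(x) is the assumed refinement step.
    """
    for _ in range(n_loops):
        update = block_forward(block, x)
        x = [xi + ui for xi, ui in zip(x, update)]
    return x


def count_params(block):
    """Count the weights and biases of one block."""
    return sum(len(row) for row in block["weights"]) + len(block["bias"])


rng = random.Random(0)
dim = 4
block = make_block(dim, rng)
x0 = [0.5, -0.25, 0.1, 0.0]

one_pass = loop_forward(block, x0, n_loops=1)
three_passes = loop_forward(block, x0, n_loops=3)

# Looping three times reuses one block; stacking three distinct blocks
# would need three times as many parameters.
looped_params = count_params(block)
stacked_params = 3 * count_params(block)
```

In this sketch, `three_passes` costs three block evaluations but `looped_params` is the same as for a single pass, whereas three independent blocks would need `stacked_params` parameters. That trade of extra computation for a fixed model size is the point the abstract makes.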

arXiv information

Authors	Kei-Sing Ng, Qingchen Wang
Published	2024-11-08 15:00:01+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
