In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss

Summary

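This paper addresses the challenge of processing long documents with generative transformer models.
To evaluate different approaches, we introduce BABILong, a new benchmark designed to assess how well models extract and process facts distributed across extensive texts.
Our evaluation, which includes GPT-4 and RAG baselines, shows that common methods are effective only for sequences of up to $10^4$ elements.
In contrast, fine-tuning GPT-2 with recurrent memory augmentation enables it to handle tasks involving up to $10^7$ elements.
This is by far the longest input processed by any open neural network model to date, a substantial leap in the ability to process long sequences.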

Summary (original)

This paper addresses the challenge of processing long documents using generative transformer models. To evaluate different approaches, we introduce BABILong, a new benchmark designed to assess model capabilities in extracting and processing distributed facts within extensive texts. Our evaluation, which includes benchmarks for GPT-4 and RAG, reveals that common methods are effective only for sequences up to $10^4$ elements. In contrast, fine-tuning GPT-2 with recurrent memory augmentations enables it to handle tasks involving up to $10^7$ elements. This achievement marks a substantial leap, as it is by far the longest input processed by any open neural network model to date, demonstrating a significant improvement in the processing capabilities for long sequences.
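
The abstract combines two ideas: a benchmark that scatters a few relevant facts through a very long distractor text, and a model that reads such input piece by piece while carrying a recurrent memory. The Python sketch below is a toy, non-neural illustration of both under our own assumptions. The facts, the filler sentences, and the helpers `build_haystack` and `read_with_memory` are hypothetical and are not the authors' BABILong generator or their memory-augmented GPT-2. In the paper's setting a trained model decides what to keep in memory. Here a hand-written regular-expression rule stands in for that learned behavior, and the only state passed from one segment to the next is a small bounded dictionary.

```python
import random
import re

# Hypothetical bAbI-style "needles": the answer depends on the latest fact about Mary.
FACTS = [
    "Mary went to the kitchen.",
    "John moved to the garden.",
    "Mary travelled to the office.",
]

# Hypothetical distractor sentences standing in for long background text.
FILLER = [
    "The weather was mild that spring.",
    "Nobody remembered who had built the old bridge.",
    "A dog barked somewhere in the distance.",
]

# Toy "reader" rule (assumption): matches sentences like "Mary went to the kitchen."
MOVE = re.compile(r"\b([A-Z][a-z]+) (?:went|moved|travelled) to the (\w+)\.")


def build_haystack(facts, n_filler, seed=0):
    """Scatter the facts, keeping their order, among n_filler distractor sentences."""
    rng = random.Random(seed)
    sentences = [rng.choice(FILLER) for _ in range(n_filler)]
    # Pick distinct final positions and insert in ascending order so each fact lands there.
    positions = sorted(rng.sample(range(n_filler + len(facts)), len(facts)))
    for pos, fact in zip(positions, facts):
        sentences.insert(pos, fact)
    return " ".join(sentences)


def read_with_memory(text, segment_sentences=8, max_slots=4):
    """Read text one fixed-size segment at a time, carrying only a bounded memory."""
    sentences = re.split(r"(?<=\.)\s+", text)
    memory = {}  # person -> last seen location; the only state passed between segments
    for start in range(0, len(sentences), segment_sentences):
        segment = " ".join(sentences[start:start + segment_sentences])
        for person, place in MOVE.findall(segment):
            memory.pop(person, None)  # re-insert so this entry becomes the most recent
            memory[person] = place
            while len(memory) > max_slots:
                memory.pop(next(iter(memory)))  # evict the least recently updated entry
    return memory


haystack = build_haystack(FACTS, n_filler=10_000)
memory = read_with_memory(haystack)
print(memory["Mary"])  # office
```

Because every segment is processed with the same fixed-size state, the work grows linearly with the number of segments instead of requiring the whole input to fit into one context window. This is the general intuition behind recurrent-memory approaches to very long inputs.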

arXiv information

Authors	Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Dmitry Sorokin, Artyom Sorokin, Mikhail Burtsev
Published	2024-02-16 16:15:01+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google