Augmenting Language Models with Long-Term Memory

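Summary

Existing large language models (LLMs) accept only fixed-size inputs because of their input length limit, so they cannot draw on the rich long-context information in past inputs.
To address this, we propose Language Models Augmented with Long-Term Memory (LongMem), a framework that enables LLMs to memorize a long history.
We design a novel decoupled network architecture in which the original backbone LLM is frozen and serves as the memory encoder, while an adaptive residual side-network serves as the memory retriever and reader.
This decoupled memory design makes it easy to cache and update long-term past contexts for memory retrieval without suffering from memory staleness.
Enhanced with memory-augmented adaptation training, LongMem can memorize long past contexts and use long-term memory for language modeling.
The proposed memory retrieval module can handle unlimited-length context in its memory bank, which benefits various downstream tasks.
Typically, LongMem can enlarge the long-form memory to 65k tokens and can therefore cache many-shot extra demonstration examples as long-form memory for in-context learning.
Experiments show that our method outperforms strong long-context models on ChapterBreak, a challenging long-context modeling benchmark, and achieves remarkable improvements over LLMs on memory-augmented in-context learning.
The results demonstrate that the proposed method is effective in helping language models memorize and utilize long-form content.
Our code is open-sourced at https://aka.ms/LongMem.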
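
The idea of caching past context in a memory bank and retrieving from it can be illustrated with a toy sketch. This is not the LongMem implementation: the class name `MemoryBank`, the oldest-first eviction, the dot-product scoring, and the top-k lookup are all assumptions made for illustration, and the tiny capacity stands in for a bank that would hold tens of thousands of tokens.

```python
import heapq
from collections import deque


class MemoryBank:
    """Hypothetical fixed-capacity cache of (key, value) pairs.

    Assumption for illustration: the oldest entries are evicted first.
    """

    def __init__(self, capacity):
        # deque(maxlen=...) drops the oldest item once capacity is reached.
        self.entries = deque(maxlen=capacity)

    def cache(self, keys, values):
        """Store one (key, value) pair per past token."""
        for key, value in zip(keys, values):
            self.entries.append((tuple(key), value))

    def retrieve(self, query, k):
        """Return the values of the k cached keys with the highest
        dot product with the query (assumed similarity measure)."""
        def score(entry):
            key, _ = entry
            return sum(q * x for q, x in zip(query, key))

        best = heapq.nlargest(k, self.entries, key=score)
        return [value for _, value in best]


bank = MemoryBank(capacity=3)
bank.cache(keys=[(1.0, 0.0), (0.0, 1.0)], values=["token A", "token B"])
# Caching two more pairs exceeds the capacity, so "token A" is evicted.
bank.cache(keys=[(0.7, 0.7), (-1.0, 0.0)], values=["token C", "token D"])

top = bank.retrieve(query=(1.0, 0.2), k=2)
print(top)  # ['token C', 'token B']
```

In this sketch the cached keys never change after they are stored, which loosely mirrors why a frozen encoder avoids stale memory: entries written earlier remain comparable with queries issued later.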

Summary (original)

Existing large language models (LLMs) can only afford fix-sized inputs due to the input length limit, preventing them from utilizing rich long-context information from past inputs. To address this, we propose a framework, Language Models Augmented with Long-Term Memory (LongMem), which enables LLMs to memorize long history. We design a novel decoupled network architecture with the original backbone LLM frozen as a memory encoder and an adaptive residual side-network as a memory retriever and reader. Such a decoupled memory design can easily cache and update long-term past contexts for memory retrieval without suffering from memory staleness. Enhanced with memory-augmented adaptation training, LongMem can thus memorize long past context and use long-term memory for language modeling. The proposed memory retrieval module can handle unlimited-length context in its memory bank to benefit various downstream tasks. Typically, LongMem can enlarge the long-form memory to 65k tokens and thus cache many-shot extra demonstration examples as long-form memory for in-context learning. Experiments show that our method outperforms strong long-context models on ChapterBreak, a challenging long-context modeling benchmark, and achieves remarkable improvements on memory-augmented in-context learning over LLMs. The results demonstrate that the proposed method is effective in helping language models to memorize and utilize long-form contents. Our code is open-sourced at https://aka.ms/LongMem.

arXiv information

Authors	Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei
Published	2023-06-12 15:13:39+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
