Writing in the Margins: Better Inference Pattern for Long Context Retrieval

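Summary

This paper introduces Writing in the Margins (WiM), a new inference pattern for large language models designed to optimize the handling of long input sequences in retrieval-oriented tasks.
The approach uses chunked prefill of the key-value cache to perform segment-wise inference. This enables efficient processing of extensive contexts, together with the generation and classification of intermediate information ("margins") that guides the model toward a specific task.
The method adds only marginal computational overhead while significantly improving the performance of off-the-shelf models, with no fine-tuning required.
Specifically, WiM improves accuracy on reasoning tasks (HotpotQA, MultiHop-RAG) by 7.5% on average and raises the F1-score on aggregation tasks (CWE) by more than 30.0%.
The paper also shows how the pattern fits into an interactive retrieval design that gives end users ongoing updates on the progress of context processing and pinpoints where relevant information is integrated into the final response.
An implementation of WiM using the Hugging Face Transformers library is released at https://github.com/writer/writing-in-the-margins.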
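
The segment-wise flow described in the summary can be sketched in a few lines of Python. This is a minimal illustration of the control flow only, not the released implementation: all function names are hypothetical, `toy_extract` and `toy_classify` are stand-ins for model calls, and the `seen` list merely plays the role of the key-value cache that a real model would fill chunk by chunk.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass
class Margin:
    segment_index: int  # which segment produced this note
    text: str           # the intermediate note ("margin")
    relevant: bool      # result of the classification step


def split_into_segments(tokens: List[str], segment_size: int) -> List[List[str]]:
    """Cut the context into fixed-size segments (the last one may be shorter)."""
    return [tokens[i:i + segment_size] for i in range(0, len(tokens), segment_size)]


def write_in_the_margins(
    context: str,
    query: str,
    segment_size: int,
    extract: Callable[[List[str], List[str], str], str],
    classify: Callable[[str, str], bool],
) -> Iterator[Margin]:
    """Process the context segment by segment, yielding one margin per segment.

    Yielding as we go is what lets a caller show progress updates.
    """
    seen: List[str] = []  # assumption: stands in for the incrementally filled KV cache
    segments = split_into_segments(context.split(), segment_size)
    for index, segment in enumerate(segments):
        seen.extend(segment)                  # "prefill" this chunk
        note = extract(seen, segment, query)  # generate the margin
        yield Margin(index, note, classify(note, query))


def build_final_prompt(margins: List[Margin], query: str) -> str:
    """Keep only the margins classified as relevant and put them before the query."""
    notes = "\n".join(f"- {m.text}" for m in margins if m.relevant)
    return f"Notes:\n{notes}\nQuestion: {query}"


def toy_extract(seen: List[str], new: List[str], query: str) -> str:
    """Hypothetical stand-in for a model call: keep new tokens that occur in the query."""
    wanted = set(query.lower().split())
    return " ".join(token for token in new if token.lower() in wanted)


def toy_classify(note: str, query: str) -> bool:
    """Hypothetical stand-in for a model call: any non-empty note counts as relevant."""
    return bool(note)


if __name__ == "__main__":
    context = "alpha beta gamma delta epsilon zeta eta theta"
    query = "gamma theta"
    collected = []
    for margin in write_in_the_margins(context, query, 3, toy_extract, toy_classify):
        print(f"segment {margin.segment_index}: relevant={margin.relevant}")
        collected.append(margin)
    print(build_final_prompt(collected, query))
```

Writing the loop as a generator is a design choice for this sketch: each margin becomes available as soon as its segment has been processed, which mirrors the ongoing progress updates mentioned in the summary.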

Summary (original)

In this paper, we introduce Writing in the Margins (WiM), a new inference pattern for Large Language Models designed to optimize the handling of long input sequences in retrieval-oriented tasks. This approach leverages the chunked prefill of the key-value cache to perform segment-wise inference, which enables efficient processing of extensive contexts along with the generation and classification of intermediate information (‘margins’) that guide the model towards specific tasks. This method increases computational overhead marginally while significantly enhancing the performance of off-the-shelf models without the need for fine-tuning. Specifically, we observe that WiM provides an average enhancement of 7.5% in accuracy for reasoning skills (HotpotQA, MultiHop-RAG) and more than a 30.0% increase in the F1-score for aggregation tasks (CWE). Additionally, we show how the proposed pattern fits into an interactive retrieval design that provides end-users with ongoing updates about the progress of context processing, and pinpoints the integration of relevant information into the final response. We release our implementation of WiM using Hugging Face Transformers library at https://github.com/writer/writing-in-the-margins.

arXiv information

Authors	Melisa Russak, Umar Jamil, Christopher Bryant, Kiran Kamble, Axel Magnuson, Mateusz Russak, Waseem AlShikh
Published	2024-08-27 09:34:38+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
