Eliciting In-context Retrieval and Reasoning for Long-context Large Language Models

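Summary

Recent advances in long-context language models (LCLMs) promise to transform retrieval-augmented generation (RAG) by simplifying its pipelines.
With their expanded context windows, LCLMs can process an entire knowledge base and perform retrieval and reasoning directly, a capability we define as In-Context Retrieval and Reasoning (ICR^2).
However, existing benchmarks such as LOFT often overestimate LCLM performance because they provide overly simplified contexts.
To address this, we introduce ICR^2, a benchmark that evaluates LCLMs in more realistic scenarios by including confounding passages retrieved with strong retrievers.
We then propose three methods to improve LCLM performance: (1) retrieve-then-generate fine-tuning, (2) retrieval-attention-probing, which uses attention heads to filter and de-noise long contexts during decoding, and (3) joint training of a retrieval head alongside the generation head.
An evaluation of five well-known LCLMs on LOFT and ICR^2 shows significant gains for our best approach applied to Mistral-7B: +17 and +15 Exact Match points on LOFT, and +13 and +2 points on ICR^2, compared with vanilla RAG and supervised fine-tuning, respectively.
Despite being a much smaller model, it even outperforms GPT-4-Turbo on most tasks.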

Abstract (original)

Recent advancements in long-context language models (LCLMs) promise to transform Retrieval-Augmented Generation (RAG) by simplifying pipelines. With their expanded context windows, LCLMs can process entire knowledge bases and perform retrieval and reasoning directly — a capability we define as In-Context Retrieval and Reasoning (ICR^2). However, existing benchmarks like LOFT often overestimate LCLM performance by providing overly simplified contexts. To address this, we introduce ICR^2, a benchmark that evaluates LCLMs in more realistic scenarios by including confounding passages retrieved with strong retrievers. We then propose three methods to enhance LCLM performance: (1) retrieve-then-generate fine-tuning, (2) retrieval-attention-probing, which uses attention heads to filter and de-noise long contexts during decoding, and (3) joint retrieval head training alongside the generation head. Our evaluation of five well-known LCLMs on LOFT and ICR^2 demonstrates significant gains with our best approach applied to Mistral-7B: +17 and +15 points by Exact Match on LOFT, and +13 and +2 points on ICR^2, compared to vanilla RAG and supervised fine-tuning, respectively. It even outperforms GPT-4-Turbo on most tasks despite being a much smaller model.
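
The abstract does not spell out how attention heads are used to filter the context, so the following is only a minimal sketch of the general idea behind attention-based passage filtering, not the authors' retrieval-attention-probing method. It assumes that, at some decoding step, we already have one attention distribution over the context tokens per selected head, and that each passage occupies a known token span. The names `passage_scores`, `top_k_passages`, `head_attention`, and `passage_spans` are hypothetical and chosen for illustration.

```python
from typing import List, Sequence, Tuple


def passage_scores(
    head_attention: Sequence[Sequence[float]],
    passage_spans: Sequence[Tuple[int, int]],
) -> List[float]:
    """Score each passage by the attention mass inside its token span,
    averaged over heads. Spans are (start, end) with end exclusive."""
    num_heads = len(head_attention)
    scores = []
    for start, end in passage_spans:
        # Sum the attention each head puts on this passage's tokens.
        mass = sum(sum(weights[start:end]) for weights in head_attention)
        scores.append(mass / num_heads)
    return scores


def top_k_passages(
    head_attention: Sequence[Sequence[float]],
    passage_spans: Sequence[Tuple[int, int]],
    k: int,
) -> List[int]:
    """Return the indices of the k highest-scoring passages, best first."""
    scores = passage_scores(head_attention, passage_spans)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy input (assumed, not from the paper): two heads, six context tokens,
# three passages of two tokens each.
head_attention = [
    [0.1, 0.1, 0.6, 0.1, 0.1, 0.0],
    [0.0, 0.2, 0.5, 0.2, 0.1, 0.0],
]
passage_spans = [(0, 2), (2, 4), (4, 6)]
kept = top_k_passages(head_attention, passage_spans, k=2)
```

In this toy input the middle passage receives most of the attention mass, so it ranks first; a real system would then decode against only the kept passages. Which heads to read and how to aggregate them are design choices that the abstract leaves open.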

arXiv information

Authors	Yifu Qiu,Varun Embar,Yizhe Zhang,Navdeep Jaitly,Shay B. Cohen,Benjamin Han
Published	2025-02-28 11:40:20+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
