ReasonIR: Training Retrievers for Reasoning Tasks

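Summary

We present ReasonIR-8B, the first retriever trained specifically for general reasoning tasks.
Existing retrievers show limited gains on reasoning tasks, in part because existing training datasets focus on short factual queries paired with documents that answer them directly.
We develop a synthetic data generation pipeline that, for each document, creates a challenging and relevant query, along with a hard negative that looks plausibly related but is ultimately unhelpful.
Trained on a mixture of this synthetic data and existing public data, ReasonIR-8B sets a new state of the art on BRIGHT, a widely used reasoning-intensive information retrieval (IR) benchmark: 29.9 nDCG@10 without a reranker and 36.9 nDCG@10 with a reranker.
When applied to RAG tasks, ReasonIR-8B improves MMLU and GPQA performance by 6.4% and 22.6% respectively, relative to the closed-book baseline, outperforming other retrievers and search engines.
ReasonIR-8B also uses test-time compute more effectively: on BRIGHT, its performance increases consistently as rewritten queries become longer and more information-rich.
It continues to outperform other retrievers when combined with an LLM reranker.
The training recipe is general and can be easily extended to future LLMs.
To this end, the authors open-source their code, data, and model.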

Abstract (original)

We present ReasonIR-8B, the first retriever specifically trained for general reasoning tasks. Existing retrievers have shown limited gains on reasoning tasks, in part because existing training datasets focus on short factual queries tied to documents that straightforwardly answer them. We develop a synthetic data generation pipeline that, for each document, our pipeline creates a challenging and relevant query, along with a plausibly related but ultimately unhelpful hard negative. By training on a mixture of our synthetic data and existing public data, ReasonIR-8B achieves a new state-of-the-art of 29.9 nDCG@10 without reranker and 36.9 nDCG@10 with reranker on BRIGHT, a widely-used reasoning-intensive information retrieval (IR) benchmark. When applied to RAG tasks, ReasonIR-8B improves MMLU and GPQA performance by 6.4% and 22.6% respectively, relative to the closed-book baseline, outperforming other retrievers and search engines. In addition, ReasonIR-8B uses test-time compute more effectively: on BRIGHT, its performance consistently increases with longer and more information-rich rewritten queries; it continues to outperform other retrievers when combined with an LLM reranker. Our training recipe is general and can be easily extended to future LLMs; to this end, we open-source our code, data, and model.
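The headline numbers above are nDCG@10 scores. As a rough illustration of what that metric measures, here is a minimal sketch using the common linear-gain form, DCG@k = sum over ranks i of rel_i / log2(i + 1), normalized by the DCG of an ideal ordering. The function names and the relevance labels are hypothetical, chosen for illustration; this is not the paper's or BRIGHT's evaluation code, and the assumption that reported scores are nDCG multiplied by 100 is ours.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k relevance labels (rank 1 first)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """nDCG@k: DCG of the system ranking divided by DCG of the ideal ranking.

    ranked_relevances: labels of the retrieved documents, in retrieved order.
    all_relevances: labels of every judged document for the query, in any order.
    """
    ideal_dcg = dcg_at_k(sorted(all_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents exist for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# Hypothetical query: two relevant documents, retrieved at ranks 2 and 4.
ranked = [0, 1, 0, 1]
score = 100 * ndcg_at_k(ranked, ranked)  # scaled to 0-100 (assumed convention)
print(f"nDCG@10 = {score:.1f}")
```

A benchmark score is then typically the mean of this per-query value over all queries, so moving relevant documents from lower ranks toward the top is what raises it.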

arXiv information

Authors	Rulin Shao, Rui Qiao, Varsha Kishore, Niklas Muennighoff, Xi Victoria Lin, Daniela Rus, Bryan Kian Hsiang Low, Sewon Min, Wen-tau Yih, Pang Wei Koh, Luke Zettlemoyer
Published	2025-04-29 09:49:28+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
