Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers

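Summary

Large language models (LLMs) have been widely integrated into information retrieval to advance traditional techniques.
However, getting LLMs to seek accurate knowledge in complex tasks remains difficult, both because multi-hop queries are complex and because retrieved content is often irrelevant.
To address these limitations, the authors propose ExSearch, an agentic search framework in which the LLM learns to retrieve useful information as its reasoning unfolds, through a self-incentivized process.
At each step, the LLM decides what to retrieve (thinking), triggers an external retriever (search), and extracts fine-grained evidence (recording) to support the next reasoning step.
To give the LLM this capability, ExSearch adopts a Generalized Expectation-Maximization algorithm.
In the E-step, the LLM generates multiple search trajectories and assigns an importance weight to each.
In the M-step, the LLM is trained on those trajectories with a re-weighted loss function.
This creates a self-incentivized loop in which the LLM iteratively learns from its own generated data and progressively improves at search.
The authors also analyze this training process theoretically and establish convergence guarantees.
Extensive experiments on four knowledge-intensive benchmarks show that ExSearch substantially outperforms the baselines.
Motivated by these results, and to facilitate future work, they introduce ExSearch-Zoo, an extension that applies the method to broader scenarios.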
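
The think, search, record loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`agentic_search`, `toy_think`, `toy_search`, `toy_record`), the stopping rule (the thinking step returns `None`), and the two-entry toy corpus are all hypothetical stand-ins for an LLM and an external retriever.

```python
from typing import Callable, Dict, List, Optional


def agentic_search(
    question: str,
    think: Callable[[str, List[str]], Optional[str]],
    search: Callable[[str], List[str]],
    record: Callable[[str, List[str]], str],
    max_steps: int = 4,
) -> List[str]:
    """Run a think -> search -> record loop and return the recorded evidence."""
    evidence: List[str] = []
    for _ in range(max_steps):
        query = think(question, evidence)   # thinking: decide what to retrieve
        if query is None:                   # assumed stop signal: enough evidence
            break
        documents = search(query)           # search: call the external retriever
        evidence.append(record(query, documents))  # recording: keep evidence
    return evidence


# Hypothetical toy corpus standing in for a real retriever index.
CORPUS: Dict[str, str] = {
    "director of Inception": "Inception was directed by Christopher Nolan.",
    "birthplace of Christopher Nolan": "Christopher Nolan was born in London.",
}


def toy_think(question: str, evidence: List[str]) -> Optional[str]:
    """Hard-coded two-hop plan; a real system would prompt the LLM here."""
    plan = ["director of Inception", "birthplace of Christopher Nolan"]
    return plan[len(evidence)] if len(evidence) < len(plan) else None


def toy_search(query: str) -> List[str]:
    """Return the matching passage (if any) plus one irrelevant passage."""
    hit = CORPUS.get(query)
    return ([hit] if hit else []) + ["An unrelated passage."]


def toy_record(query: str, documents: List[str]) -> str:
    """Keep only the top-ranked passage as the recorded evidence."""
    return documents[0]


evidence = agentic_search(
    "Where was the director of Inception born?",
    toy_think, toy_search, toy_record,
)
```

Passing the three stages as callables keeps the loop independent of any particular model or retriever; in the toy run, the second query depends on what the first hop recorded, which is the multi-hop behaviour the summary refers to.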

Abstract (original)

Large language models (LLMs) have been widely integrated into information retrieval to advance traditional techniques. However, effectively enabling LLMs to seek accurate knowledge in complex tasks remains a challenge due to the complexity of multi-hop queries as well as the irrelevant retrieved content. To address these limitations, we propose EXSEARCH, an agentic search framework, where the LLM learns to retrieve useful information as the reasoning unfolds through a self-incentivized process. At each step, the LLM decides what to retrieve (thinking), triggers an external retriever (search), and extracts fine-grained evidence (recording) to support next-step reasoning. To enable LLM with this capability, EXSEARCH adopts a Generalized Expectation-Maximization algorithm. In the E-step, the LLM generates multiple search trajectories and assigns an importance weight to each; the M-step trains the LLM on them with a re-weighted loss function. This creates a self-incentivized loop, where the LLM iteratively learns from its own generated data, progressively improving itself for search. We further theoretically analyze this training process, establishing convergence guarantees. Extensive experiments on four knowledge-intensive benchmarks show that EXSEARCH substantially outperforms baselines, e.g., +7.8% improvement on exact match score. Motivated by these promising results, we introduce EXSEARCH-Zoo, an extension that extends our method to broader scenarios, to facilitate future work.
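
To make the E-step and M-step concrete, here is a minimal numeric sketch in Python. The abstract does not say how the importance weights are computed, so the softmax normalisation over per-trajectory scores is purely an assumption, as are the function names `importance_weights` and `reweighted_loss`. The loss is a weighted negative log-likelihood over sampled trajectories, one plausible reading of "re-weighted loss function", not the paper's confirmed objective.

```python
import math
from typing import List


def importance_weights(scores: List[float]) -> List[float]:
    """E-step (assumed form): softmax-normalise one score per trajectory."""
    top = max(scores)                          # subtract max for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reweighted_loss(log_probs: List[float], weights: List[float]) -> float:
    """M-step objective (assumed form): weighted negative log-likelihood."""
    return -sum(w * lp for w, lp in zip(weights, log_probs))


# Two sampled trajectories; the second scores three times higher.
scores = [0.0, math.log(3.0)]
weights = importance_weights(scores)           # approximately [0.25, 0.75]

# Model log-probabilities of each trajectory (made-up illustrative values).
log_probs = [-2.0, -1.0]
loss = reweighted_loss(log_probs, weights)     # 0.25 * 2.0 + 0.75 * 1.0
```

In a real training loop the log-probabilities would come from the LLM and the loss would be minimised by gradient descent; repeating the two steps gives the iterative loop the abstract describes, with higher-weighted trajectories contributing more to each update.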

arXiv information

Authors	Zhengliang Shi, Lingyong Yan, Dawei Yin, Suzan Verberne, Maarten de Rijke, Zhaochun Ren
Published	2025-05-26 15:27:55+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
