R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning

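Summary

Large language models (LLMs) are powerful, but their static knowledge makes them prone to hallucination.
Retrieval-augmented generation (RAG) mitigates this by injecting external information, yet current methods tend to be costly, to generalize poorly, or to ignore the model's internal knowledge.
This paper introduces R1-Searcher++, a new framework designed to train LLMs to adaptively draw on both internal and external knowledge sources.
R1-Searcher++ uses a two-stage training strategy: an initial SFT cold-start phase for preliminary format learning, followed by RL for dynamic knowledge acquisition.
The RL stage uses outcome supervision to encourage exploration, adds a reward mechanism for using internal knowledge, and integrates a memorization mechanism that continuously assimilates retrieved information, enriching the model's internal knowledge.
By combining internal knowledge with an external search engine, the model keeps improving its capabilities and performs retrieval-augmented reasoning efficiently.
Experiments show that R1-Searcher++ outperforms previous RAG and reasoning methods while retrieving efficiently.
The code is available at https://github.com/RUCAIBox/R1-Searcher-plus.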

Abstract (original)

Large Language Models (LLMs) are powerful but prone to hallucinations due to static knowledge. Retrieval-Augmented Generation (RAG) helps by injecting external information, but current methods often are costly, generalize poorly, or ignore the internal knowledge of the model. In this paper, we introduce R1-Searcher++, a novel framework designed to train LLMs to adaptively leverage both internal and external knowledge sources. R1-Searcher++ employs a two-stage training strategy: an initial SFT Cold-start phase for preliminary format learning, followed by RL for Dynamic Knowledge Acquisition. The RL stage uses outcome-supervision to encourage exploration, incorporates a reward mechanism for internal knowledge utilization, and integrates a memorization mechanism to continuously assimilate retrieved information, thereby enriching the model’s internal knowledge. By leveraging internal knowledge and external search engine, the model continuously improves its capabilities, enabling efficient retrieval-augmented reasoning. Our experiments demonstrate that R1-Searcher++ outperforms previous RAG and reasoning methods and achieves efficient retrieval. The code is available at https://github.com/RUCAIBox/R1-Searcher-plus.
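The abstract says the RL stage combines outcome supervision with a reward for using internal knowledge, but it gives no formula. The toy sketch below illustrates one way such a signal could be shaped. It is an assumption made for illustration, not the authors' reward: the `Rollout` structure, the `toy_rewards` function, the `internal_bonus` value, and the "fewest retrievals among correct answers" rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One sampled response for a question (hypothetical structure)."""
    answer_correct: bool  # outcome signal: did the final answer match the reference?
    num_retrievals: int   # how many times the response called the external search engine


def toy_rewards(group: list[Rollout], internal_bonus: float = 0.5) -> list[float]:
    """Score a group of rollouts sampled for the same question.

    Assumed rule (not taken from the paper): a correct answer earns 1.0, and a
    correct answer that used the fewest retrievals among the correct rollouts
    earns an extra `internal_bonus`. Incorrect answers earn 0.0.
    """
    correct_calls = [r.num_retrievals for r in group if r.answer_correct]
    fewest = min(correct_calls) if correct_calls else None
    rewards = []
    for r in group:
        if not r.answer_correct:
            rewards.append(0.0)  # outcome supervision: no credit without a correct answer
        elif r.num_retrievals == fewest:
            rewards.append(1.0 + internal_bonus)  # correct with the least reliance on search
        else:
            rewards.append(1.0)  # correct, but used more retrievals than necessary
    return rewards


group = [Rollout(True, 0), Rollout(True, 2), Rollout(False, 0), Rollout(True, 0)]
print(toy_rewards(group))  # [1.5, 1.0, 0.0, 1.5]
```

In this sketch the bonus is only available to correct answers, so a response cannot gain reward by skipping retrieval and answering wrongly. The released code linked above is the place to check how the reward is actually defined.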

arXiv information

Authors	Huatong Song, Jinhao Jiang, Wenqing Tian, Zhipeng Chen, Yuhuan Wu, Jiahao Zhao, Yingqian Min, Wayne Xin Zhao, Lei Fang, Ji-Rong Wen
Published	2025-05-22 17:58:26+00:00

Source, services used

arxiv.jp, Google
