Effective and Transparent RAG: Adaptive-Reward Reinforcement Learning for Decision Traceability

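Summary

Retrieval-augmented generation (RAG) has significantly improved the performance of large language models (LLMs) in knowledge-intensive domains.
However, although RAG has succeeded across different domains, several challenges remain unsolved:
1) Effectiveness. Existing research mainly focuses on developing more powerful RAG retrievers, but how can the generator's (LLM's) ability to use the retrieved information for reasoning and generation be strengthened?
2) Transparency. Most RAG methods ignore which retrieved content actually contributes to the reasoning process, which leads to a lack of interpretability and visibility.
To address this, we propose ARENA (Adaptive-Rewarded Evidence Navigation Agent), a transparent RAG generator framework trained via reinforcement learning (RL) with our proposed rewards.
Based on structured generation and adaptive reward calculation, our RL-based training enables the model to identify key evidence, perform structured reasoning, and generate answers with interpretable decision traces.
Applied to Qwen2.5-7B-Instruct and Llama3.1-8B-Instruct, extensive experiments against various RAG baselines show that the model achieves 10-30% improvements on all multi-hop QA datasets.
Further analyses show that ARENA is flexible enough to be adopted on new datasets without additional training.
The models and code are publicly released.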

Summary (original)

Retrieval-Augmented Generation (RAG) has significantly improved the performance of large language models (LLMs) on knowledge-intensive domains. However, although RAG has achieved success across distinct domains, there are still some unsolved challenges: 1) Effectiveness. Existing research mainly focuses on developing more powerful RAG retrievers, but how can the generator's (LLM's) ability to utilize the retrieved information for reasoning and generation be enhanced? 2) Transparency. Most RAG methods ignore which retrieved content actually contributes to the reasoning process, resulting in a lack of interpretability and visibility. To address this, we propose ARENA (Adaptive-Rewarded Evidence Navigation Agent), a transparent RAG generator framework trained via reinforcement learning (RL) with our proposed rewards. Based on the structured generation and adaptive reward calculation, our RL-based training enables the model to identify key evidence, perform structured reasoning, and generate answers with interpretable decision traces. Applied to Qwen2.5-7B-Instruct and Llama3.1-8B-Instruct, extensive experiments with various RAG baselines demonstrate that our model achieves 10-30% improvements on all multi-hop QA datasets, which is comparable with SOTA commercially developed LLMs (e.g., OpenAI-o1, DeepSeek-R1). Further analyses show that ARENA has strong flexibility to be adopted on new datasets without extra training. Our models and code are publicly released.
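To make "structured generation" and reward calculation a little more concrete, here is a minimal Python sketch of how a structured, traceable generator output could be parsed and scored during RL training. Everything in it is an assumption for illustration: the `<evidence>`/`<reasoning>`/`<answer>` tags, the functions `parse_trace`, `normalize`, and `trace_reward`, and the fixed reward weights are hypothetical. They are not ARENA's actual output template, and the fixed weights do not reproduce the paper's adaptive reward.

```python
import re
import string
from typing import Optional

# Hypothetical output format (an assumption, not ARENA's actual template):
# <evidence>[1], [3]</evidence><reasoning>...</reasoning><answer>...</answer>
SECTION_RE = {
    name: re.compile(rf"<{name}>(.*?)</{name}>", re.DOTALL)
    for name in ("evidence", "reasoning", "answer")
}


def parse_trace(output: str) -> Optional[dict]:
    """Split a structured generation into its sections; None if any is missing."""
    parts = {}
    for name, pattern in SECTION_RE.items():
        match = pattern.search(output)
        if match is None:
            return None
        parts[name] = match.group(1).strip()
    # Evidence is assumed to cite passage indices such as "[1], [3]".
    parts["evidence_ids"] = {int(i) for i in re.findall(r"\[(\d+)\]", parts["evidence"])}
    return parts


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def trace_reward(output: str, gold_answer: str, gold_evidence: set) -> float:
    """Hypothetical composite reward: format + exact answer match + evidence F1."""
    parts = parse_trace(output)
    if parts is None:
        return 0.0  # malformed trace: no credit at all
    format_r = 1.0
    answer_r = 1.0 if normalize(parts["answer"]) == normalize(gold_answer) else 0.0
    chosen = parts["evidence_ids"]
    hits = len(chosen & gold_evidence)
    if hits == 0:
        evidence_r = 0.0
    else:
        precision = hits / len(chosen)
        recall = hits / len(gold_evidence)
        evidence_r = 2 * precision * recall / (precision + recall)
    # Illustrative fixed weights; they are NOT the paper's adaptive weighting.
    return 0.2 * format_r + 0.5 * answer_r + 0.3 * evidence_r
```

With a well-formed trace whose answer normalizes to the gold answer and whose cited passages match the gold supporting passages exactly, `trace_reward` returns approximately 1.0, while output missing any section scores 0.0. Scoring the cited evidence, and not only the final answer, is what ties the reward to the decision trace in this sketch.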

arXiv information

Authors	Jingyi Ren,Yekun Xu,Xiaolong Wang,Weitao Li,Weizhi Ma,Yang Liu
Published	2025-05-19 15:40:29+00:00
arXiv site	arxiv_id(pdf)

Sources, services used

arxiv.jp, Google
