Mutual Enhancement of Large Language and Reinforcement Learning Models through Bi-Directional Feedback Mechanisms: A Case Study

Summary

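Large language models (LLMs) have demonstrated remarkable capabilities, such as planning and reasoning, that can benefit reinforcement learning (RL) models.
However, the problem of how LLMs and RL models should collaborate remains unsolved.
In this study, we tackle these problems with a teacher-student learning framework in a cooperative multi-agent setting: the RL model provides feedback to the LLM, and the LLM provides high-level information to the RL model.
Within this framework, the LLM acts as the teacher and the RL model acts as the student.
The two agents cooperatively assist each other through a process of recursive help, such as ‘I help you help I help.’
The LLM agent supplies abstract information to the RL agent, enabling efficient exploration and policy improvement.
In turn, the RL agent gives feedback to the LLM agent, providing valuable real-time information that helps it generate more useful tokens.
This bi-directional feedback loop promotes optimization, exploration, and mutual improvement of both agents, enabling them to accomplish increasingly challenging tasks.
Notably, we propose a practical algorithm to address the problem and conduct empirical experiments to evaluate the effectiveness of our method.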
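The summary describes the bi-directional loop only at a high level and does not spell out the proposed algorithm. The following is a minimal toy sketch of the general idea under loudly stated assumptions, not the paper's method: the task is a hypothetical 1-D corridor, the "LLM teacher" is replaced by a stub (`StubTeacher`) that picks between two hypothetical text hints with the standard UCB1 bandit rule, and the "RL student" is a tabular Q-learning agent (`Student`) whose exploratory moves follow the hint. All names (`HINTS`, `run_episode`, `train`, etc.) and parameter values are illustrative inventions.

```python
import math
import random
from collections import defaultdict

# Toy task (assumption, not from the paper): a 1-D corridor with states 0..9.
# Stepping onto state 9 ends the episode with reward 1; every other step gives 0.
N_STATES = 10
GOAL = N_STATES - 1
MAX_STEPS = 50
ACTIONS = (-1, +1)                # move left, move right
HINTS = ("go left", "go right")   # hypothetical "abstract information" a teacher could emit


def step(state, action):
    """Deterministic corridor dynamics; walls clamp the position to [0, GOAL]."""
    nxt = min(max(state + action, 0), GOAL)
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done


class StubTeacher:
    """Stand-in for the LLM teacher (no real LLM is called).

    It chooses a hint with the UCB1 bandit rule and keeps a running mean of the
    feedback the student reports for each hint.
    """

    def __init__(self):
        self.counts = {h: 0 for h in HINTS}
        self.scores = {h: 0.0 for h in HINTS}
        self.total = 0

    def advise(self):
        for h in HINTS:                   # try every hint once before using UCB1
            if self.counts[h] == 0:
                return h
        return max(HINTS, key=lambda h: self.scores[h]
                   + math.sqrt(2 * math.log(self.total) / self.counts[h]))

    def receive_feedback(self, hint, feedback):
        self.total += 1
        self.counts[hint] += 1
        self.scores[hint] += (feedback - self.scores[hint]) / self.counts[hint]


class Student:
    """Tabular Q-learning agent whose exploratory actions follow the teacher's hint."""

    def __init__(self, rng, alpha=0.5, gamma=0.9, epsilon=0.3):
        self.rng, self.alpha, self.gamma, self.epsilon = rng, alpha, gamma, epsilon
        self.q = defaultdict(float)

    def greedy(self, state):
        # Highest-valued action, breaking ties at random.
        best = max(self.q[(state, a)] for a in ACTIONS)
        return self.rng.choice([a for a in ACTIONS if self.q[(state, a)] == best])

    def act(self, state, hint):
        if self.rng.random() < self.epsilon:
            return +1 if hint == "go right" else -1   # hint-guided exploration
        return self.greedy(state)

    def update(self, s, a, r, s2, done):
        # Standard one-step Q-learning update; no bootstrapping from terminal states.
        target = r if done else r + self.gamma * max(self.q[(s2, b)] for b in ACTIONS)
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])


def run_episode(teacher, student):
    hint = teacher.advise()                       # teacher -> student: high-level hint
    state, steps, done = 0, 0, False
    while not done and steps < MAX_STEPS:
        action = student.act(state, hint)
        nxt, reward, done = step(state, action)
        student.update(state, action, reward, nxt, done)
        state, steps = nxt, steps + 1
    feedback = 1.0 - steps / MAX_STEPS if done else 0.0  # faster success -> higher score
    teacher.receive_feedback(hint, feedback)      # student -> teacher: feedback on the hint
    return steps


def train(episodes=300, seed=0):
    rng = random.Random(seed)
    teacher, student = StubTeacher(), Student(rng)
    for _ in range(episodes):
        run_episode(teacher, student)
    return teacher, student
```

For example, `teacher, student = train()` runs 300 episodes. In this toy one would expect the teacher's mean feedback for "go right" to end up higher than for "go left" (the student's feedback steers the teacher), and the student's greedy policy to walk straight to the goal (the teacher's hint steers exploration). UCB1 and Q-learning are ordinary textbook components chosen here only for brevity.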

Summary (original)

Large Language Models (LLMs) have demonstrated remarkable capabilities for reinforcement learning (RL) models, such as planning and reasoning capabilities. However, the problems of LLMs and RL model collaboration still need to be solved. In this study, we employ a teacher-student learning framework to tackle these problems, specifically by offering feedback for LLMs using RL models and providing high-level information for RL models with LLMs in a cooperative multi-agent setting. Within this framework, the LLM acts as a teacher, while the RL model acts as a student. The two agents cooperatively assist each other through a process of recursive help, such as ‘I help you help I help.’ The LLM agent supplies abstract information to the RL agent, enabling efficient exploration and policy improvement. In turn, the RL agent offers feedback to the LLM agent, providing valuable, real-time information that helps generate more useful tokens. This bi-directional feedback loop promotes optimization, exploration, and mutual improvement for both agents, enabling them to accomplish increasingly challenging tasks. Remarkably, we propose a practical algorithm to address the problem and conduct empirical experiments to evaluate the effectiveness of our method.

arXiv information

Author	Shangding Gu
Published	2024-01-12 14:35:57+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google