RLingua: Improving Reinforcement Learning Sample Efficiency in Robotic Manipulations With Large Language Models

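Summary

Reinforcement learning (RL) has proven capable of solving a wide range of tasks, but it is notorious for its low sample efficiency.
This paper proposes RLingua, a framework that leverages the internal knowledge of large language models (LLMs) to reduce the sample complexity of RL in robotic manipulation.
To this end, we first show how to extract an LLM's prior knowledge through prompt engineering so that a preliminary rule-based robot controller can be generated for a specific task.
Although imperfect, the LLM-generated robot controller is used to produce action samples during rollouts with a decaying probability, which improves RL's sample efficiency.
We adopt the actor-critic framework and modify the actor loss to regularize policy learning toward the LLM-generated controller.
RLingua also provides a new way to improve imperfect LLM-generated robot controllers through RL.
We show that RLingua significantly reduces the sample complexity of TD3 on the robot tasks in panda_gym and achieves high success rates on sparsely rewarded robot tasks in RLBench, where standard TD3 fails.
We also validate RLingua in real-world robot experiments via Sim2Real, showing that the learned policies transfer effectively to real robot tasks.
Further details and videos about this work are available on the project website: https://rlingua.github.io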
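
The rollout step described above, where the LLM-generated controller supplies actions with a decaying probability, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the exponential schedule, the parameter names `p0` and `decay`, and the callables `rl_policy` and `llm_controller` are all assumptions, since the summary only states that the probability decays.

```python
import random


def llm_controller_probability(step, p0=0.9, decay=0.999):
    """Probability of using the LLM-generated controller at a given rollout step.

    Assumption: an exponential schedule p0 * decay**step. The summary only
    says the probability is "decaying"; the actual schedule may differ.
    """
    return p0 * (decay ** step)


def select_action(step, obs, rl_policy, llm_controller, rng=random):
    """Pick an action for one rollout step and report where it came from.

    rl_policy and llm_controller are hypothetical callables mapping an
    observation to an action. The returned tag ("llm" or "rl") lets a replay
    buffer record which samples came from the controller.
    """
    if rng.random() < llm_controller_probability(step):
        return llm_controller(obs), "llm"
    return rl_policy(obs), "rl"
```

Early in training most actions come from the controller, which seeds the replay buffer with task-relevant samples; as the step count grows, the learned policy takes over.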

Abstract (original)

Reinforcement learning (RL) has demonstrated its capability in solving various tasks but is notorious for its low sample efficiency. In this paper, we propose RLingua, a framework that can leverage the internal knowledge of large language models (LLMs) to reduce the sample complexity of RL in robotic manipulations. To this end, we first present how to extract the prior knowledge of LLMs by prompt engineering so that a preliminary rule-based robot controller for a specific task can be generated. Despite being imperfect, the LLM-generated robot controller is utilized to produce action samples during rollouts with a decaying probability, thereby improving RL’s sample efficiency. We employ the actor-critic framework and modify the actor loss to regularize the policy learning towards the LLM-generated controller. RLingua also provides a novel method of improving the imperfect LLM-generated robot controllers by RL. We demonstrated that RLingua can significantly reduce the sample complexity of TD3 in the robot tasks of panda_gym and achieve high success rates in sparsely rewarded robot tasks in RLBench, where the standard TD3 fails. Additionally, we validated RLingua’s effectiveness in real-world robot experiments through Sim2Real, demonstrating that the learned policies are effectively transferable to real robot tasks. Further details and videos about our work are available at our project website https://rlingua.github.io.
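
The abstract says the actor loss is modified to regularize policy learning towards the LLM-generated controller, but it does not give the formula. One plausible form, shown here purely as an assumption, adds a squared-distance penalty between the policy's action and the controller's action to the usual TD3 actor objective of maximizing Q(s, π(s)). The function name, the `from_llm` flag, and `reg_weight` are hypothetical; plain Python lists stand in for tensors to keep the sketch dependency-free.

```python
def regularized_actor_loss(q_values, policy_actions, llm_actions, from_llm,
                           reg_weight=1.0):
    """Batch-mean actor loss with a pull towards the LLM controller's actions.

    Per sample: -Q(s, pi(s)) + reg_weight * ||pi(s) - a_llm||^2, where the
    penalty is applied only to samples flagged as coming from the controller.
    Assumption: this penalty form is illustrative; the paper's exact loss is
    not stated in the abstract.
    """
    total = 0.0
    for q, a_pi, a_llm, is_llm in zip(q_values, policy_actions,
                                      llm_actions, from_llm):
        loss = -q  # standard deterministic-policy-gradient term
        if is_llm:
            # squared Euclidean distance to the controller's action
            loss += reg_weight * sum((x - y) ** 2 for x, y in zip(a_pi, a_llm))
        total += loss
    return total / len(q_values)
```

Setting `reg_weight` to zero recovers the unregularized actor loss, so the weight controls how strongly the policy is held close to the imperfect controller.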

arXiv information

Authors	Liangliang Chen,Yutian Lei,Shiyu Jin,Ying Zhang,Liangjun Zhang
Published	2024-03-11 04:13:26+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
