RLingua: Improving Reinforcement Learning Sample Efficiency in Robotic Manipulations With Large Language Models

Summary

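Reinforcement learning (RL) has demonstrated its ability to solve a wide range of tasks, but it is notorious for its low sample efficiency.
In this paper, we propose RLingua, a framework that leverages the internal knowledge of large language models (LLMs) to reduce the sample complexity of RL in robotic manipulation.
To this end, we first present a method that extracts the prior knowledge of LLMs through prompt engineering, so that a preliminary rule-based robot controller for a specific task can be generated in a user-friendly way.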
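To illustrate the kind of preliminary rule-based controller described above, here is a hypothetical proportional controller for a reach-style task. The function name `rule_based_reach_controller`, the `gain` value, and the observation layout (end-effector and goal positions as 3D vectors) are assumptions for illustration only; the actual prompts and generated controllers in RLingua may look quite different. The action is clipped to [-1, 1], assuming a normalized displacement action space such as the one used by panda_gym's Reach task.

```python
import numpy as np


def rule_based_reach_controller(ee_pos, goal_pos, gain=10.0):
    """Hypothetical LLM-style rule: step the end effector toward the goal.

    ee_pos, goal_pos: 3D positions (assumed observation layout).
    Returns a 3D action clipped to [-1, 1] (assumed normalized action space).
    """
    error = np.asarray(goal_pos, dtype=float) - np.asarray(ee_pos, dtype=float)
    # Proportional rule: the step grows with distance and saturates at the action bounds.
    return np.clip(gain * error, -1.0, 1.0)


# Usage: the y-component saturates at -1 because the goal is far along that axis.
action = rule_based_reach_controller(ee_pos=[0.0, 0.0, 0.0], goal_pos=[0.05, -0.2, 0.0])
print(action)
```

A saturated proportional rule like this is easy to write from a plain-language task description, but it ignores obstacles, contacts, and dynamics, which is why such controllers serve as imperfect priors rather than final policies.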
Although imperfect, the LLM-generated robot controller is used to produce action samples during rollouts with a decaying probability, which improves the sample efficiency of RL.
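The summary does not state the exact decay schedule. The minimal sketch below assumes an exponential schedule, p = max(p_min, p0 * decay^step), and picks the LLM controller's action with probability p, otherwise the RL policy's action. The names `llm_probability` and `select_action` and all parameter values are hypothetical; in an actual TD3 setup the policy branch would also add exploration noise, which is omitted here.

```python
import random


def llm_probability(step, p0=1.0, decay=0.999, p_min=0.0):
    """Assumed exponential schedule for the chance of using the LLM controller."""
    return max(p_min, p0 * decay ** step)


def select_action(obs, p_llm, policy, llm_controller, rng):
    """Return (action, used_llm); the LLM controller is chosen with probability p_llm."""
    if rng.random() < p_llm:
        return llm_controller(obs), True
    return policy(obs), False


# Usage sketch with stand-in callables (hypothetical placeholders for the
# TD3 actor and the LLM-generated controller).
rng = random.Random(0)
policy = lambda obs: "policy_action"
llm_controller = lambda obs: "llm_action"

llm_share = []  # fraction of actions taken from the LLM controller at each checkpoint
for step in range(0, 5000, 1000):
    p = llm_probability(step)
    picks = [select_action(None, p, policy, llm_controller, rng)[1] for _ in range(1000)]
    llm_share.append(sum(picks) / len(picks))

print(llm_share)  # shrinks as training progresses
```

Early rollouts are dominated by the LLM controller's reasonable-but-imperfect actions, which seed the replay buffer with informative transitions; as the probability decays, the learned policy takes over and can improve on the controller.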
We adopt TD3, a widely used RL baseline method, and modify its actor loss to regularize policy learning toward the LLM-generated controller.
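One plausible form of such a regularized actor loss, sketched under assumptions: the standard TD3 actor term, -mean Q1(s, pi(s)), plus a squared-error penalty pulling pi(s) toward the LLM controller's action, applied only where a mask marks it (for example, transitions produced by the LLM controller). This is similar in spirit to TD3+BC; the exact weighting and masking used in RLingua may differ, and `regularized_actor_loss`, `lam`, and `llm_mask` are hypothetical names. NumPy is used only to show the value being computed; a real implementation would compute it in an autodiff framework such as PyTorch so gradients flow into the actor.

```python
import numpy as np


def regularized_actor_loss(q_values, policy_actions, llm_actions, llm_mask, lam=2.5):
    """TD3 actor loss plus a hypothetical penalty toward LLM-controller actions.

    q_values:       Q1(s, pi(s)) for a batch, shape (B,)
    policy_actions: pi(s), shape (B, A)
    llm_actions:    LLM-controller actions for the same states, shape (B, A)
    llm_mask:       1.0 where the penalty applies, 0.0 elsewhere, shape (B,)
    lam:            regularization weight (assumed value)
    """
    q_values = np.asarray(q_values, dtype=float)
    policy_actions = np.asarray(policy_actions, dtype=float)
    llm_actions = np.asarray(llm_actions, dtype=float)
    llm_mask = np.asarray(llm_mask, dtype=float)

    # Standard TD3 actor objective: maximize Q, i.e. minimize -Q.
    td3_term = -q_values.mean()
    # Per-sample squared distance between policy and LLM actions.
    sq_err = ((policy_actions - llm_actions) ** 2).sum(axis=1)
    # Average the penalty over masked samples only (guard against an empty mask).
    n_masked = max(llm_mask.sum(), 1.0)
    reg_term = (llm_mask * sq_err).sum() / n_masked
    return td3_term + lam * reg_term


loss = regularized_actor_loss(
    q_values=[1.0, 3.0],
    policy_actions=[[0.0, 0.0], [1.0, 1.0]],
    llm_actions=[[1.0, 0.0], [1.0, 1.0]],
    llm_mask=[1.0, 0.0],
    lam=0.5,
)
print(loss)  # -1.5
```

The weight `lam` trades off exploiting the critic against staying close to the LLM prior: a large value makes the actor imitate the controller, while a small value lets the critic correct the controller's mistakes.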
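RLingua also provides a new way to improve imperfect LLM-generated robot controllers through RL.
We demonstrate that RLingua substantially reduces the sample complexity of TD3 on four robot tasks in panda_gym, and achieves high success rates on 12 sampled sparse-reward robot tasks in RLBench, where standard TD3 fails.
Additionally, we validated RLingua's effectiveness in real-world robot experiments via Sim2Real, demonstrating that the learned policies transfer effectively to real robot tasks.
Further details about our work are available on the project website https://rlingua.github.io.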

Summary (original)

Reinforcement learning (RL) has demonstrated its capability in solving various tasks but is notorious for its low sample efficiency. In this paper, we propose RLingua, a framework that can leverage the internal knowledge of large language models (LLMs) to reduce the sample complexity of RL in robotic manipulations. To this end, we first present a method for extracting the prior knowledge of LLMs by prompt engineering so that a preliminary rule-based robot controller for a specific task can be generated in a user-friendly manner. Despite being imperfect, the LLM-generated robot controller is utilized to produce action samples during rollouts with a decaying probability, thereby improving RL’s sample efficiency. We employ TD3, the widely-used RL baseline method, and modify the actor loss to regularize the policy learning towards the LLM-generated controller. RLingua also provides a novel method of improving the imperfect LLM-generated robot controllers by RL. We demonstrate that RLingua can significantly reduce the sample complexity of TD3 in four robot tasks of panda_gym and achieve high success rates in 12 sampled sparsely rewarded robot tasks in RLBench, where the standard TD3 fails. Additionally, we validated RLingua’s effectiveness in real-world robot experiments through Sim2Real, demonstrating that the learned policies are effectively transferable to real robot tasks. Further details about our work are available at our project website https://rlingua.github.io.

arXiv information

Authors	Liangliang Chen, Yutian Lei, Shiyu Jin, Ying Zhang, Liangjun Zhang
Published	2024-03-19 17:52:09+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
