Hierarchical in-Context Reinforcement Learning with Hindsight Modular Reflections for Planning

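Summary

Large Language Models (LLMs) have demonstrated remarkable abilities across a range of language tasks, which makes them promising candidates for decision-making in robotics.
Inspired by Hierarchical Reinforcement Learning (HRL), we propose Hierarchical in-Context Reinforcement Learning (HCRL), a new framework in which an LLM-based high-level policy decomposes a complex task into sub-tasks on the fly.
Each sub-task is defined by a goal and is assigned to a low-level policy to complete.
Once the LLM agent determines that the current goal has been completed, a new goal is proposed.
To improve the agent's performance over multi-episode execution, we propose Hindsight Modular Reflection (HMR): instead of reflecting on the full trajectory, the task objective is replaced with intermediate goals, and the agent reflects on the resulting shorter trajectories, which improves reflection efficiency.
We evaluate the decision-making ability of the proposed HCRL in three benchmark environments: ALFWorld, Webshop, and HotpotQA.
The results show that HCRL achieves performance improvements of 9%, 42%, and 10% over strong in-context learning baselines within 5 episodes of execution.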
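The control flow described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: every name here (`Segment`, `run_episode`, `hindsight_modular_reflection`, and the toy stub policies) is hypothetical, and the LLM calls are replaced by deterministic stand-in functions so the example runs on its own. It only shows the general shape of the idea, assuming a high-level policy that proposes goals one at a time, a low-level policy that acts until the goal is judged complete, and a reflection step applied per goal segment rather than to the whole trajectory.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Segment:
    """Hypothetical container: the short trajectory collected for one goal."""
    goal: str
    steps: List[Tuple[str, str]] = field(default_factory=list)  # (action, observation)
    achieved: bool = False


def run_episode(
    task: str,
    propose_goal: Callable[[str, List[Segment]], Optional[str]],
    act: Callable[[str, List[Tuple[str, str]]], str],
    goal_done: Callable[[str, List[Tuple[str, str]]], bool],
    env_step: Callable[[str], str],
    max_goals: int = 5,
    max_steps_per_goal: int = 10,
) -> List[Segment]:
    """Run one episode: the high-level policy proposes goals on the fly,
    the low-level policy acts until each goal is judged complete."""
    segments: List[Segment] = []
    for _ in range(max_goals):
        goal = propose_goal(task, segments)  # high-level policy (an LLM in the paper)
        if goal is None:                     # assumption: None signals the task is done
            break
        segment = Segment(goal)
        for _ in range(max_steps_per_goal):
            action = act(goal, segment.steps)  # low-level policy
            observation = env_step(action)
            segment.steps.append((action, observation))
            if goal_done(goal, segment.steps):
                segment.achieved = True
                break
        segments.append(segment)
    return segments


def hindsight_modular_reflection(
    segments: List[Segment],
    reflect: Callable[[str, List[Tuple[str, str]], bool], str],
) -> Dict[str, str]:
    """Reflect once per goal segment instead of once over the full trajectory."""
    return {s.goal: reflect(s.goal, s.steps, s.achieved) for s in segments}


# Toy stand-ins for the LLM policies and the environment (all assumptions).
PLAN = ["find apple", "open fridge", "put apple in fridge"]


def toy_propose_goal(task: str, segments: List[Segment]) -> Optional[str]:
    return PLAN[len(segments)] if len(segments) < len(PLAN) else None


def toy_act(goal: str, steps: List[Tuple[str, str]]) -> str:
    return f"do {goal}"


def toy_env_step(action: str) -> str:
    return f"ok: {action}"


def toy_goal_done(goal: str, steps: List[Tuple[str, str]]) -> bool:
    return steps[-1][1].startswith("ok")


def toy_reflect(goal: str, steps: List[Tuple[str, str]], achieved: bool) -> str:
    status = "achieved" if achieved else "failed"
    return f"{goal}: {status} after {len(steps)} step(s)"


segments = run_episode(
    "put the apple in the fridge",
    toy_propose_goal, toy_act, toy_goal_done, toy_env_step,
)
reflections = hindsight_modular_reflection(segments, toy_reflect)
```

In a multi-episode setting, one plausible use (again an assumption, not something the summary specifies) is to feed the per-goal `reflections` back into the prompts of the next episode, so that each reflection only has to account for a short, goal-specific trajectory.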

Abstract (original)

Large Language Models (LLMs) have demonstrated remarkable abilities in various language tasks, making them promising candidates for decision-making in robotics. Inspired by Hierarchical Reinforcement Learning (HRL), we propose Hierarchical in-Context Reinforcement Learning (HCRL), a novel framework that decomposes complex tasks into sub-tasks using an LLM-based high-level policy, in which a complex task is decomposed into sub-tasks by a high-level policy on-the-fly. The sub-tasks, defined by goals, are assigned to the low-level policy to complete. Once the LLM agent determines that the goal is finished, a new goal will be proposed. To improve the agent’s performance in multi-episode execution, we propose Hindsight Modular Reflection (HMR), where, instead of reflecting on the full trajectory, we replace the task objective with intermediate goals and let the agent reflect on shorter trajectories to improve reflection efficiency. We evaluate the decision-making ability of the proposed HCRL in three benchmark environments–ALFWorld, Webshop, and HotpotQA. Results show that HCRL can achieve 9%, 42%, and 10% performance improvement in 5 episodes of execution over strong in-context learning baselines.

arXiv information

Authors	Chuanneng Sun, Songjun Huang, Dario Pompili
Published	2024-08-12 22:40:01+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
