DHP: Discrete Hierarchical Planning for Hierarchical Reinforcement Learning Agents

Abstract

In this paper, we address the challenge of long-horizon visual planning tasks using Hierarchical Reinforcement Learning (HRL). Our key contribution is a Discrete Hierarchical Planning (DHP) method, an alternative to traditional distance-based approaches. We provide theoretical foundations for the method and demonstrate its effectiveness through extensive empirical evaluations. Our agent recursively predicts subgoals in the context of a long-term goal and receives discrete rewards for constructing plans as compositions of abstract actions. The method introduces a novel advantage estimation strategy for tree trajectories, which inherently encourages shorter plans and enables generalization beyond the maximum tree depth. The learned policy function allows the agent to plan efficiently, requiring only $\log N$ computational steps, making re-planning highly efficient. The agent, based on a Soft Actor-Critic (SAC) framework, is trained using on-policy imagination data. Additionally, we propose a novel exploration strategy that enables the agent to generate relevant training examples for the planning modules. We evaluate our method on long-horizon visual planning tasks in a 25-room environment, where it significantly outperforms previous benchmarks in success rate and average episode length. Furthermore, an ablation study highlights the individual contributions of key modules to the overall performance.
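
The recursive subgoal prediction and the $\log N$ planning cost mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes states are integers on a 1-D line, replaces the learned policy with a hypothetical `midpoint_policy`, and treats a goal as directly reachable when it is at most one step away. The names `midpoint_policy`, `reachable`, `expand_plan`, and `next_subgoal` are all invented for illustration, and there is no learning, reward, or SAC component here.

```python
def midpoint_policy(state: int, goal: int) -> int:
    """Hypothetical stand-in for a learned subgoal policy: propose the midpoint."""
    return (state + goal) // 2


def reachable(state: int, goal: int) -> bool:
    """Toy assumption: a goal is directly reachable if it is at most one step away."""
    return abs(goal - state) <= 1


def expand_plan(state: int, goal: int, policy=midpoint_policy) -> list[int]:
    """Expand the full plan tree and return its leaf waypoints in visiting order."""
    if reachable(state, goal):
        return [goal]
    subgoal = policy(state, goal)
    # Recurse on both halves: (state -> subgoal) and (subgoal -> goal).
    return expand_plan(state, subgoal, policy) + expand_plan(subgoal, goal, policy)


def next_subgoal(state: int, goal: int, policy=midpoint_policy) -> tuple[int, int]:
    """Follow only the first branch of the tree.

    Returns the first directly reachable waypoint and the number of policy calls.
    """
    calls = 0
    while not reachable(state, goal):
        goal = policy(state, goal)
        calls += 1
    return goal, calls


if __name__ == "__main__":
    print(expand_plan(0, 8))
    print(next_subgoal(0, 1024))
```

In this toy setting, finding the next waypoint for a goal 1024 steps away takes 10 policy calls, since each call halves the remaining gap. Expanding the whole tree visits every leaf, so only the single-branch descent shows the logarithmic cost.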

arXiv information

Authors	Shashank Sharma, Janina Hoffmann, Vinay Namboodiri
Published	2025-02-04 03:05:55+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
