Goal Exploration Augmentation via Pre-trained Skills for Sparse-Reward Long-Horizon Goal-Conditioned Reinforcement Learning

Summary

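Reinforcement learning (RL) often struggles to accomplish sparse-reward, long-horizon tasks in complex environments.
Goal-conditioned reinforcement learning (GCRL) has been employed to tackle this difficult problem through a curriculum of easy-to-reach sub-goals.
In GCRL, exploring novel sub-goals is essential for the agent to eventually find a path to the desired goal.
How to explore novel sub-goals efficiently is one of the most challenging problems in GCRL.
Several goal exploration methods have been proposed to address this problem, but finding the desired goals efficiently remains difficult.
In this paper, we propose a novel learning objective that optimizes the entropy of both the achieved goals and the new goals to be explored, for more efficient goal exploration in sub-goal-selection-based GCRL.
To optimize this objective, we first explore and exploit frequently occurring goal-transition patterns mined from environments similar to the current task, and compose skills from them via skill learning.
The pre-trained skills are then applied to goal exploration.
Evaluation on a variety of sparse-reward long-horizon benchmark tasks suggests that incorporating our method into several state-of-the-art GCRL baselines significantly boosts their exploration efficiency while improving or maintaining their performance.
The source code is available at https://github.com/GEAPS/GEAPS.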

Summary (Original)

Reinforcement learning (RL) often struggles to accomplish a sparse-reward long-horizon task in a complex environment. Goal-conditioned reinforcement learning (GCRL) has been employed to tackle this difficult problem via a curriculum of easy-to-reach sub-goals. In GCRL, exploring novel sub-goals is essential for the agent to ultimately find the pathway to the desired goal. How to explore novel sub-goals efficiently is one of the most challenging issues in GCRL. Several goal exploration methods have been proposed to address this issue but still struggle to find the desired goals efficiently. In this paper, we propose a novel learning objective by optimizing the entropy of both achieved and new goals to be explored for more efficient goal exploration in sub-goal selection based GCRL. To optimize this objective, we first explore and exploit the frequently occurring goal-transition patterns mined in the environments similar to the current task to compose skills via skill learning. Then, the pretrained skills are applied in goal exploration. Evaluation on a variety of sparse-reward long-horizon benchmark tasks suggests that incorporating our method into several state-of-the-art GCRL baselines significantly boosts their exploration efficiency while improving or maintaining their performance. The source code is available at: https://github.com/GEAPS/GEAPS.
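
The abstract does not say how the entropy of achieved goals is estimated, so the following is only a minimal sketch of the general idea, not the authors' method and not code from the GEAPS repository. It assumes goals are low-dimensional coordinates and uses a simple histogram estimator; the function name `goal_entropy` and the `cell_size` parameter are hypothetical. It illustrates why a higher entropy corresponds to goals that cover more of the goal space.

```python
import math
from collections import Counter


def goal_entropy(goals, cell_size=1.0):
    """Histogram estimate of the Shannon entropy (in nats) of a set of goals.

    Assumption for illustration: each goal is a tuple of floats, and goals
    are discretized onto a uniform grid with side length `cell_size`.
    """
    # Count how many goals fall into each grid cell.
    cells = Counter(
        tuple(math.floor(x / cell_size) for x in goal) for goal in goals
    )
    total = sum(cells.values())
    # H = sum_i p_i * log(1 / p_i) over the occupied cells.
    return sum((n / total) * math.log(total / n) for n in cells.values())


# Achieved goals that all fall into one grid cell: no coverage, zero entropy.
clustered = [(0.1, 0.2), (0.3, 0.4), (0.5, 0.1), (0.2, 0.9)]
# Achieved goals spread over four distinct cells: entropy is log(4).
spread = [(0.1, 0.2), (1.3, 0.4), (2.5, 3.1), (3.2, 2.9)]

if __name__ == "__main__":
    print(goal_entropy(clustered))
    print(goal_entropy(spread))
```

Under this estimator, an exploration strategy that reaches previously unvisited cells raises the entropy, whereas revisiting the same region leaves it unchanged or lowers it.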

arXiv information

Authors	Lisheng Wu, Ke Chen
Published	2023-12-19 11:00:18+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
