Discovering Hierarchical Achievements in Reinforcement Learning via Contrastive Learning

Abstract

Discovering achievements with a hierarchical structure on procedurally generated environments poses a significant challenge. This requires agents to possess a broad range of abilities, including generalization and long-term reasoning. Many prior methods are built upon model-based or hierarchical approaches, with the belief that an explicit module for long-term planning would be beneficial for learning hierarchical achievements. However, these methods require an excessive amount of environment interactions or large model sizes, limiting their practicality. In this work, we identify that proximal policy optimization (PPO), a simple and versatile model-free algorithm, outperforms the prior methods with recent implementation practices. Moreover, we find that the PPO agent can predict the next achievement to be unlocked to some extent, though with low confidence. Based on this observation, we propose a novel contrastive learning method, called achievement distillation, that strengthens the agent’s capability to predict the next achievement. Our method exhibits a strong capacity for discovering hierarchical achievements and shows state-of-the-art performance on the challenging Crafter environment using fewer model parameters in a sample-efficient regime.

arXiv information

Authors	Seungyong Moon, Junyoung Yeom, Bumsoo Park, Hyun Oh Song
Published	2023-07-07 09:47:15+00:00

Source and services used

arxiv.jp, Google
