Learning Successor Features the Simple Way

Summary

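In deep reinforcement learning (RL), it is difficult to learn representations that do not suffer from catastrophic forgetting or interference in non-stationary environments.
Successor Features (SFs) offer a potential solution to this challenge.
However, canonical techniques for learning SFs from pixel-level observations often lead to representation collapse, in which the representations degenerate and fail to capture meaningful variation in the data.
More recent methods for learning SFs avoid representation collapse, but they often involve complex losses and multiple learning phases, which reduces their efficiency.
We introduce a novel, simple method for learning SFs directly from pixels.
Our approach combines a temporal-difference (TD) loss with a reward prediction loss, which together capture the basic mathematical definition of SFs.
We show that our approach matches or outperforms existing SF learning techniques in 2D (Minigrid) mazes, 3D (Miniworld) mazes, and Mujoco, in both single and continual learning scenarios.
Our technique is also efficient, reaching higher levels of performance in less time than other approaches.
Our work provides a new, streamlined technique for learning SFs directly from pixel observations, with no pretraining required.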
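
To make the "TD loss plus reward prediction loss" idea concrete, here is a minimal NumPy sketch based on the textbook definition of successor features, where the reward is modeled as r ≈ φ · w and the SFs satisfy ψ(s, a) = φ + γ ψ(s', a'). It is an illustration under those assumptions, not the authors' implementation: the function names, the squared-error form of both losses, and the unweighted sum are all hypothetical choices, and a deep RL agent would compute these quantities with neural networks over pixel inputs.

```python
import numpy as np


def sf_td_loss(psi_sa, phi, psi_next, gamma):
    """Squared TD error for successor features (assumed form).

    The bootstrapped target is phi + gamma * psi(s', a'), mirroring the
    recursive definition psi(s, a) = E[phi + gamma * psi(s', a')].
    """
    target = phi + gamma * psi_next
    return float(np.sum((target - psi_sa) ** 2))


def reward_prediction_loss(phi, w, reward):
    """Squared error of the linear reward model r ~ phi . w (assumed form)."""
    return float((reward - phi @ w) ** 2)


def combined_loss(psi_sa, phi, psi_next, w, reward, gamma):
    """Unweighted sum of the two losses; the weighting is an assumption."""
    return sf_td_loss(psi_sa, phi, psi_next, gamma) + reward_prediction_loss(
        phi, w, reward
    )


# Toy two-dimensional example with made-up numbers.
psi_sa = np.array([1.0, 0.0])    # current SF estimate psi(s, a)
phi = np.array([0.5, 0.5])       # basis features observed on the transition
psi_next = np.array([1.0, 1.0])  # SF estimate at the next state-action pair
w = np.array([1.0, 1.0])         # task (reward) weights
loss = combined_loss(psi_sa, phi, psi_next, w, reward=2.0, gamma=0.5)
q_value = float(psi_sa @ w)      # action value recovered as Q(s, a) = psi . w
```

Under this formulation, only the weight vector w is task-specific, so a change in the reward function can in principle be absorbed by re-fitting w while the SFs are reused.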

Abstract (original)

In Deep Reinforcement Learning (RL), it is a challenge to learn representations that do not exhibit catastrophic forgetting or interference in non-stationary environments. Successor Features (SFs) offer a potential solution to this challenge. However, canonical techniques for learning SFs from pixel-level observations often lead to representation collapse, wherein representations degenerate and fail to capture meaningful variations in the data. More recent methods for learning SFs can avoid representation collapse, but they often involve complex losses and multiple learning phases, reducing their efficiency. We introduce a novel, simple method for learning SFs directly from pixels. Our approach uses a combination of a Temporal-difference (TD) loss and a reward prediction loss, which together capture the basic mathematical definition of SFs. We show that our approach matches or outperforms existing SF learning techniques in both 2D (Minigrid), 3D (Miniworld) mazes and Mujoco, for both single and continual learning scenarios. As well, our technique is efficient, and can reach higher levels of performance in less time than other approaches. Our work provides a new, streamlined technique for learning SFs directly from pixel observations, with no pretraining required.

arXiv information

Authors	Raymond Chua, Arna Ghosh, Christos Kaplanis, Blake A. Richards, Doina Precup
Published	2024-10-29 15:31:03+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
