Simplified Temporal Consistency Reinforcement Learning

Summary

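Reinforcement learning (RL) can solve complex sequential decision-making tasks, but it is currently limited by sample efficiency and the computation it requires.
To improve sample efficiency, recent work has focused on model-based RL, which interleaves model learning with planning.
Recent methods additionally use policy learning, value estimation, and self-supervised learning as auxiliary objectives.
This paper shows that, surprisingly, a simple representation learning approach that relies only on a latent dynamics model trained with latent temporal consistency is sufficient for high-performance RL.
This holds both when planning purely with a dynamics model conditioned on the representation and when the representation is used as policy and value function features in model-free RL.
In experiments, the approach learns an accurate dynamics model and solves challenging high-dimensional locomotion tasks with online planners, while training 4.1 times faster than ensemble-based methods.
With model-free RL and no planning, particularly on high-dimensional tasks such as the DeepMind Control Suite Humanoid and Dog tasks, the approach outperforms model-free methods by a large margin and matches the sample efficiency of model-based methods while training 2.4 times faster.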

Abstract (original)

Reinforcement learning is able to solve complex sequential decision-making tasks but is currently limited by sample efficiency and required computation. To improve sample efficiency, recent work focuses on model-based RL, which interleaves model learning with planning. Recent methods further utilize policy learning, value estimation, and self-supervised learning as auxiliary objectives. In this paper we show that, surprisingly, a simple representation learning approach relying only on a latent dynamics model trained by latent temporal consistency is sufficient for high-performance RL. This applies when using pure planning with a dynamics model conditioned on the representation, but also when utilizing the representation as policy and value function features in model-free RL. In experiments, our approach learns an accurate dynamics model to solve challenging high-dimensional locomotion tasks with online planners while being 4.1 times faster to train compared to ensemble-based methods. With model-free RL without planning, especially on high-dimensional tasks such as the DeepMind Control Suite Humanoid and Dog tasks, our approach outperforms model-free methods by a large margin and matches model-based methods' sample efficiency while training 2.4 times faster.
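
The abstract names latent temporal consistency as the only training signal for the dynamics model but does not spell out the objective. The following is a minimal sketch of one common way such a loss can be set up, not the authors' implementation: the first observation is encoded, the latent dynamics model is rolled forward with the actions, and each predicted latent is compared with the latent of the actually observed next state. The linear encoder and dynamics, the cosine-similarity distance, the slowly updated target encoder, the discount factor, and every function name here are assumptions made for illustration.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def cosine_similarity(a, b, eps=1e-8):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def temporal_consistency_loss(enc_W, target_enc_W, dyn_W,
                              observations, actions, discount=0.9):
    """Discounted sum of (1 - cosine similarity) over a latent rollout.

    observations: H + 1 observation vectors; actions: H action vectors.
    All three weight matrices are hypothetical linear stand-ins for networks.
    """
    z = matvec(enc_W, observations[0])  # only the first observation is encoded
    loss = 0.0
    for t, action in enumerate(actions):
        # z + action concatenates the two lists into the dynamics input.
        z = matvec(dyn_W, z + action)
        # Target latent of the observed next state (treated as a constant
        # in a gradient-based setup; assumption).
        target = matvec(target_enc_W, observations[t + 1])
        loss += (discount ** t) * (1.0 - cosine_similarity(z, target))
    return loss


def ema_update(target_W, online_W, tau=0.01):
    """Move the target encoder weights slowly toward the online encoder."""
    return [[(1 - tau) * tw + tau * ow for tw, ow in zip(trow, orow)]
            for trow, orow in zip(target_W, online_W)]


def rand_matrix(rows, cols):
    """Random weights in [-1, 1], used only for the demo below."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    obs_dim, act_dim, latent_dim, horizon = 4, 2, 3, 5
    enc_W = rand_matrix(latent_dim, obs_dim)
    target_enc_W = [row[:] for row in enc_W]  # target starts as a copy
    dyn_W = rand_matrix(latent_dim, latent_dim + act_dim)
    obs = [[random.uniform(-1, 1) for _ in range(obs_dim)]
           for _ in range(horizon + 1)]
    acts = [[random.uniform(-1, 1) for _ in range(act_dim)]
            for _ in range(horizon)]
    print(temporal_consistency_loss(enc_W, target_enc_W, dyn_W, obs, acts))
    target_enc_W = ema_update(target_enc_W, enc_W)
```

In a real training loop the loss would be minimized with respect to the encoder and dynamics parameters by an automatic differentiation library, and `ema_update` would be called after each optimizer step. The sketch only evaluates the loss so that the structure of the rollout is visible.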

arXiv information

Authors	Yi Zhao, Wenshuai Zhao, Rinu Boney, Juho Kannala, Joni Pajarinen
Published	2023-06-15 19:37:43+00:00

Source and services used

arxiv.jp, Google
