ED2: Environment Dynamics Decomposition World Models for Continuous Control

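Summary

Model-based reinforcement learning (MBRL) is considerably more sample-efficient in practice than model-free RL, but its performance is often limited by model prediction error.
To reduce this error, standard MBRL approaches train a single carefully designed network to fit the entire environment dynamics. This discards rich information about the multiple sub-dynamics in the environment, which could be modeled separately to build a more accurate world model.
This paper proposes Environment Dynamics Decomposition (ED2), a new world model construction framework that models the environment in a decomposed manner.
ED2 has two main components: sub-dynamics discovery (SD2) and dynamics decomposition prediction (D2P).
SD2 automatically discovers the sub-dynamics in an environment, and D2P builds a decomposed world model that follows those sub-dynamics.
ED2 can be easily combined with existing MBRL algorithms. Experimental results show that, combined with state-of-the-art MBRL algorithms on a range of continuous control tasks, ED2 significantly reduces model error, improves sample efficiency, and achieves higher asymptotic performance.
The code is open source and available at https://github.com/ED2-source-code/ED2.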

Summary (original)

Model-based reinforcement learning (MBRL) achieves significant sample efficiency in practice in comparison to model-free RL, but its performance is often limited by the existence of model prediction error. To reduce the model error, standard MBRL approaches train a single well-designed network to fit the entire environment dynamics, but this wastes rich information on multiple sub-dynamics which can be modeled separately, allowing us to construct the world model more accurately. In this paper, we propose the Environment Dynamics Decomposition (ED2), a novel world model construction framework that models the environment in a decomposing manner. ED2 contains two key components: sub-dynamics discovery (SD2) and dynamics decomposition prediction (D2P). SD2 discovers the sub-dynamics in an environment automatically and then D2P constructs the decomposed world model following the sub-dynamics. ED2 can be easily combined with existing MBRL algorithms and empirical results show that ED2 significantly reduces the model error, increases the sample efficiency, and achieves higher asymptotic performance when combined with the state-of-the-art MBRL algorithms on various continuous control tasks. Our code is open source and available at https://github.com/ED2-source-code/ED2.
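
The decomposed prediction idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes, purely for illustration, that sub-dynamics correspond to groups of action dimensions and that each sub-model's contribution to the state change is combined by summation; the abstract does not specify either detail. The partition is hand-written here rather than discovered by SD2, and the names `split_action`, `predict_next_state`, `partition`, and `sub_models` are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# A sub-model maps (state, sub-action) to its contribution to the state change.
SubModel = Callable[[Vector, Vector], Vector]


def split_action(action: Sequence[float],
                 partition: Sequence[Sequence[int]]) -> List[Vector]:
    """Slice the action into sub-actions, one per group of action dimensions."""
    return [[action[i] for i in group] for group in partition]


def predict_next_state(state: Vector,
                       action: Sequence[float],
                       partition: Sequence[Sequence[int]],
                       sub_models: Sequence[SubModel]) -> Vector:
    """Add up every sub-model's predicted state change (assumed additive)."""
    if len(partition) != len(sub_models):
        raise ValueError("need exactly one sub-model per action group")
    delta = [0.0] * len(state)
    for sub_action, model in zip(split_action(action, partition), sub_models):
        contribution = model(state, sub_action)
        delta = [d + c for d, c in zip(delta, contribution)]
    return [s + d for s, d in zip(state, delta)]


# Hypothetical toy setup: a 2-D state driven by a 4-D action, where action
# dimensions {0, 1} and {2, 3} are assumed to form separate sub-dynamics.
partition = [[0, 1], [2, 3]]
sub_models = [
    lambda s, a: [0.5 * (a[0] + a[1]), 0.0],   # group 0 only moves state[0]
    lambda s, a: [0.0, 0.25 * (a[0] - a[1])],  # group 1 only moves state[1]
]

next_state = predict_next_state([1.0, 2.0], [1.0, 1.0, 2.0, 0.0],
                                partition, sub_models)
print(next_state)  # [2.0, 2.5]
```

In a learned world model the lambdas would be replaced by trained networks, one per discovered sub-dynamics; the sketch only shows how a prediction is assembled from separate parts instead of one monolithic model.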

arXiv information

Authors	Jianye Hao,Yifu Yuan,Cong Wang,Zhen Wang
Published	2024-02-15 16:05:26+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
