MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL

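Summary

Recently, diffusion models have attracted attention as a promising backbone for the sequence modeling paradigm in offline reinforcement learning (RL).
However, most of these works lack the ability to generalize across tasks whose rewards or dynamics change.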
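To tackle this challenge, in this paper we propose a task-oriented conditioned diffusion planner for offline meta-RL (MetaDiffuser), which treats the generalization problem as a conditional trajectory generation task with a contextual representation.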
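Treating generalization as conditional trajectory generation can be pictured as ordinary DDPM ancestral sampling in which the noise predictor also receives a task-context vector. This is a minimal, dependency-free sketch under stated assumptions, not the paper's implementation: the linear noise schedule, the variance choice sigma_t^2 = beta_t, the flat trajectory vector, and the names `make_schedule`, `sample_trajectory` and `make_oracle_eps_model` are all hypothetical. The toy "oracle" denoiser stands in for a trained network and assumes each context corresponds to exactly one trajectory.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]
# Hypothetical signature for a context-conditioned noise predictor: (x_t, t, context) -> eps.
EpsModel = Callable[[Vector, int, Vector], Vector]


def make_schedule(num_steps: int, beta_start: float = 1e-4,
                  beta_end: float = 0.02) -> Tuple[Vector, Vector, Vector]:
    """Linear beta schedule (assumed) plus the derived alpha_t and alpha_bar_t."""
    step = (beta_end - beta_start) / max(num_steps - 1, 1)
    betas = [beta_start + i * step for i in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def sample_trajectory(eps_model: EpsModel, context: Vector, dim: int,
                      num_steps: int = 50, seed: int = 0) -> Vector:
    """DDPM ancestral sampling; every denoising call is conditioned on the task context."""
    rng = random.Random(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # x_T ~ N(0, I)
    for t in reversed(range(num_steps)):
        eps = eps_model(x, t, context)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # assumed variance choice sigma_t^2 = beta_t
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # the last step adds no noise
    return x


def make_oracle_eps_model(target_for_context: Callable[[Vector], Vector],
                          num_steps: int) -> EpsModel:
    """Toy stand-in for a trained network: exact noise for a one-trajectory-per-task dataset."""
    _, _, alpha_bars = make_schedule(num_steps)

    def eps_model(x: Vector, t: int, context: Vector) -> Vector:
        x0 = target_for_context(context)
        ab = alpha_bars[t]
        return [(xi - math.sqrt(ab) * x0i) / math.sqrt(1.0 - ab) for xi, x0i in zip(x, x0)]

    return eps_model


# Usage: the context sets the slope of a 4-step toy trajectory.
steps = 50
model = make_oracle_eps_model(lambda z: [z[0] * k for k in range(4)], steps)
print(sample_trajectory(model, [1.0], dim=4, num_steps=steps))   # close to [0, 1, 2, 3]
print(sample_trajectory(model, [-2.0], dim=4, num_steps=steps))  # close to [0, -2, -4, -6]
```

The sampler itself never changes; only the context does, which is what lets one model serve many tasks. With a learned denoiser the output would only approximate the trajectories seen for that task.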
The key is to learn a context-conditioned diffusion model that can generate task-oriented trajectories for planning across diverse tasks.
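Using generated trajectories "for planning" is commonly done in a receding-horizon loop: sample a plan conditioned on the current state and the task context, execute only its first action, observe the next state, and replan. The sketch below assumes that Diffuser-style loop. The scalar state and action, the toy environment, and the names `plan_and_act` and `toy_sampler` are hypothetical, and `toy_sampler` is a hand-written stand-in for the conditional diffusion sampler, not a diffusion model.

```python
from typing import Callable, List, Tuple

State = float
Action = float
Plan = List[Tuple[State, Action]]


def plan_and_act(env_step: Callable[[State, Action], State],
                 sample_plan: Callable[[State, List[float]], Plan],
                 context: List[float], start: State, num_steps: int) -> List[State]:
    """Receding-horizon control: replan every step and execute only the first planned action."""
    state = start
    visited = [state]
    for _ in range(num_steps):
        plan = sample_plan(state, context)  # plan starts at the observed state
        _, first_action = plan[0]
        state = env_step(state, first_action)
        visited.append(state)
    return visited


def toy_sampler(state: State, context: List[float], horizon: int = 4) -> Plan:
    """Hypothetical stand-in for a context-conditioned diffusion sampler.

    It reads a goal position from context[0] and heads there with actions clipped to [-1, 1].
    """
    goal = context[0]
    plan, s = [], state
    for _ in range(horizon):
        a = max(-1.0, min(1.0, goal - s))
        plan.append((s, a))
        s = s + a
    return plan


# Usage: toy dynamics s' = s + a; a different context would steer the same loop elsewhere.
print(plan_and_act(lambda s, a: s + a, toy_sampler, [3.5], start=0.0, num_steps=5))
# [0.0, 1.0, 2.0, 3.0, 3.5, 3.5]
```

Replanning at every step means a plan only has to be good near its start, which makes the loop tolerant of imperfect long-horizon samples.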
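To enhance the dynamics consistency of the generated trajectories while encouraging them to achieve high returns, we further design a dual-guided module for the sampling process of the diffusion model.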
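One common way to combine two guidance signals during diffusion sampling is classifier-guidance style: shift each denoising mean along the gradient of a score, `mu + scale * var * grad J(tau)`. The sketch below assumes `J = R - lam * D`, where R is a return estimate and D penalizes mismatch between consecutive states and a dynamics model, so a single gradient step pulls a sample toward both high return and dynamics consistency. This is an illustrative assumption rather than the paper's exact formulation; the quadratic reward, the toy dynamics s' = s + a, and all function names are hypothetical. With these toy models the gradient has a closed form, so no autodiff library is needed.

```python
from typing import List, Tuple

Vec = List[float]


def objective(states: Vec, actions: Vec, goal: float, lam: float) -> float:
    """J = R - lam * D with a hypothetical reward and hypothetical dynamics s' = s + a."""
    ret = -sum((s - goal) ** 2 for s in states)  # return estimate R
    dyn = sum((states[t + 1] - states[t] - actions[t]) ** 2
              for t in range(len(actions)))  # dynamics inconsistency D
    return ret - lam * dyn


def objective_grad(states: Vec, actions: Vec, goal: float, lam: float) -> Tuple[Vec, Vec]:
    """Closed-form gradient of `objective` with respect to states and actions."""
    g_s = [-2.0 * (s - goal) for s in states]  # return-guidance term
    g_a = [0.0] * len(actions)
    for t in range(len(actions)):
        r = states[t + 1] - states[t] - actions[t]  # dynamics residual at step t
        g_s[t + 1] -= 2.0 * lam * r  # dynamics-guidance terms
        g_s[t] += 2.0 * lam * r
        g_a[t] += 2.0 * lam * r
    return g_s, g_a


def dual_guided_mean(mu_s: Vec, mu_a: Vec, var: float, scale: float,
                     goal: float, lam: float) -> Tuple[Vec, Vec]:
    """Shift a denoising mean along grad J (classifier-guidance style, assumed)."""
    g_s, g_a = objective_grad(mu_s, mu_a, goal, lam)
    step = scale * var
    return ([m + step * g for m, g in zip(mu_s, g_s)],
            [m + step * g for m, g in zip(mu_a, g_a)])


# Usage: a 2-step toy trajectory (3 states, 2 actions).
mu_s, mu_a = [0.0, 0.5, 2.0], [1.0, 1.0]
new_s, new_a = dual_guided_mean(mu_s, mu_a, var=0.01, scale=1.0, goal=2.0, lam=1.0)
print(objective(mu_s, mu_a, 2.0, 1.0), objective(new_s, new_a, 2.0, 1.0))  # J increases
```

The weight `lam` trades return against dynamics consistency: with `lam = 0` the guidance can push states toward high reward along transitions the environment could never produce.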
The proposed framework is robust to the quality of the warm-start data collected from the test task and flexible enough to incorporate different task representation methods.
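The flexibility to use different task representation methods suggests that the planner only needs a context vector, not a particular encoder. A narrow interface makes that concrete: any object that turns warm-start transitions from the test task into a context vector can be plugged in. The `TaskEncoder` protocol, the `Transition` record, and both encoders below are hypothetical illustrations for a scalar toy task; mean pooling is one simple order-invariant choice, and its features (average reward and average residual against a nominal model s' = s + a) are assumptions.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence


@dataclass(frozen=True)
class Transition:
    state: float
    action: float
    reward: float
    next_state: float


class TaskEncoder(Protocol):
    def encode(self, transitions: Sequence[Transition]) -> List[float]: ...


class MeanPoolEncoder:
    """Averages per-transition features, so the order of warm-start data does not matter."""

    def encode(self, transitions: Sequence[Transition]) -> List[float]:
        if not transitions:
            raise ValueError("at least one warm-start transition is required")
        n = len(transitions)
        mean_reward = sum(t.reward for t in transitions) / n
        # Residual against an assumed nominal model s' = s + a hints at changed dynamics.
        mean_residual = sum(t.next_state - t.state - t.action for t in transitions) / n
        return [mean_reward, mean_residual]


class RewardOnlyEncoder:
    """A cruder alternative that ignores dynamics but satisfies the same interface."""

    def encode(self, transitions: Sequence[Transition]) -> List[float]:
        if not transitions:
            raise ValueError("at least one warm-start transition is required")
        return [sum(t.reward for t in transitions) / len(transitions)]


def task_context(encoder: TaskEncoder, warm_start: Sequence[Transition]) -> List[float]:
    """The planner only sees this vector, whichever encoder produced it."""
    return encoder.encode(warm_start)


# Usage
data = [Transition(0.0, 1.0, 0.5, 1.25), Transition(1.0, 1.0, 1.5, 2.25)]
print(task_context(MeanPoolEncoder(), data))    # [1.0, 0.25]
print(task_context(RewardOnlyEncoder(), data))  # [1.0]
```

A structural `Protocol` keeps the planner decoupled from how the representation is learned; a trained neural encoder would fit the same slot as long as it returns a vector of the dimension the diffusion model was trained with.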
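Experimental results on MuJoCo benchmarks show that MetaDiffuser outperforms other strong offline meta-RL baselines, demonstrating the strong conditional generation ability of the diffusion architecture.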

Summary (original)

Recently, diffusion models have emerged as a promising backbone for the sequence modeling paradigm in offline reinforcement learning (RL). However, these works mostly lack the ability to generalize across tasks with reward or dynamics changes. To tackle this challenge, in this paper we propose a task-oriented conditioned diffusion planner for offline meta-RL (MetaDiffuser), which treats the generalization problem as a conditional trajectory generation task with contextual representation. The key is to learn a context-conditioned diffusion model which can generate task-oriented trajectories for planning across diverse tasks. To enhance the dynamics consistency of the generated trajectories while encouraging trajectories to achieve high returns, we further design a dual-guided module in the sampling process of the diffusion model. The proposed framework is robust to the quality of the warm-start data collected from the testing task and flexible enough to incorporate different task representation methods. The experiment results on MuJoCo benchmarks show that MetaDiffuser outperforms other strong offline meta-RL baselines, demonstrating the outstanding conditional generation ability of the diffusion architecture.

arXiv information

Authors	Fei Ni, Jianye Hao, Yao Mu, Yifu Yuan, Yan Zheng, Bin Wang, Zhixuan Liang
Published	2023-05-31 15:01:38+00:00
arXiv site	arxiv_id (pdf)

Source, services used

arxiv.jp, Google