Off-dynamics Conditional Diffusion Planners

Summary

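Offline reinforcement learning (RL) offers an attractive alternative to interactive data acquisition by leveraging existing datasets.
However, its effectiveness depends on the quantity and quality of the data samples.
This work explores using datasets that are off-dynamics but more readily available to address data scarcity in offline RL.
We propose a new approach that uses conditional diffusion probabilistic models (DPMs) to learn the joint distribution of a large-scale off-dynamics dataset and a limited target dataset.
To let the model capture the underlying dynamics structure, we introduce two contexts for the conditional model: (1) a continuous dynamics score, which allows partial overlap between trajectories from the two datasets and gives the model richer information;
and (2) an inverse-dynamics context, which guides the model to generate trajectories that obey the dynamic constraints of the target environment.
Empirical results show that our method significantly outperforms several strong baselines.
Ablation studies further reveal the critical role of each dynamics context.
In addition, we show that modifying the context lets the model interpolate between source and target dynamics, making it more robust to subtle shifts in the environment.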

Abstract (original)

Offline Reinforcement Learning (RL) offers an attractive alternative to interactive data acquisition by leveraging pre-existing datasets. However, its effectiveness hinges on the quantity and quality of the data samples. This work explores the use of more readily available, albeit off-dynamics datasets, to address the challenge of data scarcity in Offline RL. We propose a novel approach using conditional Diffusion Probabilistic Models (DPMs) to learn the joint distribution of the large-scale off-dynamics dataset and the limited target dataset. To enable the model to capture the underlying dynamics structure, we introduce two contexts for the conditional model: (1) a continuous dynamics score allows for partial overlap between trajectories from both datasets, providing the model with richer information; (2) an inverse-dynamics context guides the model to generate trajectories that adhere to the target environment’s dynamic constraints. Empirical results demonstrate that our method significantly outperforms several strong baselines. Ablation studies further reveal the critical role of each dynamics context. Additionally, our model demonstrates that by modifying the context, we can interpolate between source and target dynamics, making it more robust to subtle shifts in the environment.
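
The abstract says that changing the conditioning context lets the planner interpolate between source and target dynamics. As a rough illustration only, the sketch below builds such contexts in plain Python. Everything in it is an assumption made for illustration and is not taken from the paper: the convention that a score of 0.0 means source dynamics and 1.0 means target dynamics, the helper names `make_context` and `interpolate_scores`, the choice to concatenate the score with the inverse-dynamics context, and the placeholder values in `inv_ctx`. The diffusion model itself is not shown.

```python
from typing import List, Sequence

# Assumed convention (not from the paper): 0.0 marks source (off-dynamics)
# trajectories and 1.0 marks target-environment trajectories.
SOURCE_SCORE = 0.0
TARGET_SCORE = 1.0


def make_context(dynamics_score: float,
                 inverse_dynamics_ctx: Sequence[float]) -> List[float]:
    """Concatenate a scalar dynamics score with an inverse-dynamics context.

    The inverse-dynamics context is treated as an opaque vector supplied by
    the caller; how it is computed is outside the scope of this sketch.
    """
    if not 0.0 <= dynamics_score <= 1.0:
        raise ValueError("dynamics_score must lie in [0, 1]")
    return [float(dynamics_score), *map(float, inverse_dynamics_ctx)]


def interpolate_scores(num_steps: int) -> List[float]:
    """Return evenly spaced scores from source to target, both inclusive."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    span = TARGET_SCORE - SOURCE_SCORE
    return [SOURCE_SCORE + span * i / (num_steps - 1)
            for i in range(num_steps)]


# Sweep the context from source to target. A trained conditional planner
# (hypothetical, not shown) would receive one context per sampling call.
inv_ctx = [0.2, -0.1]  # placeholder values, not taken from the paper
contexts = [make_context(s, inv_ctx) for s in interpolate_scores(5)]
```

Keeping the score continuous rather than a binary source/target flag is what makes intermediate values meaningful as conditioning inputs, which matches the abstract's description of partial overlap between the two datasets.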

arXiv information

Authors	Wen Zheng Terence Ng, Jianda Chen, Tianwei Zhang
Published	2024-10-16 04:56:43+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
