Diffusion Policy Policy Optimization

Summary

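We introduce Diffusion Policy Policy Optimization (DPPO), an algorithmic framework, including best practices, for fine-tuning diffusion-based policies (e.g., Diffusion Policy) on continuous control and robot learning tasks using policy gradient (PG) methods from reinforcement learning (RL).
PG methods are widely used to train RL policies with other policy parameterizations.
Nevertheless, they had been conjectured to be less efficient for diffusion-based policies.
Surprisingly, we show that DPPO achieves the strongest overall performance and efficiency for fine-tuning on common benchmarks, compared both with other RL methods for diffusion-based policies and with PG fine-tuning of other policy parameterizations.
Through experimental investigation, we find that DPPO exploits unique synergies between RL fine-tuning and the diffusion parameterization, leading to structured, on-manifold exploration, stable training, and strong policy robustness.
We further demonstrate the strengths of DPPO in a range of realistic settings, including simulated robotic tasks with pixel observations and zero-shot deployment of simulation-trained policies on robot hardware in a long-horizon, multi-stage manipulation task.
Website with code: diffusion-ppo.github.io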

Summary (original)

We introduce Diffusion Policy Policy Optimization, DPPO, an algorithmic framework including best practices for fine-tuning diffusion-based policies (e.g. Diffusion Policy) in continuous control and robot learning tasks using the policy gradient (PG) method from reinforcement learning (RL). PG methods are ubiquitous in training RL policies with other policy parameterizations; nevertheless, they had been conjectured to be less efficient for diffusion-based policies. Surprisingly, we show that DPPO achieves the strongest overall performance and efficiency for fine-tuning in common benchmarks compared to other RL methods for diffusion-based policies and also compared to PG fine-tuning of other policy parameterizations. Through experimental investigation, we find that DPPO takes advantage of unique synergies between RL fine-tuning and the diffusion parameterization, leading to structured and on-manifold exploration, stable training, and strong policy robustness. We further demonstrate the strengths of DPPO in a range of realistic settings, including simulated robotic tasks with pixel observations, and via zero-shot deployment of simulation-trained policies on robot hardware in a long-horizon, multi-stage manipulation task. Website with code: diffusion-ppo.github.io

arXiv information

Authors	Allen Z. Ren, Justin Lidard, Lars L. Ankile, Anthony Simeonov, Pulkit Agrawal, Anirudha Majumdar, Benjamin Burchfiel, Hongkai Dai, Max Simchowitz
Published	2024-12-09 21:30:07+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
