DiffTORI: Differentiable Trajectory Optimization for Deep Reinforcement and Imitation Learning

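Summary

This paper introduces DiffTORI, which uses differentiable trajectory optimization as the policy representation to generate actions for deep reinforcement learning and imitation learning.
Trajectory optimization is a powerful and widely used algorithm in control, parameterized by a cost function and a dynamics function.
The key to the approach is to leverage recent progress in differentiable trajectory optimization, which makes it possible to compute gradients of the loss with respect to the parameters of trajectory optimization.
As a result, the cost and dynamics functions of trajectory optimization can be learned end-to-end.
DiffTORI addresses the "objective mismatch" problem of prior model-based RL algorithms: its dynamics model is trained to directly maximize task performance by differentiating the policy gradient loss through the trajectory optimization process.
The authors further benchmark DiffTORI for imitation learning on standard robotic manipulation task suites with high-dimensional sensory observations, comparing it against feed-forward policy classes as well as Energy-Based Models (EBM) and Diffusion.
Across 15 model-based RL tasks and 35 imitation learning tasks with high-dimensional image and point cloud inputs, DiffTORI outperforms prior state-of-the-art methods in both domains.
The code is available at https://github.com/wkwan7/DiffTORI.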
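
The central idea in the summary, computing the gradient of a loss with respect to the cost and dynamics parameters of a trajectory optimizer, can be illustrated with a very small sketch. The example below is not DiffTORI's implementation. It assumes a hypothetical one-step, scalar problem with a learned goal `theta` (cost parameter) and a learned gain `b` (dynamics parameter), solves the inner problem with Newton steps, and uses the implicit function theorem to differentiate an imitation loss through the optimum. All names and constants (`LAM`, `plan`, `plan_jacobian`, the demonstration data) are illustrative assumptions.

```python
# Toy sketch: differentiate an imitation loss through a one-step trajectory
# optimizer. Names and numbers are illustrative assumptions, not DiffTORI code.

LAM = 0.1  # fixed action penalty weight (assumed)


def cost_grad(a, x, theta, b):
    """d/da of J(a) = (x + b*a - theta)^2 + LAM*a^2, with model x' = x + b*a."""
    return 2.0 * b * (x + b * a - theta) + 2.0 * LAM * a


def cost_hess(b):
    """d^2 J / da^2, constant in a because J is quadratic in a."""
    return 2.0 * b * b + 2.0 * LAM


def plan(x, theta, b, iters=5):
    """Inner trajectory optimization: Newton steps on the action a."""
    a = 0.0
    for _ in range(iters):
        a -= cost_grad(a, x, theta, b) / cost_hess(b)
    return a


def plan_jacobian(a, x, theta, b):
    """Implicit function theorem at the optimum:
    da*/dp = -(d^2 J / da dp) / (d^2 J / da^2) for p in (theta, b)."""
    h = cost_hess(b)
    da_dtheta = -(-2.0 * b) / h
    da_db = -(2.0 * (x + b * a - theta) + 2.0 * b * a) / h
    return da_dtheta, da_db


def loss_and_grad(data, theta, b):
    """Mean squared imitation loss and its gradient w.r.t. (theta, b)."""
    loss = g_theta = g_b = 0.0
    for x, a_exp in data:
        a = plan(x, theta, b)
        da_dtheta, da_db = plan_jacobian(a, x, theta, b)
        err = a - a_exp
        loss += err * err
        g_theta += 2.0 * err * da_dtheta
        g_b += 2.0 * err * da_db
    n = len(data)
    return loss / n, g_theta / n, g_b / n


def train(data, theta=0.0, b=0.5, lr=0.02, steps=3000):
    """Outer loop: plain gradient descent on the cost and dynamics parameters."""
    for _ in range(steps):
        _, g_theta, g_b = loss_and_grad(data, theta, b)
        theta -= lr * g_theta
        b -= lr * g_b
    return theta, b


# Expert demonstrations produced by a planner with hidden parameters (assumed).
demos = [(x, plan(x, theta=2.0, b=1.0)) for x in (-1.0, 0.0, 1.0)]
initial_loss, _, _ = loss_and_grad(demos, 0.0, 0.5)
theta_fit, b_fit = train(demos)
final_loss, _, _ = loss_and_grad(demos, theta_fit, b_fit)
```

Here the dynamics parameter `b` is updated only through its effect on the planned action, which is the sense in which the model is trained for task performance rather than for prediction accuracy. A realistic version would use a multi-step horizon, neural network cost and dynamics models, and an automatic differentiation library instead of hand-derived derivatives.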

Abstract (original)

This paper introduces DiffTORI, which utilizes Differentiable Trajectory Optimization as the policy representation to generate actions for deep Reinforcement and Imitation learning. Trajectory optimization is a powerful and widely used algorithm in control, parameterized by a cost and a dynamics function. The key to our approach is to leverage the recent progress in differentiable trajectory optimization, which enables computing the gradients of the loss with respect to the parameters of trajectory optimization. As a result, the cost and dynamics functions of trajectory optimization can be learned end-to-end. DiffTORI addresses the “objective mismatch” issue of prior model-based RL algorithms, as the dynamics model in DiffTORI is learned to directly maximize task performance by differentiating the policy gradient loss through the trajectory optimization process. We further benchmark DiffTORI for imitation learning on standard robotic manipulation task suites with high-dimensional sensory observations and compare our method to feed-forward policy classes as well as Energy-Based Models (EBM) and Diffusion. Across 15 model-based RL tasks and 35 imitation learning tasks with high-dimensional image and point cloud inputs, DiffTORI outperforms prior state-of-the-art methods in both domains. Our code is available at https://github.com/wkwan7/DiffTORI.

arXiv information

Authors	Weikang Wan, Ziyu Wang, Yufei Wang, Zackory Erickson, David Held
Published	2025-06-13 04:41:59+00:00

Source and services used

arxiv.jp, Google
