Distilling Conditional Diffusion Models for Offline Reinforcement Learning through Trajectory Stitching

Abstract

Deep generative models have recently emerged as an effective approach to offline reinforcement learning. However, their large model size poses challenges in computation. We address this issue by proposing a knowledge distillation method based on data augmentation. In particular, high-return trajectories are generated from a conditional diffusion model, and they are blended with the original trajectories through a novel stitching algorithm that leverages a new reward generator. Applying the resulting dataset to behavioral cloning, the learned shallow policy whose size is much smaller outperforms or nearly matches deep generative planners on several D4RL benchmarks.
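
The pipeline the abstract outlines (augment the offline dataset with high-return generated trajectories, then fit a small policy by behavioral cloning) can be illustrated with a toy sketch. This is not the authors' method: the "generated" trajectories below are synthetic stand-ins rather than samples from a conditional diffusion model, the blending step is a plain return filter plus concatenation rather than the paper's stitching algorithm or reward generator, and the policy is a linear map fitted by ridge regression. All names (`augment`, `fit_bc_policy`, `make_traj`, `W_true`) and all numeric settings are hypothetical.

```python
import numpy as np

def augment(original, generated, min_return):
    """Keep generated trajectories whose return is at least min_return,
    then concatenate them with the original ones (assumed blending rule,
    not the paper's stitching algorithm)."""
    kept = [t for t in generated if t["rewards"].sum() >= min_return]
    trajs = list(original) + kept
    states = np.concatenate([t["states"] for t in trajs])
    actions = np.concatenate([t["actions"] for t in trajs])
    return states, actions, len(kept)

def fit_bc_policy(states, actions, ridge=1e-6):
    """Behavioral cloning with a linear policy: solve the ridge-regularized
    least-squares problem actions ~ [states, 1] @ W."""
    X = np.hstack([states, np.ones((states.shape[0], 1))])  # bias column
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ actions)

def act(W, states):
    """Apply the fitted linear policy to a batch of states."""
    X = np.hstack([states, np.ones((states.shape[0], 1))])
    return X @ W

# Toy data: state_dim=3, action_dim=2, actions follow a hidden linear rule.
rng = np.random.default_rng(0)
W_true = rng.normal(size=(3, 2))

def make_traj(length, reward_level, noise):
    """Synthetic trajectory with a constant per-step reward (stand-in data)."""
    states = rng.normal(size=(length, 3))
    actions = states @ W_true + noise * rng.normal(size=(length, 2))
    rewards = np.full(length, reward_level)
    return {"states": states, "actions": actions, "rewards": rewards}

original = [make_traj(50, 0.5, 0.01) for _ in range(4)]
# Stand-ins for model-generated trajectories with returns 10, 50 and 75.
generated = [make_traj(50, r, 0.01) for r in (0.2, 1.0, 1.5)]

states, actions, n_kept = augment(original, generated, min_return=40.0)
W = fit_bc_policy(states, actions)
```

Calling `act(W, new_states)` then returns one action per row of `new_states`. A real shallow policy would more likely be a small neural network trained by gradient descent; the closed-form linear fit is used here only to keep the sketch short and dependency-free.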

arXiv information

Authors	Shangzhe Li, Xinhua Zhang
Published	2024-02-01 17:44:11+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
