Conditioning Matters: Training Diffusion Policies is Faster Than You Think

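Abstract

Diffusion policies have emerged as a mainstream paradigm for building vision-language-action (VLA) models.
Although they demonstrate strong robot control capabilities, their training efficiency remains suboptimal.
In this work, we identify a fundamental challenge in conditional diffusion policy training: when the generative conditions are hard to distinguish, the training objective degenerates into modeling the marginal action distribution, a phenomenon we call loss collapse.
To overcome this, we propose Cocos, a simple yet general solution that makes the source distribution in conditional flow matching condition-dependent.
By anchoring the source distribution around semantics extracted from the condition inputs, Cocos encourages stronger condition integration and prevents loss collapse.
We provide theoretical justification and extensive empirical results across simulation and real-world benchmarks.
Our method converges faster and achieves higher success rates than existing approaches, matching the performance of large-scale pre-trained VLAs with significantly fewer gradient steps and parameters.
Cocos is lightweight, easy to implement, and compatible with diverse policy architectures, offering a general-purpose improvement to diffusion policy training.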

Abstract (original)

Diffusion policies have emerged as a mainstream paradigm for building vision-language-action (VLA) models. Although they demonstrate strong robot control capabilities, their training efficiency remains suboptimal. In this work, we identify a fundamental challenge in conditional diffusion policy training: when generative conditions are hard to distinguish, the training objective degenerates into modeling the marginal action distribution, a phenomenon we term loss collapse. To overcome this, we propose Cocos, a simple yet general solution that modifies the source distribution in the conditional flow matching to be condition-dependent. By anchoring the source distribution around semantics extracted from condition inputs, Cocos encourages stronger condition integration and prevents the loss collapse. We provide theoretical justification and extensive empirical results across simulation and real-world benchmarks. Our method achieves faster convergence and higher success rates than existing approaches, matching the performance of large-scale pre-trained VLAs using significantly fewer gradient steps and parameters. Cocos is lightweight, easy to implement, and compatible with diverse policy architectures, offering a general-purpose improvement to diffusion policy training.
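A minimal sketch of the idea as stated in the abstract, making the flow-matching source distribution condition-dependent, is below. Everything specific here is a hypothetical stand-in: the fixed vectors in `ANCHORS` replace whatever semantics Cocos actually extracts from the condition inputs, `sigma` is an arbitrary source width, and the straight-line path and helper names are assumptions. The diagnostic `source_reveals_condition` checks how well the condition can be read off the source sample x0 alone. This is one plausible way to see why anchoring can counter loss collapse, not the paper's theoretical argument.

```python
import numpy as np

# Hypothetical toy setup: two conditions with 2-D actions.
ACTIONS = np.array([[1.0, 1.0], [-1.0, -1.0]])  # target action for condition 0 and 1
ANCHORS = np.array([[2.0, 0.0], [-2.0, 0.0]])   # stand-in for semantics extracted from each condition


def flow_matching_batch(cond_ids, anchors, sigma, rng):
    """Build conditional flow matching training pairs on a straight-line path.

    Source: x0 ~ N(anchors[c], sigma^2 I). All-zero anchors with sigma = 1
    recover the usual standard Gaussian source.
    """
    actions = ACTIONS[cond_ids]
    x0 = anchors[cond_ids] + sigma * rng.standard_normal(actions.shape)
    t = rng.uniform(size=(len(cond_ids), 1))
    x_t = (1.0 - t) * x0 + t * actions
    u_target = actions - x0  # constant velocity of the path from x0 to the action
    return x0, t, x_t, u_target


def source_reveals_condition(anchors, sigma, n=10_000, seed=0):
    """Fraction of samples whose condition is guessed correctly from x0 alone.

    The guess uses the sign of the first coordinate, matching the toy anchors
    (condition 0 sits on the positive side).
    """
    rng = np.random.default_rng(seed)
    cond_ids = rng.integers(0, 2, size=n)
    x0, _, _, _ = flow_matching_batch(cond_ids, anchors, sigma, rng)
    guess = np.where(x0[:, 0] > 0, 0, 1)
    return float(np.mean(guess == cond_ids))


acc_standard = source_reveals_condition(np.zeros_like(ANCHORS), sigma=1.0)
acc_anchored = source_reveals_condition(ANCHORS, sigma=0.2)
print(f"standard source: {acc_standard:.3f}   anchored source: {acc_anchored:.3f}")
```

With the standard source the guess is at chance level, so early in the path the network input carries no information about the condition; with anchored sources the condition is recoverable from x0 itself, so the regression target can no longer be fit by the marginal alone.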

arXiv information

Authors	Zibin Dong, Yicheng Liu, Yinchuan Li, Hang Zhao, Jianye Hao
Published	2025-05-16 11:14:22+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
