AdaFlow: Imitation Learning with Variance-Adaptive Flow-Based Policies

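Summary

Diffusion-based imitation learning improves on Behavioral Cloning (BC) for multi-modal decision-making, but the recursion in the diffusion process makes inference significantly slower.
This calls for policy generators that are efficient yet can still generate diverse actions.
To address this challenge, the authors propose AdaFlow, an imitation learning framework based on flow-based generative modeling.
AdaFlow represents the policy with state-conditioned ordinary differential equations (ODEs), known as probability flows.
The authors reveal a connection between the conditional variance of the training loss and the discretization error of the ODEs.
Building on this insight, they propose a variance-adaptive ODE solver that adjusts its step size at inference time. This makes AdaFlow an adaptive decision-maker that offers rapid inference without sacrificing diversity.
Notably, when the action distribution is uni-modal, the solver automatically reduces to a one-step generator.
A comprehensive empirical evaluation shows that AdaFlow achieves high performance with fast inference.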
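
The idea of a variance-adaptive step size can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function name `adaptive_euler`, the parameters `eta` and `min_step`, the step rule (step size inversely proportional to the estimated standard deviation, clipped to the remaining time), and the use of a scalar action with no state input. The `velocity` and `variance` callables stand in for learned networks.

```python
from typing import Callable, Tuple


def adaptive_euler(
    velocity: Callable[[float, float], float],
    variance: Callable[[float, float], float],
    x0: float,
    eta: float = 0.1,
    min_step: float = 0.1,
) -> Tuple[float, int]:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps.

    The step size shrinks when the estimated variance is high and grows
    when it is low. This rule is a hypothetical stand-in, not the
    paper's exact solver.
    """
    x, t, steps = x0, 0.0, 0
    while t < 1.0 - 1e-9:
        remaining = 1.0 - t
        sigma = variance(x, t) ** 0.5
        if sigma == 0.0:
            # No uncertainty: a single Euler step covers the remaining time.
            step = remaining
        else:
            # Assumed rule: step ~ eta / sigma, clipped to [min_step, remaining].
            step = min(remaining, max(min_step, eta / sigma))
        x += step * velocity(x, t)
        t += step
        steps += 1
    return x, steps


if __name__ == "__main__":
    # Toy uni-modal case: zero variance, constant velocity toward the target.
    print(adaptive_euler(lambda x, t: 2.0, lambda x, t: 0.0, x0=0.0))
    # Toy high-variance case: the same flow is integrated in several steps.
    print(adaptive_euler(lambda x, t: 2.0, lambda x, t: 1.0, x0=0.0,
                         eta=0.25, min_step=0.25))
```

With zero estimated variance the loop exits after one step, which mirrors the one-step behavior described for uni-modal action distributions. With nonzero variance the same flow is integrated in several smaller steps.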

Abstract (original)

Diffusion-based imitation learning improves Behavioral Cloning (BC) on multi-modal decision-making, but comes at the cost of significantly slower inference due to the recursion in the diffusion process. It urges us to design efficient policy generators while keeping the ability to generate diverse actions. To address this challenge, we propose AdaFlow, an imitation learning framework based on flow-based generative modeling. AdaFlow represents the policy with state-conditioned ordinary differential equations (ODEs), which are known as probability flows. We reveal an intriguing connection between the conditional variance of their training loss and the discretization error of the ODEs. With this insight, we propose a variance-adaptive ODE solver that can adjust its step size in the inference stage, making AdaFlow an adaptive decision-maker, offering rapid inference without sacrificing diversity. Interestingly, it automatically reduces to a one-step generator when the action distribution is uni-modal. Our comprehensive empirical evaluation shows that AdaFlow achieves high performance with fast inference speed.

arXiv information

Authors	Xixi Hu, Bo Liu, Xingchao Liu, Qiang Liu
Published	2024-11-22 18:11:23+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
