Consistency Policy: Accelerated Visuomotor Policies via Consistency Distillation

Abstract

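Many robotic systems, such as mobile manipulators and quadrotors, cannot be equipped with high-end GPUs because of space, weight, and power constraints.
As a result, these systems cannot take advantage of recent visuomotor policy architectures that need high-end GPUs to achieve fast policy inference.
In this paper, we propose Consistency Policy, a faster and similarly powerful alternative to Diffusion Policy for learning visuomotor robot control.
Because its inference is fast, Consistency Policy enables low-latency decision making in resource-constrained robotic setups.
A Consistency Policy is distilled from a pretrained Diffusion Policy by enforcing self-consistency along the Diffusion Policy's learned trajectories.
We compare Consistency Policy with Diffusion Policy and other related speed-up methods across six simulation tasks and three real-world tasks, with inference on the real-world tasks demonstrated on a laptop GPU.
On all of these tasks, Consistency Policy speeds up inference by an order of magnitude compared with the fastest alternative method while maintaining competitive success rates.
We also show that the Consistency Policy training procedure is robust to the quality of the pretrained Diffusion Policy, a result that helps practitioners avoid extensive testing of the pretrained model.
The key design decisions that enable this performance are the choice of consistency objective, reduced initial sample variance, and the choice of preset chaining steps.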

Abstract (original)

Many robotic systems, such as mobile manipulators or quadrotors, cannot be equipped with high-end GPUs due to space, weight, and power constraints. These constraints prevent these systems from leveraging recent developments in visuomotor policy architectures that require high-end GPUs to achieve fast policy inference. In this paper, we propose Consistency Policy, a faster and similarly powerful alternative to Diffusion Policy for learning visuomotor robot control. By virtue of its fast inference speed, Consistency Policy can enable low latency decision making in resource-constrained robotic setups. A Consistency Policy is distilled from a pretrained Diffusion Policy by enforcing self-consistency along the Diffusion Policy’s learned trajectories. We compare Consistency Policy with Diffusion Policy and other related speed-up methods across six simulation tasks as well as three real-world tasks where we demonstrate inference on a laptop GPU. For all these tasks, Consistency Policy speeds up inference by an order of magnitude compared to the fastest alternative method and maintains competitive success rates. We also show that the Consistency Policy training procedure is robust to the pretrained Diffusion Policy’s quality, a useful result that helps practitioners avoid extensive testing of the pretrained model. Key design decisions that enabled this performance are the choice of consistency objective, reduced initial sample variance, and the choice of preset chaining steps.
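
The idea of "enforcing self-consistency along the learned trajectories" can be illustrated with a toy sketch. This is a generic consistency-distillation loss under stated assumptions, not necessarily the objective used in the paper, which the abstract does not specify. The teacher here is a hypothetical stand-in that always denoises to one fixed action `A0`, and every name (`teacher_denoise`, `ode_step`, `consistency_loss`) is invented for illustration.

```python
import numpy as np

# Hypothetical "clean" action that the toy teacher always denoises to.
A0 = np.array([0.5, -0.2])

def teacher_denoise(x, t):
    """Toy stand-in for a pretrained diffusion policy's denoiser D(x, t)."""
    return np.broadcast_to(A0, x.shape)

def ode_step(x, t_from, t_to):
    """One Euler step of the probability-flow ODE dx/dt = (x - D(x, t)) / t."""
    return x + (t_to - t_from) * (x - teacher_denoise(x, t_from)) / t_from

def consistency_loss(student, target, x_next, t_next, t_cur):
    """Penalize disagreement between outputs at two points on one teacher trajectory."""
    # Move along the teacher's trajectory from noise level t_next to t_cur.
    x_cur = ode_step(x_next, t_next, t_cur)
    diff = student(x_next, t_next) - target(x_cur, t_cur)
    return float(np.mean(np.sum(diff ** 2, axis=-1)))

rng = np.random.default_rng(0)
t_next, t_cur = 2.0, 1.0
# Noisy actions at noise level t_next (batch of 4, action dimension 2).
x_next = A0 + t_next * rng.standard_normal((4, 2))

# A function that maps every point on a trajectory to its origin is self-consistent.
ideal = lambda x, t: np.broadcast_to(A0, x.shape)
# The identity map is not: its outputs differ between the two points.
identity = lambda x, t: x

loss_ideal = consistency_loss(ideal, ideal, x_next, t_next, t_cur)
loss_identity = consistency_loss(identity, identity, x_next, t_next, t_cur)
print(loss_ideal, loss_identity)
```

In an actual distillation setup the student would be a trained network and the target typically a slowly updated copy of it; the sketch only shows why a self-consistent function drives this loss to zero, which is what allows an action to be generated in one or a few steps.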

arXiv information

Authors	Aaditya Prasad, Kevin Lin, Jimmy Wu, Linqi Zhou, Jeannette Bohg
Published	2024-06-28 21:56:25+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
