CCDP: Composition of Conditional Diffusion Policies with Guided Sampling

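Summary

Imitation learning is a promising way to learn directly from data without requiring explicit models, simulation, or detailed task definitions.
At inference time, actions are sampled from the learned distribution and executed on the robot.
However, a sampled action can fail for various reasons, and repeating the sampling step until a successful action turns up is inefficient.
This work proposes an enhanced sampling strategy that refines the sampling distribution so that previously failed actions are avoided.
Using only data from successful demonstrations, the method is shown to infer recovery actions without additional exploratory behavior or a high-level controller.
The authors also use diffusion model decomposition to split the main problem (which may require a long-horizon history to handle failures) into smaller, more manageable sub-problems in learning, data collection, and inference, so that the system can adapt to a variable number of failures.
The result is a low-level controller that dynamically adjusts its sampling space to improve efficiency when earlier samples fall short.
The method is validated on several tasks, including door opening with unknown directions, object manipulation, and button-searching scenarios, where it outperforms traditional baselines.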

Summary (original)

Imitation Learning offers a promising approach to learn directly from data without requiring explicit models, simulations, or detailed task definitions. During inference, actions are sampled from the learned distribution and executed on the robot. However, sampled actions may fail for various reasons, and simply repeating the sampling step until a successful action is obtained can be inefficient. In this work, we propose an enhanced sampling strategy that refines the sampling distribution to avoid previously unsuccessful actions. We demonstrate that by solely utilizing data from successful demonstrations, our method can infer recovery actions without the need for additional exploratory behavior or a high-level controller. Furthermore, we leverage the concept of diffusion model decomposition to break down the primary problem (which may require long-horizon history to manage failures) into multiple smaller, more manageable sub-problems in learning, data collection, and inference, thereby enabling the system to adapt to variable failure counts. Our approach yields a low-level controller that dynamically adjusts its sampling space to improve efficiency when prior samples fall short. We validate our method across several tasks, including door opening with unknown directions, object manipulation, and button-searching scenarios, demonstrating that our approach outperforms traditional baselines.
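
The idea of steering resampling away from previously failed actions can be illustrated with a toy sketch. This is not the authors' method: it swaps diffusion guidance for simple importance resampling over a one-dimensional action, and every name here (`sample_base`, `avoidance_weight`, `sample_avoiding_failures`, the `bandwidth` parameter, the two "push"/"pull" modes) is a hypothetical chosen for illustration only.

```python
import math
import random

def sample_base(rng):
    """Toy stand-in for a learned policy with two action modes
    (assumed here: -1.0 ~ "push", +1.0 ~ "pull")."""
    center = rng.choice([-1.0, 1.0])
    return rng.gauss(center, 0.1)

def avoidance_weight(action, failed_actions, bandwidth=0.5):
    """Weight is 0 at a failed action and approaches 1 far from all of them."""
    w = 1.0
    for f in failed_actions:
        w *= 1.0 - math.exp(-((action - f) ** 2) / (2.0 * bandwidth ** 2))
    return w

def sample_avoiding_failures(failed_actions, rng, n_candidates=256):
    """Draw candidates from the base policy, then resample one of them
    in proportion to its distance-based avoidance weight."""
    candidates = [sample_base(rng) for _ in range(n_candidates)]
    weights = [avoidance_weight(a, failed_actions) for a in candidates]
    if sum(weights) == 0.0:
        # Every candidate coincides with a failure: fall back to the base policy.
        return rng.choice(candidates)
    return rng.choices(candidates, weights=weights, k=1)[0]

rng = random.Random(0)
failed = [1.0]  # assume an action near the "pull" mode has already failed
retries = [sample_avoiding_failures(failed, rng) for _ in range(200)]
share_push = sum(a < 0 for a in retries) / len(retries)
```

After one recorded failure near the "pull" mode, most retries in this sketch land in the "push" mode instead of repeating the failed action, which is the qualitative behavior the abstract describes. Each additional failure simply adds another factor to the weight, so the same routine handles a variable number of failures.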

arXiv information

Authors	Amirreza Razmjoo, Sylvain Calinon, Michael Gienger, Fan Zhang
Published	2025-03-19 16:24:55+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
