H$^{\mathbf{3}}$DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning

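Summary

Visuomotor policy learning has seen substantial progress in robotic manipulation, and recent approaches rely predominantly on generative models to model the action distribution.
However, these methods often overlook the critical coupling between visual perception and action prediction.
In this work, we introduce $\textbf{Triply-Hierarchical Diffusion Policy}~(\textbf{H$^{\mathbf{3}}$DP})$.
H$^{3}$DP contains $\mathbf{3}$ levels of hierarchy: (1) depth-aware input layering that organizes RGB-D observations based on depth information;
(2) multi-scale visual representations that encode semantic features at varying levels of granularity; and
(3) a hierarchically conditioned diffusion process that aligns the generation of coarse-to-fine actions with the corresponding visual features.
Extensive experiments show that H$^{3}$DP yields a $\mathbf{+27.5\%}$ average relative improvement over baselines across $\mathbf{44}$ simulation tasks and achieves superior performance in $\mathbf{4}$ challenging bimanual real-world manipulation tasks.
Project page: https://lyy-iiis.github.io/h3dp/.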

Abstract (original)

Visuomotor policy learning has witnessed substantial progress in robotic manipulation, with recent approaches predominantly relying on generative models to model the action distribution. However, these methods often overlook the critical coupling between visual perception and action prediction. In this work, we introduce $\textbf{Triply-Hierarchical Diffusion Policy}~(\textbf{H$^{\mathbf{3}}$DP})$, a novel visuomotor learning framework that explicitly incorporates hierarchical structures to strengthen the integration between visual features and action generation. H$^{3}$DP contains $\mathbf{3}$ levels of hierarchy: (1) depth-aware input layering that organizes RGB-D observations based on depth information; (2) multi-scale visual representations that encode semantic features at varying levels of granularity; and (3) a hierarchically conditioned diffusion process that aligns the generation of coarse-to-fine actions with corresponding visual features. Extensive experiments demonstrate that H$^{3}$DP yields a $\mathbf{+27.5\%}$ average relative improvement over baselines across $\mathbf{44}$ simulation tasks and achieves superior performance in $\mathbf{4}$ challenging bimanual real-world manipulation tasks. Project Page: https://lyy-iiis.github.io/h3dp/.
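
As a rough illustration of level (1), depth-aware input layering, the sketch below splits an RGB image into several layers according to each pixel's depth. The abstract does not say how the layers are formed, so everything here is an assumption made for illustration: the function name `layer_by_depth`, the equal-width depth bins, and the choice to zero out pixels that fall outside a layer. It is not the authors' implementation.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B)


def layer_by_depth(
    rgb: List[List[Pixel]],
    depth: List[List[float]],
    num_layers: int,
) -> List[List[List[Pixel]]]:
    """Split an RGB image into `num_layers` images, one per depth bin.

    Assumption (not from the paper): bins are equal-width between the
    minimum and maximum depth, and pixels outside a layer's bin are
    set to black in that layer.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be >= 1")
    flat = [d for row in depth for d in row]
    d_min, d_max = min(flat), max(flat)
    span = (d_max - d_min) or 1.0  # avoid division by zero on flat depth maps
    black: Pixel = (0, 0, 0)
    # One all-black image per layer, same height and width as the input.
    layers = [[[black for _ in row] for row in rgb] for _ in range(num_layers)]
    for i, row in enumerate(rgb):
        for j, pixel in enumerate(row):
            # Map the depth value to a bin index in [0, num_layers - 1].
            k = min(int((depth[i][j] - d_min) / span * num_layers), num_layers - 1)
            layers[k][i][j] = pixel
    return layers
```

Each pixel ends up in exactly one layer, so overlaying the layers reproduces the original image. In this sketch, nearer and farther parts of the scene could then be encoded separately before being passed to a policy.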

arXiv information

Authors	Yiyang Lu, Yufeng Tian, Zhecheng Yuan, Xianbang Wang, Pu Hua, Zhengrong Xue, Huazhe Xu
Published	2025-05-12 17:59:43+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
