CAGE: Causal Attention Enables Data-Efficient Generalizable Robotic Manipulation

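Abstract

Generalization in robotic manipulation remains a critical challenge, particularly when scaling to new environments with only a limited number of demonstrations.
This paper introduces CAGE, a robotic manipulation policy designed to overcome these generalization barriers by integrating a causal attention mechanism.
CAGE builds on the feature extraction capabilities of the vision foundation model DINOv2, fine-tuned with LoRA, to obtain a robust understanding of the environment.
The policy also uses a causal Perceiver to compress tokens and a diffusion-based action prediction head whose attention mechanisms strengthen task-specific, fine-grained conditioning.
Trained on as few as 50 demonstrations collected in a single environment, CAGE generalizes robustly across visual changes in objects, backgrounds, and viewpoints.
Extensive experiments show that CAGE significantly outperforms existing state-of-the-art RGB and RGB-D approaches on a range of manipulation tasks, especially under large distribution shifts.
In environments similar to the training environment, CAGE raises the task completion rate by 42% on average.
In unseen environments, where every baseline fails to execute the task, CAGE reaches an average completion rate of 43% and an average success rate of 51%, a significant step toward the practical deployment of robots in real-world settings.
Project website: cage-policy.github.io.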

Abstract (original)

Generalization in robotic manipulation remains a critical challenge, particularly when scaling to new environments with limited demonstrations. This paper introduces CAGE, a novel robotic manipulation policy designed to overcome these generalization barriers by integrating a causal attention mechanism. CAGE utilizes the powerful feature extraction capabilities of the vision foundation model DINOv2, combined with LoRA fine-tuning for robust environment understanding. The policy further employs a causal Perceiver for effective token compression and a diffusion-based action prediction head with attention mechanisms to enhance task-specific fine-grained conditioning. With as few as 50 demonstrations from a single training environment, CAGE achieves robust generalization across diverse visual changes in objects, backgrounds, and viewpoints. Extensive experiments validate that CAGE significantly outperforms existing state-of-the-art RGB/RGB-D approaches in various manipulation tasks, especially under large distribution shifts. In similar environments, CAGE offers an average of 42% increase in task completion rate. While all baselines fail to execute the task in unseen environments, CAGE manages to obtain a 43% completion rate and a 51% success rate in average, making a huge step towards practical deployment of robots in real-world settings. Project website: cage-policy.github.io.
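The abstract names a causal Perceiver for token compression but gives no implementation details. The following is a minimal sketch of one plausible reading, not the authors' method: a small set of latent queries cross-attends to a longer sequence of image tokens, and a causal mask prevents a latent assigned to timestep t from attending to tokens of later timesteps. All function names, shapes, and the choice to omit learned query/key/value projections are assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_cross_attention(latents, tokens, latent_steps, token_steps):
    """Compress `tokens` into `latents` with single-head cross-attention.

    Assumption (hypothetical design): latent i may only attend to token j
    when token_steps[j] <= latent_steps[i]. Learned projections are omitted
    (treated as identity) to keep the sketch short.
    Returns (outputs, weights); weights[i][j] is the attention from
    latent i to token j, and is exactly 0.0 for masked pairs.
    """
    dim = len(latents[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, weights = [], []
    for query, q_step in zip(latents, latent_steps):
        # Tokens visible to this latent under the causal mask.
        visible = [j for j, s in enumerate(token_steps) if s <= q_step]
        scores = [
            scale * sum(a * b for a, b in zip(query, tokens[j]))
            for j in visible
        ]
        probs = softmax(scores)
        row = [0.0] * len(tokens)
        for j, p in zip(visible, probs):
            row[j] = p
        # Weighted sum of the visible token vectors.
        out = [sum(row[j] * tokens[j][d] for j in visible) for d in range(dim)]
        outputs.append(out)
        weights.append(row)
    return outputs, weights


def demo(num_steps=3, tokens_per_step=8, latents_per_step=2, dim=4, seed=0):
    """Build random tokens and latents, then compress 24 tokens into 6."""
    rng = random.Random(seed)
    tokens = [
        [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(num_steps * tokens_per_step)
    ]
    token_steps = [t for t in range(num_steps) for _ in range(tokens_per_step)]
    latents = [
        [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(num_steps * latents_per_step)
    ]
    latent_steps = [t for t in range(num_steps) for _ in range(latents_per_step)]
    outputs, weights = causal_cross_attention(
        latents, tokens, latent_steps, token_steps
    )
    return outputs, weights, token_steps, latent_steps
```

Calling `demo()` compresses 24 token vectors into 6 latent vectors. In a real policy the latents and projections would be learned, and the compressed tokens would then condition the action prediction head; none of that is modeled here.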

arXiv information

Authors	Shangning Xia, Hongjie Fang, Cewu Lu, Hao-Shu Fang
Published	2024-12-06 11:39:35+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
