VIHE: Virtual In-Hand Eye Transformer for 3D Robotic Manipulation

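Summary

This work introduces the Virtual In-Hand Eye Transformer (VIHE), a new method designed to improve 3D manipulation through action-aware view rendering.
VIHE refines actions autoregressively over multiple stages, conditioning each stage on rendered views posed from the action predictions of earlier stages.
These virtual in-hand views provide a strong inductive bias for recognizing the correct hand pose, especially in difficult high-precision tasks such as peg insertion.
On 18 manipulation tasks in RLBench simulated environments, using 100 demonstrations per task, VIHE achieves a new state of the art with a 12% absolute improvement, from 65% for the existing state-of-the-art model to 77%.
In real-world scenarios, VIHE can learn manipulation tasks from only a handful of demonstrations, which highlights its practical utility.
Videos and the code implementation are available on the project site: https://vihe-3d.github.io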

Abstract (original)

In this work, we introduce the Virtual In-Hand Eye Transformer (VIHE), a novel method designed to enhance 3D manipulation capabilities through action-aware view rendering. VIHE autoregressively refines actions in multiple stages by conditioning on rendered views posed from action predictions in the earlier stages. These virtual in-hand views provide a strong inductive bias for effectively recognizing the correct pose for the hand, especially for challenging high-precision tasks such as peg insertion. On 18 manipulation tasks in RLBench simulated environments, VIHE achieves a new state-of-the-art, with a 12% absolute improvement, increasing from 65% to 77% over the existing state-of-the-art model using 100 demonstrations per task. In real-world scenarios, VIHE can learn manipulation tasks with just a handful of demonstrations, highlighting its practical utility. Videos and code implementation can be found at our project site: https://vihe-3d.github.io.
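The multi-stage refinement described in the abstract can be illustrated with a toy sketch. This is not the VIHE network: the learned transformer and the renderer are replaced here by a simple quantized-offset function, and every name (`render_virtual_view`, `refine_action`, `half_extents`, `bins`) is hypothetical. It only shows the general coarse-to-fine idea, in which each stage looks at a view centered on the previous prediction and produces a finer correction.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def render_virtual_view(target: Vec3, center: Vec3,
                        half_extent: float, bins: int = 8) -> Vec3:
    """Toy stand-in for rendering plus prediction (an assumption, not VIHE).

    Returns the target's offset from the view center, clipped to the view
    volume and quantized to `bins` cells per axis.
    """
    cell = 2.0 * half_extent / bins
    offset = []
    for t, c in zip(target, center):
        d = max(-half_extent, min(half_extent, t - c))      # clip to the view
        idx = min(bins - 1, int((d + half_extent) / cell))  # cell index
        offset.append(-half_extent + (idx + 0.5) * cell)    # cell-center offset
    return (offset[0], offset[1], offset[2])


def refine_action(target: Vec3, initial: Vec3,
                  half_extents: Sequence[float], bins: int = 8) -> List[Vec3]:
    """Refine a position estimate stage by stage.

    Each stage centers a smaller virtual view on the previous estimate,
    so the same number of cells covers a finer region.
    """
    estimate = initial
    history = [estimate]
    for h in half_extents:
        offset = render_virtual_view(target, estimate, h, bins)
        estimate = (estimate[0] + offset[0],
                    estimate[1] + offset[1],
                    estimate[2] + offset[2])
        history.append(estimate)
    return history


# Usage: three stages, each view four times smaller than the previous one.
target = (0.3, -0.4, 0.62)
for stage, est in enumerate(refine_action(target, (0.0, 0.0, 0.0),
                                          [1.0, 0.25, 0.0625])):
    err = max(abs(e - t) for e, t in zip(est, target))
    print(f"stage {stage}: max axis error = {err:.5f}")
```

In this toy setting the per-axis error after a stage is bounded by half a cell width, provided the target lies inside that stage's view, which is why shrinking the view around the previous estimate sharpens the result.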

arXiv information

Authors	Weiyao Wang, Yutian Lei, Gregory D. Hager, Liangjun Zhang
Published	2024-03-18 04:26:52+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
