RTAGrasp: Learning Task-Oriented Grasping from Human Videos via Retrieval, Transfer, and Alignment

Abstract

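Task-oriented grasping (TOG) is essential for robots performing manipulation tasks, and it requires determining both the position and the direction of the grasp.
Existing methods either depend on costly manual TOG annotations or extract only coarse grasp positions or regions from human demonstrations, which limits their practicality in real-world applications.
To address these limitations, we introduce RTAGrasp, a Retrieval, Transfer, and Alignment framework inspired by human grasping strategies.
Specifically, our approach first builds a robot memory from human grasping demonstration videos with little effort, extracting both TOG position and direction constraints.
Then, given a task instruction and a visual observation of the target object, RTAGrasp retrieves the most similar human grasping experience from its memory and uses the semantic matching capabilities of vision foundation models to transfer the TOG constraints to the target object without any training.
Finally, RTAGrasp aligns the transferred TOG constraints with the robot's action for execution.
Evaluations on TaskGrasp, a public TOG benchmark dataset, show that RTAGrasp performs competitively with existing baseline methods on both seen and unseen object categories.
Real-world experiments further validate its effectiveness on a robotic arm.
Code, appendix, and video are available at https://sites.google.com/view/rtagrasp/home.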

Abstract (original)

Task-oriented grasping (TOG) is crucial for robots to accomplish manipulation tasks, requiring the determination of TOG positions and directions. Existing methods either rely on costly manual TOG annotations or only extract coarse grasping positions or regions from human demonstrations, limiting their practicality in real-world applications. To address these limitations, we introduce RTAGrasp, a Retrieval, Transfer, and Alignment framework inspired by human grasping strategies. Specifically, our approach first effortlessly constructs a robot memory from human grasping demonstration videos, extracting both TOG position and direction constraints. Then, given a task instruction and a visual observation of the target object, RTAGrasp retrieves the most similar human grasping experience from its memory and leverages semantic matching capabilities of vision foundation models to transfer the TOG constraints to the target object in a training-free manner. Finally, RTAGrasp aligns the transferred TOG constraints with the robot’s action for execution. Evaluations on the public TOG benchmark, TaskGrasp dataset, show the competitive performance of RTAGrasp on both seen and unseen object categories compared to existing baseline methods. Real-world experiments further validate its effectiveness on a robotic arm. Our code, appendix, and video are available at \url{https://sites.google.com/view/rtagrasp/home}.
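
The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `GraspExperience` record, the `retrieve` function, the use of a single feature vector per memory entry, and cosine similarity as the matching score are all assumptions made for illustration. The abstract only states that the most similar human grasping experience is retrieved from memory.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class GraspExperience:
    """Hypothetical memory entry built from one human demonstration."""
    task: str                # task instruction, e.g. "pour"
    embedding: list          # assumed feature vector for task + object
    position: tuple          # TOG position constraint (assumed 2D keypoint)
    direction: tuple         # TOG direction constraint (assumed 3D vector)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(memory, query_embedding):
    """Return the memory entry most similar to the query (assumed metric)."""
    if not memory:
        raise ValueError("memory is empty")
    return max(memory, key=lambda e: cosine(e.embedding, query_embedding))


# Toy memory with hand-written vectors standing in for learned features.
memory = [
    GraspExperience("pour", [1.0, 0.0], (0.2, 0.5), (0.0, 0.0, 1.0)),
    GraspExperience("cut", [0.0, 1.0], (0.7, 0.4), (1.0, 0.0, 0.0)),
]
best = retrieve(memory, [0.9, 0.1])
```

In this toy setup, `best` carries the position and direction constraints that a later transfer step would map onto the target object. How the features are computed and how the constraints are transferred are outside the scope of this sketch.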

arXiv information

Authors	Wenlong Dong, Dehao Huang, Jiangshan Liu, Chao Tang, Hong Zhang
Published	2024-09-24 12:32:08+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
