FoundationGrasp: Generalizable Task-Oriented Grasping with Foundation Models

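Summary

Task-oriented grasping (TOG) is the problem of synthesizing grasps on an object that are configurationally compatible with the downstream manipulation task, and it is the first milestone toward tool manipulation.
Just as two brain regions responsible for semantic and geometric reasoning are activated during cognitive processes, modeling the complex relationships among objects, tasks, and grasps requires rich prior knowledge about objects and tasks.
Existing methods typically restrict this prior knowledge to a closed set and cannot generalize to novel objects and tasks outside the training set.
To address this limitation, we propose FoundationGrasp, a foundation-model-based TOG framework that leverages open-ended knowledge from foundation models to learn generalizable TOG skills.
Comprehensive experiments on the contributed Language and Vision Augmented TaskGrasp (LaViA-TaskGrasp) dataset demonstrate the superiority of FoundationGrasp over existing methods when generalizing to novel object instances, object classes, and tasks outside the training set.
Furthermore, the effectiveness of FoundationGrasp is validated in real-robot grasping and manipulation experiments on a 7-DoF robotic arm.
The code, data, appendix, and video are publicly available at https://sites.google.com/view/foundationgrasp.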

Summary (original)

Task-oriented grasping (TOG), which refers to the problem of synthesizing grasps on an object that are configurationally compatible with the downstream manipulation task, is the first milestone towards tool manipulation. Analogous to the activation of two brain regions responsible for semantic and geometric reasoning during cognitive processes, modeling the complex relationship between objects, tasks, and grasps requires rich prior knowledge about objects and tasks. Existing methods typically limit the prior knowledge to a closed-set scope and cannot support the generalization to novel objects and tasks out of the training set. To address such a limitation, we propose FoundationGrasp, a foundation model-based TOG framework that leverages the open-ended knowledge from foundation models to learn generalizable TOG skills. Comprehensive experiments are conducted on the contributed Language and Vision Augmented TaskGrasp (LaViA-TaskGrasp) dataset, demonstrating the superiority of FoundationGrasp over existing methods when generalizing to novel object instances, object classes, and tasks out of the training set. Furthermore, the effectiveness of FoundationGrasp is validated in real-robot grasping and manipulation experiments on a 7 DoF robotic arm. Our code, data, appendix, and video are publicly available at https://sites.google.com/view/foundationgrasp.

arXiv information

Authors	Chao Tang, Dehao Huang, Wenlong Dong, Ruinian Xu, Hong Zhang
Published	2024-04-16 08:56:25+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
