Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance

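Summary

6-DoF grasp detection is a fundamental and challenging problem in robotic vision.
Previous work has focused on ensuring grasp stability but often ignores human intention conveyed through natural language, which hinders effective collaboration between robots and users in complex 3D environments.
This paper presents a new approach to language-driven 6-DoF grasp detection in cluttered point clouds.
It first introduces Grasp-Anything-6D, a large-scale dataset for the language-driven 6-DoF grasp detection task with 1M point cloud scenes and more than 200M language-associated 3D grasp poses.
It then introduces a novel diffusion model that incorporates a new negative prompt guidance learning strategy.
Given a language input, the proposed negative prompt strategy directs the detection process toward the desired object while steering it away from unwanted ones.
The method enables an end-to-end framework in which a human can use natural language to command a robot to grasp a desired object in a cluttered scene.
Extensive experiments show that the method surpasses other baselines in both benchmark experiments and real-world scenarios.
The authors also demonstrate the practicality of the approach in real-world robotic applications.
The project is available at https://airvlab.github.io/grasp-anything.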

Summary (original)

6-DoF grasp detection has been a fundamental and challenging problem in robotic vision. While previous works have focused on ensuring grasp stability, they often do not consider human intention conveyed through natural language, hindering effective collaboration between robots and users in complex 3D environments. In this paper, we present a new approach for language-driven 6-DoF grasp detection in cluttered point clouds. We first introduce Grasp-Anything-6D, a large-scale dataset for the language-driven 6-DoF grasp detection task with 1M point cloud scenes and more than 200M language-associated 3D grasp poses. We further introduce a novel diffusion model that incorporates a new negative prompt guidance learning strategy. The proposed negative prompt strategy directs the detection process toward the desired object while steering away from unwanted ones given the language input. Our method enables an end-to-end framework where humans can command the robot to grasp desired objects in a cluttered scene using natural language. Intensive experimental results show the effectiveness of our method in both benchmarking experiments and real-world scenarios, surpassing other baselines. In addition, we demonstrate the practicality of our approach in real-world robotic applications. Our project is available at https://airvlab.github.io/grasp-anything.
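
The abstract does not spell out how the negative prompt enters the diffusion process. As a rough illustration only, the sketch below uses the common classifier-free-guidance style of negative prompting from text-conditioned diffusion models, where the final noise estimate starts from the negative-prompt prediction and is pushed toward the positive-prompt prediction. This is an assumption, not the authors' formulation: the function name `guided_noise`, the parameter `guidance_scale`, and the use of plain lists in place of network outputs are all hypothetical.

```python
from typing import List, Sequence


def guided_noise(
    eps_pos: Sequence[float],
    eps_neg: Sequence[float],
    guidance_scale: float,
) -> List[float]:
    """Combine two noise predictions, classifier-free-guidance style.

    eps_pos: noise predicted when conditioning on the desired-object prompt.
    eps_neg: noise predicted when conditioning on the negative prompt.
    guidance_scale: how far to move from eps_neg toward (and past) eps_pos.

    NOTE: hypothetical sketch; the paper's actual guidance rule may differ.
    """
    if len(eps_pos) != len(eps_neg):
        raise ValueError("eps_pos and eps_neg must have the same length")
    # Start at the negative prediction and move along (eps_pos - eps_neg).
    return [n + guidance_scale * (p - n) for p, n in zip(eps_pos, eps_neg)]


if __name__ == "__main__":
    # Toy two-dimensional "noise" vectors standing in for model outputs.
    print(guided_noise([1.0, 0.0], [0.0, 1.0], 2.0))  # [2.0, -1.0]
```

With `guidance_scale = 1.0` the result equals the positive-prompt prediction, and with `guidance_scale = 0.0` it equals the negative-prompt prediction. Values above 1 extrapolate away from the negative prompt, which is the sense in which such guidance steers a sample away from unwanted content.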

arXiv information

Authors	Toan Nguyen, Minh Nhat Vu, Baoru Huang, An Vuong, Quan Vuong, Ngan Le, Thieu Vo, Anh Nguyen
Published	2024-07-25 10:51:19+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
