QwenGrasp: A Usage of Large Vision-Language Model for Target-Oriented Grasping

Abstract

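Target-oriented grasping in unstructured scenes under language control is essential for intelligent robot-arm grasping.
Whether a robot arm can understand human language and execute the corresponding grasping action is a pivotal challenge.
In this paper, we propose QwenGrasp, a combined model that pairs a large vision-language model with a 6-DoF grasp neural network.
Given a textual instruction, QwenGrasp can perform a 6-DoF grasping task on the target object.
We design a complete experiment with six-dimension instructions to test QwenGrasp across different cases.
The results show that QwenGrasp is highly capable of understanding human intent.
It grasps the target object accurately even when given vague instructions with descriptive words or instructions that contain direction information.
When QwenGrasp receives an instruction that is infeasible or unrelated to the grasping task, it suspends task execution and gives the human appropriate feedback, which improves safety.
In conclusion, by drawing on the capabilities of a large vision-language model, QwenGrasp can be applied in open-language environments to perform target-oriented grasping from freely entered instructions.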
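
The abstract does not say how the vision-language model and the grasp network are connected. As a rough illustration only, the sketch below assumes one plausible arrangement: the vision-language model either returns a bounding box for the target or refuses with a message, and the grasp candidates are then filtered to those that fall on the target. Every name here (`Grasp`, `Grounding`, `select_grasp`, `fake_ground`) is hypothetical and is not taken from the paper or from any Qwen API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class Grasp:
    """A 6-DoF grasp candidate, reduced to what this sketch needs."""
    pixel: Tuple[float, float]  # grasp centre projected into the image
    score: float                # quality score from the grasp network
    pose: Tuple[float, ...]     # placeholder for position and orientation


@dataclass
class Grounding:
    """Hypothetical output of the vision-language model."""
    box: Optional[Box]  # target bounding box, or None when the request is refused
    message: str        # feedback for the human operator


def inside(box: Box, pixel: Tuple[float, float]) -> bool:
    x_min, y_min, x_max, y_max = box
    return x_min <= pixel[0] <= x_max and y_min <= pixel[1] <= y_max


def select_grasp(
    instruction: str,
    ground: Callable[[str], Grounding],
    candidates: List[Grasp],
) -> Tuple[Optional[Grasp], str]:
    """Return the best grasp on the target, or None plus feedback."""
    grounding = ground(instruction)
    if grounding.box is None:
        # Infeasible or irrelevant instruction: suspend and report back.
        return None, grounding.message
    on_target = [g for g in candidates if inside(grounding.box, g.pixel)]
    if not on_target:
        return None, "No grasp candidate found on the target object."
    return max(on_target, key=lambda g: g.score), grounding.message


def fake_ground(instruction: str) -> Grounding:
    """Stand-in for the model: knows one object and refuses everything else."""
    if "banana" in instruction.lower():
        return Grounding(box=(100, 100, 200, 200), message="Target: banana")
    return Grounding(box=None, message="I cannot find that object in the scene.")


# Made-up candidates: two on the target, one elsewhere with a higher score.
candidates = [
    Grasp(pixel=(150, 150), score=0.7, pose=(0.0,) * 6),
    Grasp(pixel=(120, 180), score=0.9, pose=(0.1,) * 6),
    Grasp(pixel=(400, 300), score=0.95, pose=(0.2,) * 6),
]

if __name__ == "__main__":
    for text in ("Pick up the banana", "Tell me a joke"):
        grasp, feedback = select_grasp(text, fake_ground, candidates)
        print(text, "->", grasp, "|", feedback)
```

Returning `None` together with a message keeps the refusal path explicit, so the caller cannot move the arm without a grounded target. In a real system the pixel-in-box test would likely be replaced by a check against the object's segmented point cloud.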

Abstract (original)

Target-oriented grasping in unstructured scenes with language control is essential for intelligent robot arm grasping. The ability for the robot arm to understand the human language and execute corresponding grasping actions is a pivotal challenge. In this paper, we propose a combination model called QwenGrasp which combines a large vision-language model with a 6-DoF grasp neural network. QwenGrasp is able to conduct a 6-DoF grasping task on the target object with textual language instruction. We design a complete experiment with six-dimension instructions to test the QwenGrasp when facing with different cases. The results show that QwenGrasp has a superior ability to comprehend the human intention. Even in the face of vague instructions with descriptive words or instructions with direction information, the target object can be grasped accurately. When QwenGrasp accepts the instruction which is not feasible or not relevant to the grasping task, our approach has the ability to suspend the task execution and provide a proper feedback to humans, improving the safety. In conclusion, with the great power of large vision-language model, QwenGrasp can be applied in the open language environment to conduct the target-oriented grasping task with freely input instructions.

arXiv information

Authors	Xinyu Chen, Jian Yang, Zonghan He, Haobin Yang, Qi Zhao, Yuhui Shi
Published	2023-12-25 08:59:37+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
