Multi-GraspLLM: A Multimodal LLM for Multi-Hand Semantic Guided Grasp Generation

Summary

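Multi-hand semantic grasp generation aims to produce feasible, semantically appropriate grasp poses for different robotic hands from natural language instructions.
Although the task is highly valuable, it has long remained difficult because there are few multi-hand grasp datasets with fine-grained descriptions of the contact between robotic hands and objects.
This paper introduces Multi-GraspSet, the first large-scale multi-hand grasp dataset with automatic contact annotations.
Building on Multi-GraspSet, we propose Multi-GraspLLM, a unified language-guided grasp generation framework.
It leverages a large language model (LLM) to handle variable-length sequences, generating grasp poses for diverse robotic hands within a single unified architecture.
Multi-GraspLLM first aligns the encoded point cloud features and text features into a unified semantic space.
It then generates grasp bin tokens, which are subsequently converted into a grasp pose for each robotic hand via a hand-aware linear mapping.
Experimental results show that our approach significantly outperforms existing methods on Multi-GraspSet.
More information is available on the project page: https://multi-graspllm.github.io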
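
The abstract does not spell out how grasp bin tokens or the hand-aware linear mapping are defined, so the following is only a minimal sketch of the general idea under stated assumptions: each joint value is discretized uniformly into a fixed number of bins, and a per-hand affine map (built from that hand's joint limits) turns bin indices back into a pose. The bin count `N_BINS`, the hand names, the joint limits, and the functions `pose_to_bins` and `bins_to_pose` are all hypothetical and are not taken from the paper.

```python
import numpy as np

N_BINS = 256  # assumed number of bins per joint; not a value from the paper

# Hypothetical joint limits in radians, one row [low, high] per joint.
# Hands with different joint counts yield token sequences of different length.
HAND_LIMITS = {
    "gripper_2dof": np.array([[0.0, 1.0], [0.0, 1.0]]),
    "hand_4dof": np.array([[-0.5, 0.5], [0.0, 1.6], [0.0, 1.6], [0.0, 1.6]]),
}


def pose_to_bins(hand, pose):
    """Discretize a joint-angle vector into integer bin indices for one hand."""
    limits = HAND_LIMITS[hand]
    low, high = limits[:, 0], limits[:, 1]
    pose = np.asarray(pose, dtype=float)
    if pose.shape != low.shape:
        raise ValueError(f"{hand} expects {low.size} joint values")
    # Normalize each joint to [0, 1] using this hand's own limits.
    frac = (np.clip(pose, low, high) - low) / (high - low)
    # The upper limit would land in bin N_BINS, so cap at the last bin.
    return np.minimum((frac * N_BINS).astype(int), N_BINS - 1)


def bins_to_pose(hand, bins):
    """Map bin indices back to joint angles with a per-hand affine map."""
    limits = HAND_LIMITS[hand]
    low, high = limits[:, 0], limits[:, 1]
    bins = np.asarray(bins)
    if bins.shape != low.shape:
        raise ValueError(f"{hand} expects {low.size} bin tokens")
    # Use the bin center so the round-trip error is at most half a bin width.
    return low + (bins + 0.5) / N_BINS * (high - low)


if __name__ == "__main__":
    tokens = pose_to_bins("hand_4dof", [0.1, 0.8, 0.2, 1.5])
    print(tokens, bins_to_pose("hand_4dof", tokens))
```

In this sketch the token vocabulary is shared across hands, while the scale and offset of the decoding map depend on the hand, which is one simple way a single sequence model could serve hands with different joint counts and ranges.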

Summary (original)

Multi-hand semantic grasp generation aims to generate feasible and semantically appropriate grasp poses for different robotic hands based on natural language instructions. Although the task is highly valuable, due to the lack of multi-hand grasp datasets with fine-grained contact description between robotic hands and objects, it is still a long-standing difficult task. In this paper, we present Multi-GraspSet, the first large-scale multi-hand grasp dataset with automatically contact annotations. Based on Multi-GraspSet, we propose Multi-GraspLLM, a unified language-guided grasp generation framework. It leverages large language models (LLM) to handle variable-length sequences, generating grasp poses for diverse robotic hands in a single unified architecture. Multi-GraspLLM first aligns the encoded point cloud features and text features into a unified semantic space. It then generates grasp bin tokens which are subsequently converted into grasp pose for each robotic hand via hand-aware linear mapping. The experimental results demonstrate that our approach significantly outperforms existing methods on Multi-GraspSet. More information can be found on our project page https://multi-graspllm.github.io.

arXiv information

Authors	Haosheng Li, Weixin Mao, Weipeng Deng, Chenyu Meng, Haoqiang Fan, Tiancai Wang, Ping Tan, Hongan Wang, Xiaoming Deng
Published	2024-12-11 15:33:35+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
