SparseGrasp: Robotic Grasping via 3D Semantic Gaussian Splatting from Sparse Multi-View RGB Images

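Summary

Language-guided robotic grasping is a rapidly developing field in which robots are instructed in human language to grasp specific objects. However, existing methods often rely on dense camera views and struggle to update the scene quickly, which limits their effectiveness in changing environments. In response, we propose SparseGrasp, a new open-vocabulary robotic grasping system that operates efficiently on sparse-view RGB images and handles scene updates quickly. Our system builds on, and substantially enhances, existing computer vision modules used in robot learning. Specifically, SparseGrasp uses DUSt3R to generate a dense point cloud as the initialization for 3D Gaussian Splatting (3DGS), maintaining high fidelity even under sparse supervision. Importantly, SparseGrasp incorporates semantic awareness from recent vision foundation models. To further improve processing efficiency, we repurpose Principal Component Analysis (PCA) to compress features from 2D models. In addition, we introduce a new render-and-compare strategy that ensures rapid scene updates, enabling multi-turn grasping in changing environments. Experiments show that SparseGrasp significantly outperforms state-of-the-art methods in both speed and adaptability, providing a robust solution for multi-turn grasping in changing environments.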
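
The summary mentions repurposing PCA to compress features from 2D models. As a rough illustration of that general idea only, the sketch below fits PCA on a matrix of feature vectors with NumPy and projects them to a lower dimension. The function names, the feature dimension (64), the target dimension (8), and the synthetic data are all assumptions made for this example; the abstract does not describe the actual pipeline or its parameters.

```python
import numpy as np


def fit_pca(features, n_components):
    """Fit PCA on an (N, D) feature matrix and return (mean, components)."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are the principal directions, ordered by decreasing
    # singular value of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def compress(features, mean, components):
    """Project (N, D) features onto the principal directions -> (N, K)."""
    return (features - mean) @ components.T


def decompress(codes, mean, components):
    """Map (N, K) codes back to an approximation of the (N, D) features."""
    return codes @ components + mean


# Hypothetical stand-in for per-pixel features from a 2D vision model:
# 1000 vectors of dimension 64 that lie close to an 8-dimensional subspace.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 8))
basis = rng.normal(size=(8, 64))
features = latent @ basis + 0.01 * rng.normal(size=(1000, 64))

mean, components = fit_pca(features, n_components=8)
codes = compress(features, mean, components)
reconstructed = decompress(codes, mean, components)

# Relative reconstruction error; small when the data is nearly low-rank.
rel_err = np.linalg.norm(reconstructed - features) / np.linalg.norm(features)
```

In a setting like the one the summary describes, the compressed codes rather than the full feature vectors would presumably be what gets attached to the 3D representation, which is where the efficiency gain would come from. That reading is an inference from the abstract, not a statement of the authors' implementation.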

Abstract (original)

Language-guided robotic grasping is a rapidly advancing field where robots are instructed using human language to grasp specific objects. However, existing methods often depend on dense camera views and struggle to quickly update scenes, limiting their effectiveness in changeable environments. In contrast, we propose SparseGrasp, a novel open-vocabulary robotic grasping system that operates efficiently with sparse-view RGB images and handles scene updates fastly. Our system builds upon and significantly enhances existing computer vision modules in robotic learning. Specifically, SparseGrasp utilizes DUSt3R to generate a dense point cloud as the initialization for 3D Gaussian Splatting (3DGS), maintaining high fidelity even under sparse supervision. Importantly, SparseGrasp incorporates semantic awareness from recent vision foundation models. To further improve processing efficiency, we repurpose Principal Component Analysis (PCA) to compress features from 2D models. Additionally, we introduce a novel render-and-compare strategy that ensures rapid scene updates, enabling multi-turn grasping in changeable environments. Experimental results show that SparseGrasp significantly outperforms state-of-the-art methods in terms of both speed and adaptability, providing a robust solution for multi-turn grasping in changeable environment.

arXiv information

Authors	Junqiu Yu, Xinlin Ren, Yongchong Gu, Haitao Lin, Tianyu Wang, Yi Zhu, Hang Xu, Yu-Gang Jiang, Xiangyang Xue, Yanwei Fu
Published	2024-12-03 03:56:01+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
