Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation

Summary

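Precise 3D environmental mapping is pivotal in robotics.
Existing methods often rely on concepts predefined during training, or they take a long time to generate semantic maps.
This paper presents Open-Fusion, an approach for real-time open-vocabulary 3D mapping and queryable scene representation from RGB-D data.
Open-Fusion uses a pre-trained vision-language foundation model (VLFM) for open-set semantic understanding and a Truncated Signed Distance Function (TSDF) for fast 3D scene reconstruction.
The VLFM is used to extract region-based embeddings together with their associated confidence maps.
These are then integrated with the 3D knowledge held in the TSDF through an enhanced Hungarian-based feature-matching mechanism.
Notably, Open-Fusion delivers annotation-free open-vocabulary 3D segmentation without requiring any additional 3D training.
In benchmark tests on the ScanNet dataset, the authors report that Open-Fusion outperforms leading zero-shot methods.
By combining the strengths of a region-based VLFM and TSDF, it supports real-time 3D scene understanding that covers both object concepts and open-world semantics.
Readers are encouraged to view the demos on the project page: https://uark-aicv.github.io/OpenFusion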
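
To make the TSDF reconstruction step more concrete, the following is a minimal sketch of the standard weighted running-average update that TSDF fusion applies to a single voxel. It is not taken from the Open-Fusion code: the function names, the per-observation weight of 1.0, and the 10 cm truncation band are illustrative assumptions.

```python
def truncate_sdf(signed_distance, truncation):
    """Scale a signed distance (metres) by the truncation band and clamp to [-1, 1]."""
    return max(-1.0, min(1.0, signed_distance / truncation))


def fuse_observation(tsdf, weight, signed_distance, truncation, obs_weight=1.0):
    """Return the updated (tsdf, weight) of one voxel after one depth observation.

    Hypothetical helper: implements the usual weighted running average,
    new_tsdf = (tsdf * weight + d * obs_weight) / (weight + obs_weight).
    """
    d = truncate_sdf(signed_distance, truncation)
    new_weight = weight + obs_weight
    new_tsdf = (tsdf * weight + d * obs_weight) / new_weight
    return new_tsdf, new_weight


# Two observations of the same voxel, 2 cm and then 4 cm in front of the
# surface, fused with an assumed 10 cm truncation band.
tsdf, weight = 0.0, 0.0
for sd in (0.02, 0.04):
    tsdf, weight = fuse_observation(tsdf, weight, sd, truncation=0.10)
```

Because each update touches only one voxel and needs no history beyond the running value and weight, this kind of fusion is well suited to incremental, frame-by-frame reconstruction.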

Abstract (original)

Precise 3D environmental mapping is pivotal in robotics. Existing methods often rely on predefined concepts during training or are time-intensive when generating semantic maps. This paper presents Open-Fusion, a groundbreaking approach for real-time open-vocabulary 3D mapping and queryable scene representation using RGB-D data. Open-Fusion harnesses the power of a pre-trained vision-language foundation model (VLFM) for open-set semantic comprehension and employs the Truncated Signed Distance Function (TSDF) for swift 3D scene reconstruction. By leveraging the VLFM, we extract region-based embeddings and their associated confidence maps. These are then integrated with 3D knowledge from TSDF using an enhanced Hungarian-based feature-matching mechanism. Notably, Open-Fusion delivers outstanding annotation-free 3D segmentation for open-vocabulary without necessitating additional 3D training. Benchmark tests on the ScanNet dataset against leading zero-shot methods highlight Open-Fusion’s superiority. Furthermore, it seamlessly combines the strengths of region-based VLFM and TSDF, facilitating real-time 3D scene comprehension that includes object concepts and open-world semantics. We encourage the readers to view the demos on our project page: https://uark-aicv.github.io/OpenFusion
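
As an illustration of what a "queryable scene representation" can look like, the sketch below ranks stored region embeddings against a text-query embedding by cosine similarity and skips regions with low confidence. This is an assumption about the general pattern, not the authors' implementation: the function names, the confidence threshold, and the toy 3-dimensional vectors are hypothetical, and in a real system the embeddings would come from the VLFM.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def query_regions(text_embedding, region_embeddings, region_confidences,
                  min_confidence=0.5):
    """Return region ids ordered from most to least similar to the query.

    Regions whose confidence is below min_confidence are ignored.
    """
    scored = [
        (cosine_similarity(text_embedding, emb), region_id)
        for region_id, emb in region_embeddings.items()
        if region_confidences[region_id] >= min_confidence
    ]
    scored.sort(reverse=True)
    return [region_id for _, region_id in scored]


# Toy data: three regions with made-up embeddings and confidences.
region_embeddings = {0: [1.0, 0.0, 0.0], 1: [0.6, 0.8, 0.0], 2: [1.0, 0.0, 0.0]}
region_confidences = {0: 0.9, 1: 0.8, 2: 0.2}
ranking = query_regions([1.0, 0.0, 0.0], region_embeddings, region_confidences)
```

With this toy data, region 2 is filtered out by its low confidence even though its embedding matches the query exactly, which shows why a confidence map is useful alongside the embeddings.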

arXiv information

Authors	Kashu Yamazaki, Taisei Hanyu, Khoa Vo, Thang Pham, Minh Tran, Gianfranco Doretto, Anh Nguyen, Ngan Le
Published	2023-10-05 21:57:36+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
