OpenSU3D: Open World 3D Scene Understanding using Foundation Models

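Summary

This paper presents a novel, scalable approach for building open-set, instance-level 3D scene representations, advancing open-world understanding of 3D environments.
Existing methods require pre-built 3D scenes and face scalability problems caused by per-point feature vector learning, which limits their effectiveness on complex queries.
Our method overcomes these limitations by incrementally building an instance-level 3D scene representation with 2D foundation models, efficiently aggregating instance-level details such as masks, feature vectors, names, and captions.
We introduce feature vector fusion schemes to strengthen contextual knowledge and performance on complex queries.
In addition, we explore large language models for robust automatic annotation and spatial reasoning tasks.
We evaluate the proposed approach on multiple scenes from the ScanNet and Replica datasets and demonstrate zero-shot generalization that exceeds current state-of-the-art methods in open-world 3D scene understanding.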

Abstract (original)

In this paper, we present a novel, scalable approach for constructing open set, instance-level 3D scene representations, advancing open world understanding of 3D environments. Existing methods require pre-constructed 3D scenes and face scalability issues due to per-point feature vector learning, limiting their efficacy with complex queries. Our method overcomes these limitations by incrementally building instance-level 3D scene representations using 2D foundation models, efficiently aggregating instance-level details such as masks, feature vectors, names, and captions. We introduce fusion schemes for feature vectors to enhance their contextual knowledge and performance on complex queries. Additionally, we explore large language models for robust automatic annotation and spatial reasoning tasks. We evaluate our proposed approach on multiple scenes from ScanNet and Replica datasets demonstrating zero-shot generalization capabilities, exceeding current state-of-the-art methods in open world 3D scene understanding.
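The abstract describes incremental aggregation of per-instance details (masks, feature vectors, names, captions) but gives no implementation detail. The following is only a rough sketch of what such an aggregation loop could look like, not the authors' method. Everything in it is an assumption made for illustration: the `Instance3D` record, the `add_observation` and `query` helpers, the overlap test on shared 3D point ids, the `min_overlap` threshold, and the use of a running mean as the "fusion" step. Real feature vectors would come from a 2D foundation model; here they are toy two-dimensional lists.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Instance3D:
    """Hypothetical per-instance record (not the paper's data structure)."""
    name: str
    caption: str
    feature: list[float]                                # running mean of observed features
    point_ids: set[int] = field(default_factory=set)    # 3D points covered by the masks
    n_obs: int = 1                                       # number of merged observations

    def merge(self, feature: list[float], point_ids: set[int]) -> None:
        """Fold one more observation into this instance (assumed fusion: running mean)."""
        self.n_obs += 1
        self.feature = [f + (g - f) / self.n_obs for f, g in zip(self.feature, feature)]
        self.point_ids |= point_ids


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def add_observation(scene, name, caption, feature, point_ids, min_overlap=0.5):
    """Merge the observation into an overlapping instance, or create a new one.

    Overlap is measured on shared 3D point ids relative to the smaller set;
    the 0.5 threshold is an arbitrary illustrative value.
    """
    new_pts = set(point_ids)
    for inst in scene:
        smaller = max(1, min(len(inst.point_ids), len(new_pts)))
        if len(inst.point_ids & new_pts) / smaller >= min_overlap:
            inst.merge(list(feature), new_pts)
            return inst
    inst = Instance3D(name, caption, list(feature), new_pts)
    scene.append(inst)
    return inst


def query(scene, text_feature):
    """Return the instance whose fused feature is most similar to the query feature."""
    return max(scene, key=lambda inst: cosine(inst.feature, text_feature))
```

In this sketch, two observations of the same object seen from different frames collapse into one record because their 3D points overlap, so the scene grows with the number of instances and not with the number of points or frames.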

arXiv information

Authors	Rafay Mohiuddin, Sai Manoj Prakhya, Fiona Collins, Ziyuan Liu, André Borrmann
Published	2024-07-19 13:01:12+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
