OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding

Summary

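Open-vocabulary 3D scene understanding (OV-3D) aims to localize and classify novel objects beyond a closed set of object classes.
However, existing approaches and benchmarks mainly focus on the open-vocabulary problem within the context of object classes, which is insufficient for a holistic evaluation of how well a model understands a 3D scene.
In this paper, we introduce a more challenging task called Generalized Open-Vocabulary 3D Scene Understanding (GOV-3D), which explores the open-vocabulary problem beyond object classes.
It encompasses an open and diverse set of generalized knowledge, expressed as linguistic queries about fine-grained, object-specific attributes.
To this end, we contribute a new benchmark named OpenScan, which consists of 3D object attributes across eight representative linguistic aspects, including affordance, property, and material.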
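We further evaluate state-of-the-art OV-3D methods on the OpenScan benchmark and find that they struggle to comprehend the abstract vocabularies of the GOV-3D task, a challenge that cannot be addressed by simply scaling up the object classes used in training.
We highlight the limitations of existing methods and explore a promising direction for overcoming the identified shortcomings.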
Data and code are available at https://github.com/YoujunZhao/OpenScan.

Summary (original)

Open-vocabulary 3D scene understanding (OV-3D) aims to localize and classify novel objects beyond the closed object classes. However, existing approaches and benchmarks primarily focus on the open vocabulary problem within the context of object classes, which is insufficient to provide a holistic evaluation to what extent a model understands the 3D scene. In this paper, we introduce a more challenging task called Generalized Open-Vocabulary 3D Scene Understanding (GOV-3D) to explore the open vocabulary problem beyond object classes. It encompasses an open and diverse set of generalized knowledge, expressed as linguistic queries of fine-grained and object-specific attributes. To this end, we contribute a new benchmark named OpenScan, which consists of 3D object attributes across eight representative linguistic aspects, including affordance, property, material, and more. We further evaluate state-of-the-art OV-3D methods on our OpenScan benchmark, and discover that these methods struggle to comprehend the abstract vocabularies of the GOV-3D task, a challenge that cannot be addressed by simply scaling up object classes during training. We highlight the limitations of existing methodologies and explore a promising direction to overcome the identified shortcomings. Data and code are available at https://github.com/YoujunZhao/OpenScan

arXiv information

Authors	Youjun Zhao, Jiaying Lin, Shuquan Ye, Qianshi Pang, Rynson W. H. Lau
Published	2024-08-20 17:31:48+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
