Language-Assisted 3D Feature Learning for Semantic Scene Understanding

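Abstract

Learning descriptive 3D features is essential for understanding 3D scenes that contain diverse objects and complex structures.
However, it is usually unclear whether an end-to-end trained 3D scene understanding network places enough emphasis on important geometric attributes and scene context.
To guide 3D feature learning toward these geometric attributes and this scene context, we explore the help of textual scene descriptions.
Given a few free-form descriptions paired with 3D scenes, we extract knowledge about object relationships and object attributes.
We then inject this knowledge into 3D feature learning through three classification-based auxiliary tasks.
This language-assisted training can be combined with modern object detection and instance segmentation methods to improve 3D semantic scene understanding, especially in a label-deficient regime.
Moreover, 3D features learned with language assistance are better aligned with language features, which can benefit various 3D-language multimodal tasks.
Experiments on several benchmarks for 3D-only and 3D-language tasks demonstrate the effectiveness of language-assisted 3D feature learning.
Code is available at https://github.com/Asterisci/Language-Assisted-3D.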

Abstract (original)

Learning descriptive 3D features is crucial for understanding 3D scenes with diverse objects and complex structures. However, it is usually unknown whether important geometric attributes and scene context obtain enough emphasis in an end-to-end trained 3D scene understanding network. To guide 3D feature learning toward important geometric attributes and scene context, we explore the help of textual scene descriptions. Given some free-form descriptions paired with 3D scenes, we extract the knowledge regarding the object relationships and object attributes. We then inject the knowledge to 3D feature learning through three classification-based auxiliary tasks. This language-assisted training can be combined with modern object detection and instance segmentation methods to promote 3D semantic scene understanding, especially in a label-deficient regime. Moreover, the 3D feature learned with language assistance is better aligned with the language features, which can benefit various 3D-language multimodal tasks. Experiments on several benchmarks of 3D-only and 3D-language tasks demonstrate the effectiveness of our language-assisted 3D feature learning. Code is available at https://github.com/Asterisci/Language-Assisted-3D.
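
The abstract says that knowledge is injected through three classification-based auxiliary tasks but does not name them here. As a rough illustration only, auxiliary classification losses are commonly added to a main task loss as a weighted sum. The sketch below assumes that form; the function names, the `aux_weight` parameter, and its default value are hypothetical and are not taken from the paper or its repository.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for a single example (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


def total_loss(main_loss, aux_logits, aux_labels, aux_weight=0.1):
    """Main 3D task loss plus a weighted sum of auxiliary classification losses.

    aux_logits / aux_labels hold one entry per auxiliary task (three in the
    abstract's setting). aux_weight is an assumed hyperparameter.
    """
    aux = sum(cross_entropy(l, y) for l, y in zip(aux_logits, aux_labels))
    return main_loss + aux_weight * aux
```

With this form, setting `aux_weight` to zero recovers plain training on the main detection or segmentation loss, which makes the contribution of the language-derived supervision easy to ablate.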

arXiv information

Authors	Junbo Zhang, Guofan Fan, Guanghan Wang, Zhengyuan Su, Kaisheng Ma, Li Yi
Published	2022-11-25 13:21:59+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
