Language-Assisted 3D Scene Understanding

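Summary

The scale and quality of point cloud datasets constrain progress in point cloud learning.
With the recent development of multi-modal learning, incorporating domain-agnostic prior knowledge from other modalities, such as images and text, is considered a promising way to assist point cloud feature learning.
Existing methods have demonstrated the effectiveness of multi-modal contrastive training and feature distillation on point clouds.
However, challenges remain: the requirement for paired triplet data, redundancy and ambiguity in the supervised features, and disruption of the original priors.
This paper proposes LAST-PCL, a language-assisted approach to point cloud feature learning that enriches semantic concepts through LLM-based text enrichment.
A statistics-based, training-free selection of significant features removes redundancy and reduces feature dimensionality without compromising the textual priors.
The paper also analyzes in depth the impact of text contrastive training on point clouds.
Extensive experiments validate that the proposed method learns semantically meaningful point cloud features and achieves state-of-the-art or comparable performance on 3D semantic segmentation, 3D object detection, and 3D scene classification.
The source code is available at https://github.com/yanmin-wu/LAST-PCL.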

Abstract (original)

The scale and quality of point cloud datasets constrain the advancement of point cloud learning. Recently, with the development of multi-modal learning, the incorporation of domain-agnostic prior knowledge from other modalities, such as images and text, to assist in point cloud feature learning has been considered a promising avenue. Existing methods have demonstrated the effectiveness of multi-modal contrastive training and feature distillation on point clouds. However, challenges remain, including the requirement for paired triplet data, redundancy and ambiguity in supervised features, and the disruption of the original priors. In this paper, we propose a language-assisted approach to point cloud feature learning (LAST-PCL), enriching semantic concepts through LLMs-based text enrichment. We achieve de-redundancy and feature dimensionality reduction without compromising textual priors by statistical-based and training-free significant feature selection. Furthermore, we also delve into an in-depth analysis of the impact of text contrastive training on the point cloud. Extensive experiments validate that the proposed method learns semantically meaningful point cloud features and achieves state-of-the-art or comparable performance in 3D semantic segmentation, 3D object detection, and 3D scene classification tasks. The source code is available at https://github.com/yanmin-wu/LAST-PCL.
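
The abstract mentions a statistics-based, training-free selection of significant features but does not say which statistic is used. As a rough illustration only, the sketch below assumes per-dimension variance across a set of text embeddings as the significance score and keeps the top k dimensions. The function names, the variance criterion, and the toy numbers are all hypothetical and are not taken from the LAST-PCL code.

```python
from statistics import pvariance


def select_significant_dims(text_features, k):
    """Return the indices of the k feature dimensions with the highest
    variance across rows (assumed significance score, not the paper's)."""
    if not text_features:
        raise ValueError("text_features must be non-empty")
    dim = len(text_features[0])
    if not 0 < k <= dim:
        raise ValueError("k must be between 1 and the feature dimension")
    # Population variance of each column, computed without any training.
    variances = [pvariance([row[d] for row in text_features]) for d in range(dim)]
    # Rank dimensions by descending variance; break ties by index.
    ranked = sorted(range(dim), key=lambda d: (-variances[d], d))
    # Return the kept indices in their original order.
    return sorted(ranked[:k])


def reduce_features(text_features, keep):
    """Keep only the selected dimensions; the kept values are unchanged."""
    return [[row[d] for d in keep] for row in text_features]


# Toy text embeddings: 3 categories x 4 dimensions (made-up values).
toy = [
    [0.9, 0.5, 0.1, 0.2],
    [0.1, 0.5, 0.1, 0.8],
    [0.5, 0.5, 0.2, 0.5],
]
keep = select_significant_dims(toy, k=2)
reduced = reduce_features(toy, keep)
print(keep)     # indices of the retained dimensions
print(reduced)  # embeddings restricted to those dimensions
```

Because the selection only drops dimensions and never re-projects the remaining ones, the retained values are exactly those produced by the text encoder, which is one plausible reading of reducing dimensionality "without compromising textual priors".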

arXiv information

Authors	Yanmin Wu, Qiankun Gao, Renrui Zhang, Jian Zhang
Published	2023-12-18 18:54:56+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
