PointCLIP V2: Adapting CLIP for Powerful 3D Open-world Learning

Abstract

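Contrastive Language-Image Pre-training (CLIP) has shown promising open-world performance on 2D image tasks, but its transfer to 3D point clouds (that is, PointCLIP) is still far from satisfactory.
In this work, we propose PointCLIP V2, a powerful 3D open-world learner that aims to fully unleash the potential of CLIP on 3D point cloud data.
First, we introduce a realistic shape projection module that generates more realistic depth maps for CLIP's visual encoder. It is efficient and narrows the domain gap between projected point clouds and natural images.
Second, we leverage large-scale language models to automatically design more descriptive 3D-semantic prompts for CLIP's textual encoder, replacing the earlier hand-crafted prompts.
Without introducing any training in 3D domains, our approach surpasses PointCLIP by +42.90%, +40.44%, and +28.75% accuracy on three datasets for zero-shot 3D classification.
Furthermore, PointCLIP V2 can be extended in a simple manner to few-shot classification, zero-shot part segmentation, and zero-shot 3D object detection, demonstrating strong generalization ability for 3D open-world learning.
Code is available at https://github.com/yangyangyang127/PointCLIP_V2.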

Abstract (original)

Contrastive Language-Image Pre-training (CLIP) has shown promising open-world performance on 2D image tasks, while its transferred capacity on 3D point clouds, i.e., PointCLIP, is still far from satisfactory. In this work, we propose PointCLIP V2, a powerful 3D open-world learner, to fully unleash the potential of CLIP on 3D point cloud data. First, we introduce a realistic shape projection module to generate more realistic depth maps for CLIP’s visual encoder, which is quite efficient and narrows the domain gap between projected point clouds with natural images. Second, we leverage large-scale language models to automatically design a more descriptive 3D-semantic prompt for CLIP’s textual encoder, instead of the previous hand-crafted one. Without introducing any training in 3D domains, our approach significantly surpasses PointCLIP by +42.90%, +40.44%, and +28.75% accuracy on three datasets for zero-shot 3D classification. Furthermore, PointCLIP V2 can be extended to few-shot classification, zero-shot part segmentation, and zero-shot 3D object detection in a simple manner, demonstrating our superior generalization ability for 3D open-world learning. Code will be available at https://github.com/yangyangyang127/PointCLIP_V2.
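
The idea of turning a point cloud into a depth map for an image encoder can be illustrated with a rough sketch. The code below is not the paper's realistic shape projection module, whose details the abstract does not give. It is only a minimal orthographic projection along the z axis, written under assumptions: the function name `project_to_depth_map`, the `resolution` parameter, and the rule that nearer points appear brighter are all hypothetical choices made for illustration.

```python
import numpy as np


def project_to_depth_map(points, resolution=32):
    """Project an (N, 3) point cloud along z onto a square depth map.

    Hypothetical helper for illustration only; not the method from the paper.
    Empty pixels are 0, occupied pixels hold values in [0.1, 1.0],
    and points with smaller z (assumed nearer to the camera) are brighter.
    """
    points = np.asarray(points, dtype=np.float64)

    # Shift to the origin and scale uniformly so every coordinate is in [0, 1]
    # while the aspect ratio of the shape is preserved.
    mins = points.min(axis=0)
    scale = (points.max(axis=0) - mins).max()
    if scale == 0:
        scale = 1.0
    unit = (points - mins) / scale

    # Map x and y to integer pixel indices, clamping the value 1.0 to the last pixel.
    cols = np.minimum((unit[:, 0] * resolution).astype(int), resolution - 1)
    rows = np.minimum((unit[:, 1] * resolution).astype(int), resolution - 1)

    # Convert normalized depth to intensity: z = 0 gives 1.0, z = 1 gives 0.1.
    intensity = 1.0 - 0.9 * unit[:, 2]

    # When several points fall into one pixel, keep the brightest (nearest) one.
    depth = np.zeros((resolution, resolution))
    np.maximum.at(depth, (rows, cols), intensity)
    return depth
```

A sparse map like this one leaves most pixels empty, which hints at why a projection that yields denser, more image-like depth maps could reduce the gap to natural images.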

arXiv information

Authors	Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyao Zeng, Shanghang Zhang, Peng Gao
Published	2022-11-21 17:52:43+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
