Multi-view Vision-Prompt Fusion Network: Can 2D Pre-trained Model Boost 3D Point Cloud Data-scarce Learning?

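Abstract

Point-cloud-based 3D deep models are widely used in applications such as autonomous driving and household robots. Inspired by recent prompt learning in natural language processing, this work proposes a novel Multi-view Vision-Prompt Fusion Network (MvNet) for few-shot 3D point cloud classification. MvNet investigates whether off-the-shelf 2D pre-trained models can be leveraged for few-shot classification, which would alleviate the heavy dependence of existing baseline models on large-scale annotated 3D point cloud data. Specifically, MvNet first encodes a 3D point cloud into image features for several different views. A novel multi-view prompt fusion module then fuses information across the views to bridge the gap between 3D point cloud data and 2D pre-trained models. A set of 2D image prompts can then be derived to better describe suitable prior knowledge for a large-scale pre-trained image model in few-shot 3D point cloud classification. Extensive experiments on the ModelNet, ScanObjectNN, and ShapeNet datasets demonstrate that MvNet achieves new state-of-the-art performance in few-shot 3D point cloud classification. The source code of this work will be available soon.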

Abstract (original)

Point cloud based 3D deep model has wide applications in many applications such as autonomous driving, house robot, and so on. Inspired by the recent prompt learning in natural language processing, this work proposes a novel Multi-view Vision-Prompt Fusion Network (MvNet) for few-shot 3D point cloud classification. MvNet investigates the possibility of leveraging the off-the-shelf 2D pre-trained models to achieve the few-shot classification, which can alleviate the over-dependence issue of the existing baseline models towards the large-scale annotated 3D point cloud data. Specifically, MvNet first encodes a 3D point cloud into multi-view image features for a number of different views. Then, a novel multi-view prompt fusion module is developed to effectively fuse information from different views to bridge the gap between 3D point cloud data and 2D pre-trained models. A set of 2D image prompts can then be derived to better describe the suitable prior knowledge for a large-scale pre-trained image model for few-shot 3D point cloud classification. Extensive experiments on ModelNet, ScanObjectNN, and ShapeNet datasets demonstrate that MvNet achieves new state-of-the-art performance for 3D few-shot point cloud image classification. The source code of this work will be available soon.
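
The first step described in the abstract, turning a 3D point cloud into several 2D views, can be illustrated with a minimal sketch. The abstract does not say how MvNet renders its views, so everything here is an assumption made for illustration: the orthographic depth projection, the rotation about the vertical axis, the 8x8 image size, and the hypothetical function names `normalize`, `rotate_y`, `depth_image`, and `multi_view_depth_images`. It is not the authors' implementation.

```python
import math

def normalize(points):
    """Center the cloud at the origin and scale it into the unit ball."""
    n = len(points)
    cx, cy, cz = (sum(p[i] for p in points) / n for i in range(3))
    centered = [(x - cx, y - cy, z - cz) for x, y, z in points]
    radius = max(math.sqrt(x * x + y * y + z * z) for x, y, z in centered) or 1.0
    return [(x / radius, y / radius, z / radius) for x, y, z in centered]

def rotate_y(points, angle):
    """Rotate points about the vertical y-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]

def depth_image(points, size=8):
    """Orthographic projection onto a size x size depth map (assumed scheme).

    The viewer looks along +z, so smaller z means nearer and brighter.
    Empty pixels stay 0.0; occupied pixels fall in [0.1, 1.0].
    """
    image = [[0.0] * size for _ in range(size)]
    for x, y, z in points:
        col = min(size - 1, max(0, int((x + 1) / 2 * size)))
        row = min(size - 1, max(0, int((1 - y) / 2 * size)))
        z = max(-1.0, min(1.0, z))
        value = 0.1 + 0.9 * (1 - z) / 2
        # Keep the brightest value, i.e. the nearest point for this pixel.
        image[row][col] = max(image[row][col], value)
    return image

def multi_view_depth_images(points, num_views=4, size=8):
    """Render `num_views` depth maps from evenly spaced angles around the y-axis."""
    unit = normalize(points)
    return [
        depth_image(rotate_y(unit, 2 * math.pi * k / num_views), size)
        for k in range(num_views)
    ]
```

In a pipeline of the kind the abstract outlines, images like these would be passed to a 2D pre-trained model. The prompt fusion module is not specified in the abstract, so it is not sketched here.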

arXiv information

Authors	Haoyang Peng,Baopu Li,Bo Zhang,Xin Chen,Tao Chen,Hongyuan Zhu
Published	2023-08-04 09:19:43+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
