Multi-view Vision-Prompt Fusion Network: Can 2D Pre-trained Model Boost 3D Point Cloud Data-scarce Learning?

Summary

Title: Multi-view Vision-Prompt Fusion Network: Can 2D Pre-trained Models Boost Data-scarce Learning on 3D Point Clouds?

Summary:

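– Point cloud-based 3D deep models are widely used in many applications, such as autonomous driving and household robots.
– Inspired by recent prompt learning in natural language processing, this work proposes a novel Multi-view Vision-Prompt Fusion Network (MvNet) for few-shot 3D point cloud classification.
– MvNet explores using off-the-shelf 2D pre-trained models for few-shot classification. This can reduce the heavy reliance of existing baseline models on large-scale annotated 3D point cloud data.
– Specifically, MvNet first encodes a 3D point cloud into image features from multiple views. A multi-view prompt fusion module then fuses the information from these views, bridging the gap between 3D point cloud data and 2D pre-trained models.
– From the fused features, MvNet derives a set of 2D image prompts that better describe the prior knowledge suited to a large-scale pre-trained image model.
– Extensive experiments on the ModelNet, ScanObjectNN, and ShapeNet datasets show that MvNet achieves new state-of-the-art performance in few-shot 3D point cloud image classification.
– The source code for this work will be released soon.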
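The summary says MvNet first turns a point cloud into multi-view image features, but it does not say how. The sketch below shows one common way to get from 3D points to several 2D views. It uses orthographic depth maps rendered after rotating the cloud around the vertical axis, which is the kind of projection used by multi-view methods such as PointCLIP. Several assumptions apply. The points are normalized into the unit sphere. `view_feature` is a hypothetical stand-in for a frozen 2D pre-trained backbone. `fuse` is a plain element-wise mean, not MvNet's learned multi-view prompt fusion module. All function names are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def rotate_y(points: Sequence[Point], angle: float) -> List[Point]:
    """Rotate points around the vertical (y) axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]


def depth_map(points: Sequence[Point], size: int) -> List[List[float]]:
    """Orthographic projection onto the x-y plane, camera looking along +z.

    Assumes points lie in the unit sphere, so z is in [-1, 1]. Each pixel keeps
    the nearest point (smallest z), encoded as 1 - (z + 1) / 4, a value in
    [0.5, 1.0]. Empty pixels stay 0.0.
    """
    img = [[0.0] * size for _ in range(size)]
    nearest = [[math.inf] * size for _ in range(size)]
    for x, y, z in points:
        col = min(size - 1, max(0, int((x + 1) / 2 * size)))
        row = min(size - 1, max(0, int((1 - y) / 2 * size)))  # y = +1 is the top row
        if z < nearest[row][col]:
            nearest[row][col] = z
            img[row][col] = 1.0 - (z + 1) / 4
    return img


def view_feature(img: List[List[float]]) -> List[float]:
    """Hypothetical stand-in for a frozen 2D backbone: mean intensity per row."""
    return [sum(row) / len(row) for row in img]


def fuse(features: List[List[float]]) -> List[float]:
    """Placeholder fusion: element-wise mean over views (not MvNet's module)."""
    n = len(features)
    return [sum(vals) / n for vals in zip(*features)]


def multiview_encode(points: Sequence[Point], num_views: int = 4, size: int = 8) -> List[float]:
    """Render `num_views` evenly spaced views, embed each one, then fuse them."""
    feats = []
    for k in range(num_views):
        angle = 2 * math.pi * k / num_views
        feats.append(view_feature(depth_map(rotate_y(points, angle), size)))
    return fuse(feats)


if __name__ == "__main__":
    cloud = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.2), (-0.3, -0.6, 0.4)]
    print(multiview_encode(cloud, num_views=4, size=8))
```

In a real pipeline, each depth map would be fed to a pre-trained image model, and the fusion step would be learned. This toy version only makes the data flow visible: 3D points go to N views, then to N feature vectors, then to one fused vector.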

Summary (original)

Point cloud based 3D deep model has wide applications in many applications such as autonomous driving, house robot, and so on. Inspired by the recent prompt learning in natural language processing, this work proposes a novel Multi-view Vision-Prompt Fusion Network (MvNet) for few-shot 3D point cloud classification. MvNet investigates the possibility of leveraging the off-the-shelf 2D pre-trained models to achieve the few-shot classification, which can alleviate the over-dependence issue of the existing baseline models towards the large-scale annotated 3D point cloud data. Specifically, MvNet first encodes a 3D point cloud into multi-view image features for a number of different views. Then, a novel multi-view prompt fusion module is developed to effectively fuse information from different views to bridge the gap between 3D point cloud data and 2D pre-trained models. A set of 2D image prompts can then be derived to better describe the suitable prior knowledge for a large-scale pre-trained image model for few-shot 3D point cloud classification. Extensive experiments on ModelNet, ScanObjectNN, and ShapeNet datasets demonstrate that MvNet achieves new state-of-the-art performance for 3D few-shot point cloud image classification. The source code of this work will be available soon.
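"Few-shot classification" here means recognizing classes from only K labeled examples each (an N-way K-shot episode). The abstract does not describe MvNet's classifier head. The sketch below is therefore a generic, swapped-in illustration: a nearest-prototype classifier in the style of Prototypical Networks, operating on embeddings that some encoder has already produced. The class labels and toy 2-D embeddings are made up for illustration. `prototypes` and `classify` are hypothetical names, not part of MvNet.

```python
import math
from typing import Dict, List, Sequence


def prototypes(support: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    """Average the K support embeddings of each class into one prototype vector."""
    protos = {}
    for label, embs in support.items():
        k = len(embs)
        protos[label] = [sum(vals) / k for vals in zip(*embs)]
    return protos


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def classify(query: Sequence[float], protos: Dict[str, List[float]]) -> str:
    """Assign the query to the class whose prototype is most similar."""
    return max(protos, key=lambda label: cosine(query, protos[label]))


if __name__ == "__main__":
    # A toy 2-way 2-shot episode with made-up embeddings.
    support = {
        "chair": [[1.0, 0.1], [0.9, 0.0]],
        "table": [[0.0, 1.0], [0.1, 0.9]],
    }
    print(classify([0.8, 0.2], prototypes(support)))
```

The appeal of this kind of head in data-scarce settings is that it has no trainable parameters. Classification quality then depends almost entirely on how well the (possibly 2D pre-trained) encoder separates the classes.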

arXiv information

Authors	Haoyang Peng, Baopu Li, Bo Zhang, Xin Chen, Tao Chen, Hongyuan Zhu
Published	2023-04-20 11:39:41+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI