Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models

Summary

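With the overwhelming trend of masked image modeling led by MAE, generative pre-training has shown remarkable potential for boosting the performance of fundamental models in 2D vision.
In 3D vision, however, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training.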
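A point cloud is an unordered set, so the same shape can be stored in any row order. A point-level reconstruction target therefore cannot be compared index by index. Point-cloud generative pre-training commonly works around this with a permutation-invariant set distance such as the Chamfer distance. The NumPy sketch below is our own illustration and not code from the paper. It shows that an index-wise MSE changes when the rows are permuted, while the Chamfer distance does not.

```python
import numpy as np

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance (squared L2) between point sets a (N, 3) and b (M, 3)."""
    # Pairwise squared distances between every point of a and every point of b, shape (N, M)
    d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    # Mean nearest-neighbour distance, taken in both directions
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

def indexwise_mse(a: np.ndarray, b: np.ndarray) -> float:
    """Naive loss that assumes row i of a corresponds to row i of b."""
    return float(((a - b) ** 2).mean())

rng = np.random.default_rng(0)
points = rng.standard_normal((128, 3))
shuffled = points[rng.permutation(len(points))]  # same shape, different row order

print(indexwise_mse(points, shuffled))     # nonzero, although the shape is identical
print(chamfer_distance(points, shuffled))  # 0.0
```

The nearest-neighbour matching makes the Chamfer distance order-independent. That same matching also makes it a looser, less direct training signal than a loss that compares fixed positions.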
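In this paper, we propose a novel 3D-to-2D generative pre-training method that can be adapted to any point cloud model.
As the pre-training scheme, we propose generating view images from different instructed poses via a cross-attention mechanism.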
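The pose-conditioned generation step can be pictured as cross-attention. A grid of 2D query tokens, derived from the instructed pose, attends to per-point features produced by an arbitrary 3D backbone. The single-head NumPy sketch below rests on our own assumptions and is not the TAP implementation (the official code is in the repository linked in this summary). In particular, `make_pose_queries` is a hypothetical helper that adds a projection of the flattened rotation matrix to every pixel embedding. The dimensions and random weights are placeholders.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, point_feats, w_q, w_k, w_v):
    """Single-head cross-attention: queries (Q, C) attend to point features (N, C)."""
    q = queries @ w_q
    k = point_feats @ w_k
    v = point_feats @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (Q, N), each row sums to 1
    return attn @ v                                           # (Q, C)

def make_pose_queries(rotation, pixel_emb, w_pose):
    """Hypothetical pose conditioning: add a projection of the 3x3 rotation to each pixel embedding."""
    return pixel_emb + rotation.reshape(1, 9) @ w_pose

rng = np.random.default_rng(0)
H = W = 8   # size of the 2D feature map (assumed)
C = 32      # feature width (assumed)
N = 256     # number of points/tokens output by the 3D backbone (assumed)

point_feats = rng.standard_normal((N, C))  # stand-in for any backbone's per-point output
pixel_emb = rng.standard_normal((H * W, C))
w_pose = rng.standard_normal((9, C))
w_q, w_k, w_v = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))

theta = np.pi / 4  # an instructed azimuth of 45 degrees about the z axis
rotation = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                     [np.sin(theta),  np.cos(theta), 0.0],
                     [0.0,            0.0,           1.0]])

queries = make_pose_queries(rotation, pixel_emb, w_pose)
feature_map = cross_attention(queries, point_feats, w_q, w_k, w_v).reshape(H, W, C)
print(feature_map.shape)  # (8, 8, 32)
```

The attention head consumes only an (N, C) feature matrix, so it makes no assumption about how the backbone computed those features. This fits the claim that the scheme applies to any point cloud model. A full pipeline would presumably decode the resulting feature map into an image.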
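Generating view images provides more precise supervision than generating the corresponding point clouds, which helps 3D backbones gain a finer understanding of the geometric structure and stereoscopic relations of the point cloud.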
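Here is one way to see why a view image gives more precise supervision. Once a pose is fixed, every pixel has a definite location, so a prediction can be scored pixel by pixel. No nearest-neighbour matching between unordered points is needed. The sketch below "takes a photo" of a point cloud in three steps:

- rotate it to an instructed pose,
- splat the points orthographically into an occupancy image,
- score a prediction with per-pixel MSE.

The orthographic occupancy rendering and the helper names (`rotation_y`, `take_photo`, `pixel_mse`) are our simplifying assumptions. The summary does not say how TAP's target view images are produced.

```python
import numpy as np

def rotation_y(theta: float) -> np.ndarray:
    """Rotation about the vertical (y) axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def take_photo(points: np.ndarray, rotation: np.ndarray, size: int = 32) -> np.ndarray:
    """Orthographic occupancy image of points (N, 3) lying roughly in [-1, 1]^3, seen from a pose."""
    cam = points @ rotation.T  # rotate every point into the camera frame
    # Map x and y from [-1, 1] to pixel indices; depth (z) is discarded by the orthographic view
    cols = np.clip(((cam[:, 0] + 1) / 2 * (size - 1)).round().astype(int), 0, size - 1)
    rows = np.clip(((1 - cam[:, 1]) / 2 * (size - 1)).round().astype(int), 0, size - 1)
    img = np.zeros((size, size))
    img[rows, cols] = 1.0  # mark every pixel hit by at least one point
    return img

def pixel_mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Per-pixel loss: pixel (i, j) of pred is always compared with pixel (i, j) of target."""
    return float(((pred - target) ** 2).mean())

rng = np.random.default_rng(0)
points = rng.uniform(-0.7, 0.7, size=(512, 3))

target = take_photo(points, rotation_y(np.pi / 6))       # "photo" from a 30-degree pose
print(pixel_mse(target, target))                         # 0.0
print(pixel_mse(take_photo(points, rotation_y(0.0)), target))  # photo from another pose
```

Because the pixel grid fixes the correspondence, the loss penalizes every misplaced pixel directly. A real pipeline would likely use richer renderings than binary occupancy.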
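Experimental results demonstrate that our proposed 3D-to-2D generative pre-training outperforms previous pre-training methods.
Our method is also effective in boosting the performance of architecture-oriented approaches, achieving state-of-the-art performance when fine-tuned on the ScanObjectNN classification and ShapeNetPart segmentation tasks.
Code is available at https://github.com/wangzy22/TAP.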

Summary (original)

With the overwhelming trend of mask image modeling led by MAE, generative pre-training has shown a remarkable potential to boost the performance of fundamental models in 2D vision. However, in 3D vision, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training. In this paper, we propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model. We propose to generate view images from different instructed poses via the cross-attention mechanism as the pre-training scheme. Generating view images has more precise supervision than its point cloud counterpart, thus assisting 3D backbones to have a finer comprehension of the geometrical structure and stereoscopic relations of the point cloud. Experimental results have proved the superiority of our proposed 3D-to-2D generative pre-training over previous pre-training methods. Our method is also effective in boosting the performance of architecture-oriented approaches, achieving state-of-the-art performance when fine-tuning on ScanObjectNN classification and ShapeNetPart segmentation tasks. Code is available at https://github.com/wangzy22/TAP.

arXiv information

Authors	Ziyi Wang, Xumin Yu, Yongming Rao, Jie Zhou, Jiwen Lu
Published	2023-07-27 16:07:03+00:00

Provider, services used

arxiv.jp, Google
