Self-Supervised Pre-training for 3D Point Clouds via View-Specific Point-to-Image Translation

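Summary

Over the past few years, self-supervised representation learning has achieved great success and become prevalent in the language and 2D vision communities.
However, these advances have not yet fully carried over to 3D point cloud learning.
Existing pre-training paradigms for deep point cloud feature extractors fall under either generative modeling or contrastive learning. In contrast, this paper proposes PointVST, a translative pre-training framework driven by a new self-supervised pretext task: cross-modal translation from 3D point clouds to their corresponding 2D rendered images in diverse forms.
More specifically, the method first derives view-conditioned point-wise embeddings by inserting a viewpoint indicator, then adaptively aggregates them into a view-specific global codeword, which is fed into subsequent 2D convolutional translation heads for image generation.
Extensive experiments across a range of downstream tasks show that PointVST consistently and markedly outperforms current state-of-the-art approaches and transfers satisfactorily across domains.
The code will be made publicly available at https://github.com/keeganhk/PointVST.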
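
The view-conditioning and aggregation steps described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the viewpoint indicator is encoded, what the layer shapes are, or how the aggregation is performed. The sketch assumes the indicator is a 3D viewing direction concatenated to each point feature, and it uses softmax-weighted pooling as a stand-in for the "adaptive aggregation". All function and variable names are hypothetical, and the 2D convolutional translation heads are omitted.

```python
import math
import random


def linear(vec, weights):
    """Apply a bias-free linear map; `weights` is a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def view_specific_codeword(point_feats, view_indicator, w_embed, w_score):
    """Pool per-point features into one codeword for a given viewpoint.

    point_feats:    N feature vectors from some point cloud backbone (assumed).
    view_indicator: vector describing the viewpoint (assumed encoding).
    w_embed, w_score: stand-ins for learned weights (hypothetical).
    """
    # 1. Insert the viewpoint indicator by concatenating it to every point
    #    feature, then project and apply a ReLU.
    embeddings = [
        [max(0.0, x) for x in linear(feat + view_indicator, w_embed)]
        for feat in point_feats
    ]
    # 2. Score each view-conditioned embedding and normalise across points.
    weights = softmax([linear(emb, w_score)[0] for emb in embeddings])
    # 3. Weighted sum over points gives the view-specific global codeword.
    dim = len(embeddings[0])
    return [sum(w * emb[d] for w, emb in zip(weights, embeddings))
            for d in range(dim)]


rng = random.Random(0)


def random_matrix(rows, cols):
    """Random weights in [-1, 1], standing in for trained parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


feats = random_matrix(5, 4)        # 5 points with 4-dim backbone features
w_embed = random_matrix(8, 4 + 3)  # maps (feature + view) to 8 dims
w_score = random_matrix(1, 8)      # one attention score per point

front = view_specific_codeword(feats, [0.0, 0.0, 1.0], w_embed, w_score)
side = view_specific_codeword(feats, [1.0, 0.0, 0.0], w_embed, w_score)
print(len(front), len(side))
```

The same point features are pooled into a separate codeword for each viewpoint. In a full pipeline, each codeword would be passed to an image decoder and compared against the image rendered from that viewpoint.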

Summary (original)

The past few years have witnessed the great success and prevalence of self-supervised representation learning within the language and 2D vision communities. However, such advancements have not been fully migrated to the field of 3D point cloud learning. Different from existing pre-training paradigms designed for deep point cloud feature extractors that fall into the scope of generative modeling or contrastive learning, this paper proposes a translative pre-training framework, namely PointVST, driven by a novel self-supervised pretext task of cross-modal translation from 3D point clouds to their corresponding diverse forms of 2D rendered images. More specifically, we begin with deducing view-conditioned point-wise embeddings through the insertion of the viewpoint indicator, and then adaptively aggregate a view-specific global codeword, which can be further fed into subsequent 2D convolutional translation heads for image generation. Extensive experimental evaluations on various downstream task scenarios demonstrate that our PointVST shows consistent and prominent performance superiority over current state-of-the-art approaches as well as satisfactory domain transfer capability. Our code will be publicly available at https://github.com/keeganhk/PointVST.

arXiv information

Authors	Qijian Zhang, Junhui Hou
Published	2023-07-28 16:42:44+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
