LATFormer: Locality-Aware Point-View Fusion Transformer for 3D Shape Recognition

Abstract

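Advances in deep learning models for data formats such as images, voxels, and point clouds have recently driven significant progress in 3D shape understanding.
Among these formats, point clouds and multi-view images are two complementary modalities of a 3D object, and learning representations by fusing the two has proven fairly effective.
Prior work typically focuses on exploiting the global features of the two modalities, whereas this paper argues that more discriminative features can be derived by modeling "where to fuse".
To investigate this, the authors propose a novel Locality-Aware Point-View Fusion Transformer (LATFormer) for 3D shape retrieval and classification.
The core component of LATFormer is a module named Locality-Aware Fusion (LAF), which integrates the local features of correlated regions across the two modalities based on co-occurrence scores.
The authors further propose filtering out low-valued scores to obtain salient local co-occurring regions, which reduces redundancy in the fusion process.
LATFormer uses the LAF module to fuse the multi-scale features of the two modalities both bidirectionally and hierarchically, yielding more informative features.
Comprehensive experiments on four popular 3D shape benchmarks covering 3D object retrieval and classification validate its effectiveness.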
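
The abstract does not give the exact formulation of the LAF module, so the following is only a minimal sketch of the general idea under stated assumptions: co-occurrence scores are taken to be scaled dot products between local point features and local view features, low scores are filtered by keeping the top k per region, and the kept scores are softmax-normalized to weight a residual sum. All function names, the scoring function, the top-k rule, and the toy features are hypothetical and are not taken from the paper. Multi-scale, hierarchical fusion is not shown; one could imagine applying the same step once per scale.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cooccurrence_scores(queries, keys):
    """Assumed score: scaled dot product between every query and key region."""
    scale = math.sqrt(len(queries[0]))
    return [[dot(q, k) / scale for k in keys] for q in queries]

def topk_softmax(row, k):
    """Keep the k largest scores in a row, softmax over them, zero the rest."""
    keep = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
    peak = max(row[j] for j in keep)
    exps = {j: math.exp(row[j] - peak) for j in keep}
    total = sum(exps.values())
    return [exps[j] / total if j in exps else 0.0 for j in range(len(row))]

def fuse(queries, keys, k):
    """Add to each query feature a weighted sum of its k most correlated keys."""
    fused = []
    for q, row in zip(queries, cooccurrence_scores(queries, keys)):
        weights = topk_softmax(row, k)
        context = [sum(w * key[d] for w, key in zip(weights, keys))
                   for d in range(len(q))]
        fused.append([a + b for a, b in zip(q, context)])
    return fused

def bidirectional_fuse(point_feats, view_feats, k):
    """Fuse in both directions from the same inputs."""
    return fuse(point_feats, view_feats, k), fuse(view_feats, point_feats, k)

# Toy local features (hypothetical): 3 point regions and 2 view regions, dim 2.
point_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
view_feats = [[2.0, 0.0], [0.0, 2.0]]
fused_points, fused_views = bidirectional_fuse(point_feats, view_feats, k=1)
```

With k=1, each point region is updated only by its single most correlated view region, which illustrates how filtering removes weakly related regions from the fusion.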

Abstract (original)

Recently, 3D shape understanding has achieved significant progress due to the advances of deep learning models on various data formats like images, voxels, and point clouds. Among them, point clouds and multi-view images are two complementary modalities of 3D objects and learning representations by fusing both of them has been proven to be fairly effective. While prior works typically focus on exploiting global features of the two modalities, herein we argue that more discriminative features can be derived by modeling “where to fuse”. To investigate this, we propose a novel Locality-Aware Point-View Fusion Transformer (LATFormer) for 3D shape retrieval and classification. The core component of LATFormer is a module named Locality-Aware Fusion (LAF) which integrates the local features of correlated regions across the two modalities based on the co-occurrence scores. We further propose to filter out scores with low values to obtain salient local co-occurring regions, which reduces redundancy for the fusion process. In our LATFormer, we utilize the LAF module to fuse the multi-scale features of the two modalities both bidirectionally and hierarchically to obtain more informative features. Comprehensive experiments on four popular 3D shape benchmarks covering 3D object retrieval and classification validate its effectiveness.

arXiv information

Authors	Xinwei He, Silin Cheng, Dingkang Liang, Song Bai, Xi Wang, Yingying Zhu
Published	2023-08-25 15:02:38+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
