Point Transformer V2: Grouped Vector Attention and Partition-based Pooling

Summary

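As a pioneering work exploring transformer architectures for 3D point cloud understanding, Point Transformer achieves impressive results on several highly competitive benchmarks.
This work analyzes the limitations of Point Transformer and proposes Point Transformer V2, a powerful and efficient model whose new designs overcome those limitations.
First, it proposes grouped vector attention, which is more effective than the vector attention of the previous version.
Inheriting the advantages of both learnable weight encoding and multi-head attention, it presents a highly effective implementation of grouped vector attention built on a new grouped weight encoding layer.
It also strengthens the position information available to attention through an additional position encoding multiplier.
In addition, it designs new, lightweight partition-based pooling methods that enable better spatial alignment and more efficient sampling.
Extensive experiments show that the model outperforms its predecessor and achieves state-of-the-art results on several challenging 3D point cloud understanding benchmarks, including 3D point cloud segmentation on ScanNet v2 and S3DIS and 3D point cloud classification on ModelNet40.
The code is available at https://github.com/Gofinge/PointTransformerV2.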
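
To make the grouped vector attention idea above more concrete, here is a minimal pure-Python sketch for a single query point and its neighbours. It is not the authors' implementation: the function names, the way the position multiplier and bias enter the relation vector, and the toy `mean_per_group` encoder (a stand-in for the learned grouped weight encoding layer) are all assumptions made for illustration. The one property it aims to show is that each attention weight is shared by all channels within a group, rather than being one weight per channel (vector attention) or one per head-wide dot product (multi-head attention).

```python
import math


def mean_per_group(relation, groups):
    """Toy stand-in (assumption) for a learned grouped weight encoding:
    reduces a C-dim relation vector to `groups` scalars by averaging."""
    per_group = len(relation) // groups
    return [sum(relation[g * per_group:(g + 1) * per_group]) / per_group
            for g in range(groups)]


def grouped_vector_attention(query, keys, values, pos_mul, pos_bias, groups,
                             weight_enc=mean_per_group):
    """Attend one query (list of C floats) over M neighbours.

    keys, values, pos_mul, pos_bias: lists of M lists of C floats.
    pos_mul / pos_bias are hypothetical per-neighbour position encodings.
    """
    channels = len(query)
    assert channels % groups == 0, "channels must split evenly into groups"
    per_group = channels // groups

    # One relation vector per neighbour: (q - k), scaled by the position
    # multiplier and shifted by the position bias, then encoded to one
    # logit per group.
    logits = []
    for key, mul, bias in zip(keys, pos_mul, pos_bias):
        relation = [m * (q - k) + b
                    for q, k, m, b in zip(query, key, mul, bias)]
        logits.append(weight_enc(relation, groups))

    # Softmax over neighbours, computed independently for each group.
    weights = []
    for g in range(groups):
        column = [row[g] for row in logits]
        peak = max(column)
        exps = [math.exp(x - peak) for x in column]
        total = sum(exps)
        weights.append([e / total for e in exps])

    # Aggregate values; channel c uses the weight of group c // per_group.
    out = [0.0] * channels
    for j, value in enumerate(values):
        for c in range(channels):
            out[c] += weights[c // per_group][j] * value[c]
    return out
```

With `groups` equal to the channel count this reduces to per-channel vector attention, and with `groups=1` every channel shares a single weight per neighbour, so the group count trades expressiveness against the size of the weight encoding.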

Abstract (original)

As a pioneering work exploring transformer architecture for 3D point cloud understanding, Point Transformer achieves impressive results on multiple highly competitive benchmarks. In this work, we analyze the limitations of the Point Transformer and propose our powerful and efficient Point Transformer V2 model with novel designs that overcome the limitations of previous work. In particular, we first propose group vector attention, which is more effective than the previous version of vector attention. Inheriting the advantages of both learnable weight encoding and multi-head attention, we present a highly effective implementation of grouped vector attention with a novel grouped weight encoding layer. We also strengthen the position information for attention by an additional position encoding multiplier. Furthermore, we design novel and lightweight partition-based pooling methods which enable better spatial alignment and more efficient sampling. Extensive experiments show that our model achieves better performance than its predecessor and achieves state-of-the-art on several challenging 3D point cloud understanding benchmarks, including 3D point cloud segmentation on ScanNet v2 and S3DIS and 3D point cloud classification on ModelNet40. Our code will be available at https://github.com/Gofinge/PointTransformerV2.
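
The partition-based pooling mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how the space is partitioned, so the uniform grid used here is an assumption, as are the function name `grid_pool`, the choice of mean for coordinates, and channel-wise max for features. The idea shown is that points are grouped by non-overlapping spatial cells instead of being sampled and then matched to neighbours, and that the returned cluster index records which cell each point went to.

```python
import math
from collections import defaultdict


def grid_pool(coords, feats, grid_size):
    """Pool points that share a grid cell (hypothetical helper).

    coords: list of (x, y, z) tuples; feats: list of equal-length lists.
    Returns (pooled_coords, pooled_feats, cluster), where cluster[i] is
    the index of the pooled point that original point i was merged into.
    """
    # Partition: assign each point to the integer cell containing it.
    cells = defaultdict(list)
    for i, (x, y, z) in enumerate(coords):
        key = (math.floor(x / grid_size),
               math.floor(y / grid_size),
               math.floor(z / grid_size))
        cells[key].append(i)

    pooled_coords, pooled_feats = [], []
    cluster = [0] * len(coords)
    channels = len(feats[0]) if feats else 0

    # Sort cells so the output order is deterministic.
    for cell_id, (_, members) in enumerate(sorted(cells.items())):
        n = len(members)
        # Pooled position: mean of the member coordinates.
        pooled_coords.append(
            tuple(sum(coords[i][d] for i in members) / n for d in range(3)))
        # Pooled feature: channel-wise max over the members.
        pooled_feats.append(
            [max(feats[i][c] for i in members) for c in range(channels)])
        for i in members:
            cluster[i] = cell_id
    return pooled_coords, pooled_feats, cluster
```

Keeping the `cluster` list around means a later unpooling step could copy each pooled feature back to the points of its cell by plain indexing, without any neighbour search.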

arXiv information

Authors	Xiaoyang Wu, Yixing Lao, Li Jiang, Xihui Liu, Hengshuang Zhao
Published	2022-10-11 17:58:03+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
