Model-Agnostic Hierarchical Attention for 3D Object Detection

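Abstract

Transformers, a versatile network architecture, have recently achieved great success in 3D point cloud object detection. However, a plain transformer has no hierarchical structure, which makes it difficult to learn features at different scales and limits its ability to extract local features. As a result, performance is uneven across objects of different sizes and is worse on small objects. In this work, we propose two new attention mechanisms as a modular hierarchical design for transformer-based 3D detectors. To enable feature learning at different scales, we propose Simple Multi-Scale Attention, which builds multi-scale tokens from a single-scale input feature. For local feature aggregation, we propose Size-Adaptive Local Attention, which uses an adaptive attention range for every bounding box proposal. Both attention modules are model-agnostic network layers that can be plugged into existing point cloud transformers for end-to-end training. We evaluate our method on two widely used indoor 3D point cloud object detection benchmarks. By plugging the proposed modules into a state-of-the-art transformer-based 3D detector, we improve on the previous best results on both benchmarks, with the largest margin of improvement on small objects.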

Abstract (original)

Transformers as versatile network architectures have recently seen great success in 3D point cloud object detection. However, the lack of hierarchy in a plain transformer makes it difficult to learn features at different scales and restrains its ability to extract localized features. Such limitation makes them have imbalanced performance on objects of different sizes, with inferior performance on smaller ones. In this work, we propose two novel attention mechanisms as modularized hierarchical designs for transformer-based 3D detectors. To enable feature learning at different scales, we propose Simple Multi-Scale Attention that builds multi-scale tokens from a single-scale input feature. For localized feature aggregation, we propose Size-Adaptive Local Attention with adaptive attention ranges for every bounding box proposal. Both of our attention modules are model-agnostic network layers that can be plugged into existing point cloud transformers for end-to-end training. We evaluate our method on two widely used indoor 3D point cloud object detection benchmarks. By plugging our proposed modules into the state-of-the-art transformer-based 3D detector, we improve the previous best results on both benchmarks, with the largest improvement margin on small objects.
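The two mechanisms named in the abstract can be illustrated with a small NumPy sketch. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how multi-scale tokens are built or how the attention range is derived from a proposal, so the average pooling in `multi_scale_tokens` and the axis-aligned box test in `box_mask` are hypothetical stand-ins, and all function names are invented here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_scale_tokens(feats, factors=(1, 2)):
    """Build tokens at several scales from one single-scale feature set.

    ASSUMPTION: coarser scales are made by average-pooling groups of
    consecutive tokens; the paper may use a different resampling scheme.
    feats: (N, D) array. Returns the concatenated tokens of all scales.
    """
    scales = []
    for f in factors:
        n = (feats.shape[0] // f) * f  # drop a remainder that does not fill a group
        scales.append(feats[:n].reshape(-1, f, feats.shape[1]).mean(axis=1))
    return np.concatenate(scales, axis=0)

def box_mask(points, centers, sizes):
    """mask[i, j] is True when point j lies inside axis-aligned box i.

    ASSUMPTION: the attention range of a proposal is its own box extent.
    points: (N, 3), centers: (B, 3), sizes: (B, 3) full edge lengths.
    """
    half = sizes[:, None, :] / 2.0
    offset = np.abs(points[None, :, :] - centers[:, None, :])
    return np.all(offset <= half, axis=-1)

def attention(q, k, v, mask=None):
    """Scaled dot-product attention; masked-out keys get near-zero weight.

    A query whose mask row is entirely False falls back to uniform weights.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -1e9)
    return softmax(scores, axis=-1) @ v

# Multi-scale attention: queries attend to tokens from every scale at once.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 4))
tokens = multi_scale_tokens(feats, factors=(1, 2))   # 8 fine + 4 coarse tokens
queries = rng.normal(size=(2, 4))
fused = attention(queries, tokens, tokens)           # shape (2, 4)

# Size-adaptive local attention: each proposal only sees points in its box.
points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 5.0, 5.0]])
centers = np.array([[0.0, 0.0, 0.0]])
small_box = box_mask(points, centers, np.array([[2.5, 1.0, 1.0]]))
large_box = box_mask(points, centers, np.array([[12.0, 12.0, 12.0]]))
```

In this sketch a larger proposal yields a wider mask, so the attention range grows with the box, which is the behaviour the abstract describes as size-adaptive.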

arXiv information

Authors	Manli Shu, Le Xue, Ning Yu, Roberto Martín-Martín, Juan Carlos Niebles, Caiming Xiong, Ran Xu
Published	2023-01-06 18:52:12+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
