AYDIV: Adaptable Yielding 3D Object Detection via Integrated Contextual Vision Transformer

Summary

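Combining LiDAR and camera data has shown potential for improving short-range object detection in autonomous driving systems.
However, the contrast between LiDAR's sparse data and the camera's dense resolution makes long-range detection difficult for this kind of fusion.
Moreover, the mismatch between the two data representations further complicates fusion methods.
We introduce AYDIV, a new framework that integrates a three-phase alignment process designed specifically to improve long-range detection despite these data discrepancies.
AYDIV consists of three components: the Global Contextual Fusion Alignment Transformer (GCFAT), which improves camera feature extraction and provides a deeper understanding of large-scale patterns;
the Sparse Fused Feature Attention (SFFA), which fine-tunes the fusion of LiDAR and camera details;
and Volumetric Grid Attention (VGA), for comprehensive spatial data fusion.
Compared with other existing fusion-based methods, AYDIV improves mAPH (L2 difficulty) by 1.24% on the Waymo Open Dataset (WOD) and AP by 7.40% on the Argoverse2 dataset, demonstrating its effectiveness.
Our code is publicly available at https://github.com/sanjay-810/AYDIV2.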
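The abstract does not spell out how the LiDAR-camera fusion inside SFFA works. As a rough illustration of the general idea of letting sparse LiDAR features pull in context from dense camera features, here is a minimal single-head cross-attention sketch in NumPy. It is a generic technique, not the paper's SFFA: the function names, the shapes, the square projection matrices, and the residual addition are all assumptions made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the per-row max before exponentiating for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_fuse(lidar_feats, cam_feats, w_q, w_k, w_v):
    """Hypothetical single-head LiDAR-to-camera cross-attention (not the paper's SFFA).

    lidar_feats: (N, C) features of the N non-empty LiDAR voxels (queries)
    cam_feats:   (M, C) features of M camera-image tokens (keys and values)
    w_q, w_k, w_v: (C, C) projection matrices, kept square so the residual add works
    """
    q = lidar_feats @ w_q                        # (N, C)
    k = cam_feats @ w_k                          # (M, C)
    v = cam_feats @ w_v                          # (M, C)
    scores = q @ k.T / np.sqrt(q.shape[-1])      # (N, M) scaled dot-product scores
    attn = softmax(scores, axis=-1)              # each LiDAR row sums to 1 over camera tokens
    return lidar_feats + attn @ v                # residual: LiDAR features plus gathered image context


rng = np.random.default_rng(0)
C = 16
lidar = rng.standard_normal((5, C))     # 5 occupied voxels: the sparse side
cam = rng.standard_normal((100, C))     # 100 image tokens: the dense side
w_q, w_k, w_v = (0.1 * rng.standard_normal((C, C)) for _ in range(3))
fused = cross_attention_fuse(lidar, cam, w_q, w_k, w_v)
print(fused.shape)  # (5, 16)
```

In this sketch, only the occupied voxels act as queries, so the attention cost grows with N × M rather than with the full voxel grid. That is one common motivation for attending from the sparse modality to the dense one.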

Summary (original)

Combining LiDAR and camera data has shown potential in enhancing short-distance object detection in autonomous driving systems. Yet, the fusion encounters difficulties with extended distance detection due to the contrast between LiDAR’s sparse data and the dense resolution of cameras. Besides, discrepancies in the two data representations further complicate fusion methods. We introduce AYDIV, a novel framework integrating a tri-phase alignment process specifically designed to enhance long-distance detection even amidst data discrepancies. AYDIV consists of the Global Contextual Fusion Alignment Transformer (GCFAT), which improves the extraction of camera features and provides a deeper understanding of large-scale patterns; the Sparse Fused Feature Attention (SFFA), which fine-tunes the fusion of LiDAR and camera details; and the Volumetric Grid Attention (VGA) for a comprehensive spatial data fusion. AYDIV’s performance on the Waymo Open Dataset (WOD), with an improvement of 1.24% in mAPH value (L2 difficulty), and on the Argoverse2 Dataset, with a performance improvement of 7.40% in AP value, demonstrates its efficacy in comparison to other existing fusion-based methods. Our code is publicly available at https://github.com/sanjay-810/AYDIV2.
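The Waymo result above is reported in mAPH. As the Waymo Open Dataset defines it, APH is average precision in which each true positive is weighted by its heading accuracy, and mAPH averages APH over object classes. The sketch below covers only that heading weight and its effect on precision at a single operating point. It omits the IoU-based box matching, the L2 difficulty filtering, and the integration over recall that the official evaluation performs, and both function names are hypothetical.

```python
import math


def heading_accuracy(pred_yaw, gt_yaw):
    """Heading weight in [0, 1]: 1.0 for an exact heading, 0.0 for a pi-radian flip."""
    diff = abs(pred_yaw - gt_yaw) % (2 * math.pi)   # fold the absolute difference into [0, 2*pi)
    diff = min(diff, 2 * math.pi - diff)            # shortest angular distance, in [0, pi]
    return 1.0 - diff / math.pi


def heading_weighted_precision(tp_yaw_pairs, num_detections):
    """Precision at one operating point where each true positive counts by its heading weight.

    tp_yaw_pairs: (predicted_yaw, ground_truth_yaw) for each matched detection
    num_detections: total detections at this threshold; false positives add 0 to the numerator
    """
    weighted_tp = sum(heading_accuracy(p, g) for p, g in tp_yaw_pairs)
    return weighted_tp / num_detections


# Two true positives (one exact, one off by 90 degrees) among four detections.
print(heading_weighted_precision([(0.0, 0.0), (0.0, math.pi / 2)], 4))  # 0.375
```

Plain precision for the same four detections would be 0.5. The 90-degree heading error costs half a true positive, so a gain in mAPH reflects both better box placement and better orientation estimates.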

arXiv information

Authors	Tanmoy Dam, Sanjay Bhargav Dharavath, Sameer Alam, Nimrod Lilith, Supriyo Chakraborty, Mir Feroskhan
Published	2024-02-12 14:40:43+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
