SSD-MonoDTR: Supervised Scale-constrained Deformable Transformer for Monocular 3D Object Detection

Abstract

Transformer-based methods have recently demonstrated superior performance in monocular 3D object detection, which predicts 3D attributes from a single 2D image. Most existing transformer-based methods leverage visual and depth representations to explore valuable query points on objects, and the quality of the learned queries has a great impact on detection accuracy. Unfortunately, existing unsupervised attention mechanisms in transformers are prone to generating low-quality query features because of inaccurate receptive fields, especially on hard objects. To tackle this problem, this paper proposes a novel “Supervised Scale-constrained Deformable Attention” (SSDA) for monocular 3D object detection. Specifically, SSDA presets several masks with different scales and uses depth and visual features to predict the local feature for each query. By imposing the scale constraint, SSDA can predict an accurate receptive field for a query, which supports robust query feature generation. In addition, SSDA is trained with a Weighted Scale Matching (WSM) loss that supervises scale prediction, which yields more confident results than unsupervised attention mechanisms. Extensive experiments on KITTI demonstrate that SSDA significantly improves detection accuracy, especially on moderate and hard objects, yielding SOTA performance compared with existing approaches. Code will be publicly available at https://github.com/mikasa3lili/SSD-MonoDETR.
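The idea of preset multi-scale masks can be illustrated with a minimal sketch. It is not the authors' implementation. The square mask shape, the half-widths, the mean pooling, and the names `square_masks`, `scale_constrained_feature`, and `scale_logits` are all assumptions made for illustration. In the paper the scale prediction is driven by depth and visual features; here it is stood in for by a plain vector of logits. The WSM loss is not shown because the abstract does not give its form.

```python
import numpy as np

def square_masks(size, center, half_widths):
    """Build one binary mask per preset scale: a square window around `center`.

    Assumption: masks are axis-aligned squares; the abstract only says that
    several masks with different scales are preset.
    """
    h, w = size
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = center
    masks = [
        (np.abs(ys - cy) <= r) & (np.abs(xs - cx) <= r)
        for r in half_widths
    ]
    return np.stack(masks).astype(np.float64)  # shape (S, H, W)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def scale_constrained_feature(feat, center, half_widths, scale_logits):
    """Blend per-scale local features with predicted scale probabilities.

    feat:         (H, W, C) feature map
    scale_logits: (S,) output of a hypothetical scale-prediction head
    """
    masks = square_masks(feat.shape[:2], center, half_widths)
    # Mean feature inside each mask, shape (S, C).
    area = masks.sum(axis=(1, 2))[:, None]
    local = np.einsum("shw,hwc->sc", masks, feat) / area
    probs = softmax(np.asarray(scale_logits, dtype=np.float64))
    # The query feature is the probability-weighted mix of the local features.
    return probs @ local, probs

feat = np.ones((8, 8, 4))
query_feat, scale_probs = scale_constrained_feature(
    feat, center=(4, 4), half_widths=[1, 2, 3], scale_logits=[0.0, 0.0, 0.0]
)
```

In this sketch, restricting a query to a handful of preset windows is what bounds its receptive field, and a supervised loss on `scale_probs` would push the mix toward the window that matches the object's size.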

arXiv information

Authors	Xuan He, Fan Yang, Jiacheng Lin, Haolong Fu, Jin Yuan, Kailun Yang, Zhiyong Li
Published	2023-05-12 06:17:57+00:00

Source and services used

arxiv.jp, DeepL
