DEIM: DETR with Improved Matching for Fast Convergence

Summary

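DEIM is an innovative and efficient training framework designed to accelerate convergence in real-time object detection with Transformer-based architectures (DETR).
To mitigate the sparse supervision inherent in one-to-one (O2O) matching in DETR models, DEIM adopts a Dense O2O matching strategy.
This approach increases the number of positive samples per image by incorporating additional targets through standard data augmentation techniques.
Dense O2O matching speeds up convergence, but it also produces many low-quality matches that can hurt performance.
To address this, we propose the Matchability-Aware Loss (MAL), a new loss function that optimizes matches across quality levels and makes Dense O2O more effective.
Extensive experiments on the COCO dataset validate the effectiveness of DEIM.
When integrated with RT-DETR and D-FINE, it consistently improves performance while cutting training time by 50%.
Notably, paired with RT-DETRv2, DEIM reaches 53.2% AP after a single day of training on an NVIDIA 4090 GPU.
In addition, DEIM-trained real-time models outperform leading real-time object detectors: DEIM-D-FINE-L and DEIM-D-FINE-X achieve 54.7% and 56.5% AP at 124 and 78 FPS, respectively, on an NVIDIA T4 GPU, without additional data.
We believe DEIM sets a new baseline for advances in real-time object detection.
Code and pre-trained models are available at https://github.com/ShihuaHuang95/DEIM.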
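
To make the Dense O2O idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a 2x2 mosaic as the "standard data augmentation" (the summary does not name a specific technique), and the helper names `mosaic_boxes` and `positives_under_o2o` are hypothetical. It only shows that stitching several images onto one canvas raises the number of targets, and therefore the number of one-to-one positive matches, per training image.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in pixels
Box = Tuple[float, float, float, float]


def mosaic_boxes(per_image_boxes: List[List[Box]], width: float, height: float) -> List[Box]:
    """Merge the boxes of four equally sized images into one 2x2 mosaic.

    Assumption: each source image is scaled by 0.5 and placed in one
    quadrant, so the merged canvas keeps the original width and height.
    """
    if len(per_image_boxes) != 4:
        raise ValueError("this sketch expects exactly four images")
    # Top-left, top-right, bottom-left, bottom-right quadrant offsets.
    offsets = [(0.0, 0.0), (width / 2, 0.0), (0.0, height / 2), (width / 2, height / 2)]
    merged: List[Box] = []
    for boxes, (dx, dy) in zip(per_image_boxes, offsets):
        for x0, y0, x1, y1 in boxes:
            merged.append((x0 * 0.5 + dx, y0 * 0.5 + dy, x1 * 0.5 + dx, y1 * 0.5 + dy))
    return merged


def positives_under_o2o(num_targets: int, num_queries: int) -> int:
    """One-to-one matching assigns at most one query to each target."""
    return min(num_targets, num_queries)


if __name__ == "__main__":
    boxes = [
        [(0, 0, 100, 100)],
        [(10, 10, 50, 50), (200, 200, 300, 300)],
        [],
        [(0, 0, 640, 480)],
    ]
    merged = mosaic_boxes(boxes, width=640, height=480)
    # A single source image yields 0 to 2 positives; the mosaic yields 4.
    print(positives_under_o2o(len(merged), num_queries=300))
```

With 300 queries (an illustrative value), the first image alone contributes one positive match, while the mosaic contributes four, which is the denser supervision the summary describes. The low-quality matches that MAL is meant to handle are outside the scope of this sketch.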

Abstract (original)

We introduce DEIM, an innovative and efficient training framework designed to accelerate convergence in real-time object detection with Transformer-based architectures (DETR). To mitigate the sparse supervision inherent in one-to-one (O2O) matching in DETR models, DEIM employs a Dense O2O matching strategy. This approach increases the number of positive samples per image by incorporating additional targets, using standard data augmentation techniques. While Dense O2O matching speeds up convergence, it also introduces numerous low-quality matches that could affect performance. To address this, we propose the Matchability-Aware Loss (MAL), a novel loss function that optimizes matches across various quality levels, enhancing the effectiveness of Dense O2O. Extensive experiments on the COCO dataset validate the efficacy of DEIM. When integrated with RT-DETR and D-FINE, it consistently boosts performance while reducing training time by 50%. Notably, paired with RT-DETRv2, DEIM achieves 53.2% AP in a single day of training on an NVIDIA 4090 GPU. Additionally, DEIM-trained real-time models outperform leading real-time object detectors, with DEIM-D-FINE-L and DEIM-D-FINE-X achieving 54.7% and 56.5% AP at 124 and 78 FPS on an NVIDIA T4 GPU, respectively, without the need for additional data. We believe DEIM sets a new baseline for advancements in real-time object detection. Our code and pre-trained models are available at https://github.com/ShihuaHuang95/DEIM.

arXiv information

Authors	Shihua Huang, Zhichao Lu, Xiaodong Cun, Yongjun Yu, Xiao Zhou, Xi Shen
Published	2024-12-05 15:10:13+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
