TransVisDrone: Spatio-Temporal Transformer for Vision-based Drone-to-Drone Detection in Aerial Videos

Abstract

Drone-to-drone detection from a visual feed has crucial applications, such as detecting drone collisions, detecting drone attacks, and coordinating flight with other drones. However, existing methods are computationally costly, are not optimized end to end, and rely on complex multi-stage pipelines, which makes them less suitable for real-time deployment on edge devices. In this work, we propose a simple yet effective framework, TransVisDrone, that provides an end-to-end solution with higher computational efficiency. We use the CSPDarkNet-53 network to learn object-related spatial features and the VideoSwin model to learn the spatio-temporal dependencies of drone motion, which improves drone detection in challenging scenarios. Our method achieves state-of-the-art performance on three challenging real-world datasets (Average Precision at 0.5 IoU): NPS 0.95, FLDrones 0.75, and AOT 0.80, with higher throughput than previous methods. We also demonstrate its deployment capability on edge devices and its usefulness in detecting drone collisions (encounters). Project: https://tusharsangam.github.io/TransVisDrone-project-page/
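
The reported scores are Average Precision at an IoU threshold of 0.5. The abstract does not describe the evaluation protocol, so the following is only a minimal, generic sketch of that metric for a single frame, not the authors' evaluation code. The `(x1, y1, x2, y2)` box format, the greedy matching rule, and the function names `iou` and `average_precision` are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed box format (not from the paper): (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    preds: List[Tuple[float, Box]], gts: List[Box], thr: float = 0.5
) -> float:
    """AP for one frame: area under the interpolated precision-recall curve.

    preds holds (confidence, box) pairs; gts holds ground-truth boxes.
    """
    if not gts:
        return 0.0
    matched = set()
    hits = []
    # Visit predictions from most to least confident.
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_iou >= thr:
            matched.add(best_j)  # true positive: each ground truth is used once
            hits.append(1)
        else:
            hits.append(0)  # false positive
    precisions, recalls, tp = [], [], 0
    for k, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / len(gts))
    # Make precision non-increasing as recall grows (all-point interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

In this sketch a detection counts as correct only if it overlaps a not-yet-matched ground-truth drone with IoU of at least 0.5. A dataset-level score would pool detections across all frames before building the precision-recall curve.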

arXiv information

Authors	Tushar Sangam, Ishan Rajendrakumar Dave, Waqas Sultani, Mubarak Shah
Published	2023-08-26 00:54:05+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
