DiRecNetV2: A Transformer-Enhanced Network for Aerial Disaster Recognition

Summary

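Integrating unmanned aerial vehicles (UAVs) with artificial intelligence (AI) models to process aerial imagery for disaster assessment requires models that combine high accuracy, computational efficiency, and real-time processing.
Conventional convolutional neural networks (CNNs) extract local features efficiently but are limited in their ability to interpret global context.
Vision Transformers (ViTs), by contrast, show promise for better global context interpretation through attention mechanisms, yet they remain underexplored in UAV-based disaster response applications.
To bridge this research gap, we introduce DiRecNetV2, an improved hybrid model that uses both convolutional and transformer layers.
It combines the inductive biases of CNNs for robust feature extraction with the global context understanding of Transformers while keeping the computational load low enough for UAV applications.
We also introduce a new, compact multi-label disaster dataset to set an initial benchmark for future research, and we examine how models trained on single-label data perform on a multi-label test set.
The study evaluates lightweight CNNs and ViTs on the AIDERSv2 dataset, using frames per second (FPS) to measure efficiency and the weighted F1 score to measure classification performance.
DiRecNetV2 achieves a weighted F1 score of 0.964 on the single-label test set and 0.614 on the more complex multi-label test set, while running at 176.13 FPS on the Nvidia Orin Jetson device.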
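
For readers unfamiliar with the metric, the weighted F1 score in the single-label case is the mean of the per-class F1 scores, each weighted by that class's share of the true labels. The sketch below shows this standard definition only; it is not the authors' evaluation code, and the function name `weighted_f1` and the class names in the example are hypothetical.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (single-label case)."""
    support = Counter(y_true)          # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives: samples of this class that were predicted as this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class by its share of the true labels.
        score += (n_true / total) * f1
    return score


# Hypothetical labels, for illustration only (not taken from AIDERSv2).
y_true = ["fire", "fire", "flood", "normal"]
y_pred = ["fire", "flood", "flood", "normal"]
print(round(weighted_f1(y_true, y_pred), 4))  # 0.75
```

Because each class is weighted by its support, frequent classes dominate the score, which matters when a dataset's classes are imbalanced.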

Abstract (original)

The integration of Unmanned Aerial Vehicles (UAVs) with artificial intelligence (AI) models for aerial imagery processing in disaster assessment necessitates models that demonstrate exceptional accuracy, computational efficiency, and real-time processing capabilities. Traditional Convolutional Neural Networks (CNNs) demonstrate efficiency in local feature extraction but are limited in their capacity for global context interpretation. On the other hand, Vision Transformers (ViTs) show promise for improved global context interpretation through the use of attention mechanisms, although they remain underinvestigated in UAV-based disaster response applications. Bridging this research gap, we introduce DiRecNetV2, an improved hybrid model that utilizes convolutional and transformer layers. It merges the inductive biases of CNNs for robust feature extraction with the global context understanding of Transformers, maintaining a low computational load ideal for UAV applications. Additionally, we introduce a new, compact multi-label dataset of disasters to set an initial benchmark for future research, exploring how models trained on single-label data perform on a multi-label test set. The study assesses lightweight CNNs and ViTs on the AIDERSv2 dataset, based on the frames per second (FPS) for efficiency and the weighted F1 scores for classification performance. DiRecNetV2 not only achieves a weighted F1 score of 0.964 on a single-label test set but also demonstrates adaptability, with a score of 0.614 on a complex multi-label test set, while functioning at 176.13 FPS on the Nvidia Orin Jetson device.

arXiv information

Authors	Demetris Shianios, Panayiotis Kolios, Christos Kyrkou
Published	2024-10-17 15:25:13+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
