CrisisViT: A Robust Vision Transformer for Crisis Image Classification

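Summary

In an emergency, crisis response agencies must assess conditions on the ground quickly and accurately in order to deploy the relevant services and resources.
However, authorities often have to make decisions based on limited information, because data on affected areas can be scarce until local response services are able to provide first-hand reports.
Fortunately, the widespread availability of smartphones with high-quality cameras has made citizen journalism via social media a valuable source of information for crisis responders.
However, analyzing the large volume of images posted by citizens requires more time and effort than is typically available.
To address this problem, this paper proposes using state-of-the-art deep neural models for automatic image classification/tagging, specifically by adapting transformer-based architectures to crisis image classification (CrisisViT).
We leverage the new Incidents1M crisis image dataset to develop a range of new transformer-based image classification models.
Through experiments on the standard crisis image benchmark dataset, we demonstrate that the CrisisViT models significantly outperform previous approaches on emergency type, image relevance, humanitarian category, and damage severity classification.
In addition, we show that the new Incidents1M dataset can further strengthen the CrisisViT models, yielding an additional 1.25% absolute accuracy gain.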

Abstract (original)

In times of emergency, crisis response agencies need to quickly and accurately assess the situation on the ground in order to deploy relevant services and resources. However, authorities often have to make decisions based on limited information, as data on affected regions can be scarce until local response services can provide first-hand reports. Fortunately, the widespread availability of smartphones with high-quality cameras has made citizen journalism through social media a valuable source of information for crisis responders. However, analyzing the large volume of images posted by citizens requires more time and effort than is typically available. To address this issue, this paper proposes the use of state-of-the-art deep neural models for automatic image classification/tagging, specifically by adapting transformer-based architectures for crisis image classification (CrisisViT). We leverage the new Incidents1M crisis image dataset to develop a range of new transformer-based image classification models. Through experimentation over the standard Crisis image benchmark dataset, we demonstrate that the CrisisViT models significantly outperform previous approaches in emergency type, image relevance, humanitarian category, and damage severity classification. Additionally, we show that the new Incidents1M dataset can further augment the CrisisViT models resulting in an additional 1.25% absolute accuracy gain.
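
The abstract reports the Incidents1M improvement as an absolute accuracy gain, meaning a difference in percentage points rather than a percentage of the baseline. The short Python sketch below illustrates that distinction. The baseline accuracy of 80.0 is a hypothetical value chosen for illustration only; it is not a figure from the paper, and the function names are likewise assumptions made for this example.

```python
def absolute_gain(baseline_acc: float, new_acc: float) -> float:
    """Accuracy difference in percentage points."""
    return new_acc - baseline_acc


def relative_gain(baseline_acc: float, new_acc: float) -> float:
    """Accuracy improvement expressed as a percentage of the baseline."""
    return (new_acc - baseline_acc) / baseline_acc * 100.0


# Hypothetical baseline accuracy in percent; NOT a result from the paper.
baseline = 80.0
# Apply the 1.25-point absolute gain quoted in the abstract.
improved = baseline + 1.25

print(absolute_gain(baseline, improved))  # 1.25 (percentage points)
print(relative_gain(baseline, improved))  # 1.5625 (percent of the baseline)
```

Under this assumed baseline, the same improvement reads as 1.25 points in absolute terms and roughly 1.56% in relative terms, which is why stating "absolute" matters when comparing reported gains.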

arXiv information

Authors	Zijun Long, Richard McCreadie, Muhammad Imran
Published	2024-01-05 14:45:45+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
