HRVQA: A Visual Question Answering Benchmark for High-Resolution Aerial Images

Summary

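Visual question answering (VQA) is an important and challenging multimodal task in computer vision.
Recently, several efforts have been made to apply VQA to aerial images, motivated by potential real-world applications in disaster monitoring, urban planning, and digital earth product generation.
However, development of VQA in this domain is limited both by the large variation in the appearance, scale, and orientation of concepts in aerial images and by the scarcity of well-annotated datasets.
In this paper, we introduce HRVQA, a new dataset that provides 53512 collected aerial images of 1024*1024 pixels and 1070240 semi-automatically generated QA pairs.
To benchmark how well VQA models understand aerial images, we evaluate relevant methods on HRVQA.
In addition, we propose GFTransformer, a new model with gated attention modules and a mutual fusion module.
Experiments show that the proposed dataset is quite challenging, especially for questions about specific attributes.
Our method outperforms previous state-of-the-art approaches.
The dataset and source code will be released at https://hrvqa.nl/.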

Summary (original)

Visual question answering (VQA) is an important and challenging multimodal task in computer vision. Recently, a few efforts have been made to bring VQA task to aerial images, due to its potential real-world applications in disaster monitoring, urban planning, and digital earth product generation. However, not only the huge variation in the appearance, scale and orientation of the concepts in aerial images, but also the scarcity of the well-annotated datasets restricts the development of VQA in this domain. In this paper, we introduce a new dataset, HRVQA, which provides collected 53512 aerial images of 1024*1024 pixels and semi-automatically generated 1070240 QA pairs. To benchmark the understanding capability of VQA models for aerial images, we evaluate the relevant methods on HRVQA. Moreover, we propose a novel model, GFTransformer, with gated attention modules and a mutual fusion module. The experiments show that the proposed dataset is quite challenging, especially the specific attribute related questions. Our method achieves superior performance in comparison to the previous state-of-the-art approaches. The dataset and the source code will be released at https://hrvqa.nl/.

arXiv information

Authors	Kun Li, George Vosselman, Michael Ying Yang
Published	2023-01-23 14:36:38+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
