Improving Visual Question Answering Models through Robustness Analysis and In-Context Learning with a Chain of Basic Questions

Summary

Title: Improving Visual Question Answering Models through Robustness Analysis and In-Context Learning with a Chain of Basic Questions

Summary:

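– Deep neural networks play a key role in Visual Question Answering (VQA), and research has so far focused mainly on improving model accuracy.
– Recently, however, there has been a trend toward evaluating the robustness of these models against adversarial attacks. This means measuring the accuracy of VQA models as the noise level in the input is gradually increased; the noise can target either the image or the proposed query question, called the main question.
– Proper analysis of this aspect of VQA is currently lacking. This work proposes a new method that evaluates the robustness of VQA models using basic questions: semantically related questions that act as noise.
– The hypothesis is that the less similar a basic question is to the main question, the higher the noise level it introduces. To generate a reasonable noise level for a given main question, a pool of basic questions is ranked by similarity to the main question, and this ranking problem is cast as a LASSO optimization problem.
– The work also proposes a novel robustness measure, R_score, and two basic question datasets to standardize the analysis of VQA model robustness.
– The experiments show that the proposed evaluation method effectively analyzes the robustness of VQA models, and that in-context learning with a chain of basic questions can improve model accuracy.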
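
The LASSO-based ranking mentioned above can be illustrated with a small sketch. The summary does not spell out the exact formulation, so everything here is an assumption made for illustration: each question is represented by an embedding vector, the main question embedding b is approximated as a sparse combination of the basic question embeddings (the columns of A) by minimizing 0.5*||A x - b||^2 + lam*||x||_1, and a larger weight x_j is read as higher similarity. The function names, the toy three-dimensional embeddings, and the value of lam are hypothetical, and the plain coordinate descent solver stands in for whatever solver the authors actually used.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the proximal step of the L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_weights(columns, target, lam, sweeps=200):
    """Coordinate descent for 0.5*||A x - b||^2 + lam*||x||_1.

    columns: list of column vectors of A (one embedding per basic question)
    target:  vector b (embedding of the main question)
    """
    weights = [0.0] * len(columns)
    residual = list(target)  # b - A x, with x = 0 at the start
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            norm_sq = sum(c * c for c in col)
            if norm_sq == 0.0:
                continue
            # Correlation of column j with the residual that excludes column j.
            rho = sum(c * r for c, r in zip(col, residual)) + norm_sq * weights[j]
            new_w = soft_threshold(rho, lam) / norm_sq
            delta = new_w - weights[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                weights[j] = new_w
    return weights


def rank_basic_questions(questions, embeddings, main_embedding, lam=0.1):
    """Return (question, weight) pairs, highest LASSO weight first (assumed = most similar)."""
    weights = lasso_weights(embeddings, main_embedding, lam)
    return sorted(zip(questions, weights), key=lambda qw: qw[1], reverse=True)


# Toy, hand-made embeddings (hypothetical; a real setup would use a sentence encoder).
questions = ["What color is the car?", "Is there a car?", "What is the weather?"]
embeddings = [[0.9, 0.1, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
main_embedding = [1.0, 0.0, 0.0]

ranked = rank_basic_questions(questions, embeddings, main_embedding, lam=0.1)
for question, weight in ranked:
    print(f"{weight:.3f}  {question}")
```

The L1 penalty drives the weights of weakly related questions to exactly zero, so the ranking doubles as a selection step: under this reading, only questions with nonzero weight would be candidates for low-noise basic questions.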

Abstract (original)

Deep neural networks have been critical in the task of Visual Question Answering (VQA), with research traditionally focused on improving model accuracy. Recently, however, there has been a trend towards evaluating the robustness of these models against adversarial attacks. This involves assessing the accuracy of VQA models under increasing levels of noise in the input, which can target either the image or the proposed query question, dubbed the main question. However, there is currently a lack of proper analysis of this aspect of VQA. This work proposes a new method that utilizes semantically related questions, referred to as basic questions, acting as noise to evaluate the robustness of VQA models. It is hypothesized that as the similarity of a basic question to the main question decreases, the level of noise increases. To generate a reasonable noise level for a given main question, a pool of basic questions is ranked based on their similarity to the main question, and this ranking problem is cast as a LASSO optimization problem. Additionally, this work proposes a novel robustness measure, R_score, and two basic question datasets to standardize the analysis of VQA model robustness. The experimental results demonstrate that the proposed evaluation method effectively analyzes the robustness of VQA models. Moreover, the experiments show that in-context learning with a chain of basic questions can enhance model accuracy.

arXiv information

Authors	Jia-Hong Huang, Modar Alfadly, Bernard Ghanem, Marcel Worring
Published	2023-04-06 15:32:35+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI
