Vision-LLMs Can Fool Themselves with Self-Generated Typographic Attacks

Summary

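Recently, there has been significant progress on Large Vision-Language Models (LVLMs), a new class of VL models that build on large pre-trained language models.
However, their vulnerability to typographic attacks, which superimpose misleading text onto an image, has not yet been studied.
Moreover, prior typographic attacks relied on randomly sampling a misleading class from a predefined set of classes.
A randomly chosen class, however, is not necessarily the most effective attack.
To address these issues, we first introduce a new benchmark uniquely designed to test the vulnerability of LVLMs to typographic attacks.
We also introduce a new and more effective typographic attack: the Self-Generated typographic attack.
Given an image, our method exploits the strong language capabilities of models like GPT-4V by simply prompting them to recommend a typographic attack.
Using the new benchmark, we find that typographic attacks are a significant threat to LVLMs.
Furthermore, typographic attacks recommended by GPT-4V using the new method are more effective than prior attacks not only against GPT-4V itself but also against LLaVA, InstructBLIP, and MiniGPT4.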

Summary (original)

Recently, significant progress has been made on Large Vision-Language Models (LVLMs), a new class of VL models that make use of large pre-trained language models. Yet their vulnerability to typographic attacks, which involve superimposing misleading text onto an image, remains unstudied. Furthermore, prior typographic attacks rely on sampling a random misleading class from a predefined set of classes. However, the randomly chosen class might not be the most effective attack. To address these issues, we first introduce a novel benchmark uniquely designed to test LVLMs' vulnerability to typographic attacks. Furthermore, we introduce a new and more effective typographic attack: Self-Generated typographic attacks. Indeed, our method, given an image, makes use of the strong language capabilities of models like GPT-4V by simply prompting them to recommend a typographic attack. Using our novel benchmark, we uncover that typographic attacks represent a significant threat against LVLMs. Furthermore, we uncover that typographic attacks recommended by GPT-4V using our new method are not only more effective against GPT-4V itself compared to prior attacks, but also against a host of less capable yet popular open-source models like LLaVA, InstructBLIP, and MiniGPT4.
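
The contrast the abstract draws between the two attack styles can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the class list, and the prompt wording are all hypothetical, and the steps of rendering the chosen text onto the image and querying an LVLM are left out.

```python
import random


def random_class_attack(true_class, classes, rng=None):
    """Baseline described in the abstract: sample a misleading class
    uniformly at random from a predefined set of classes."""
    rng = rng or random.Random()
    candidates = [c for c in classes if c != true_class]
    if not candidates:
        raise ValueError("need at least one class other than the true class")
    return rng.choice(candidates)


def build_self_generated_prompt(true_class, classes):
    """Self-generated variant: ask the model itself to recommend the attack.
    The wording below is an assumption for illustration only; the abstract
    does not give the actual prompt."""
    options = ", ".join(c for c in classes if c != true_class)
    return (
        f"The image shows a {true_class}. "
        "Which one of the following classes, if written as text on the image, "
        "would be most likely to cause a misclassification? "
        f"Options: {options}. Answer with a single class."
    )


if __name__ == "__main__":
    classes = ["cat", "dog", "fox"]  # hypothetical class set
    print(random_class_attack("cat", classes, random.Random(0)))
    print(build_self_generated_prompt("cat", classes))
```

In the baseline, the misleading text is whatever class the sampler returns. In the self-generated variant, the prompt would be sent to the model together with the image, and the model's answer would become the text superimposed on the image.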

arXiv information

Authors	Maan Qraitem, Nazia Tasnim, Kate Saenko, Bryan A. Plummer
Published	2024-02-01 14:41:20+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
