I See Dead People: Gray-Box Adversarial Attack on Image-To-Text Models

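Summary

Modern image-to-text systems typically adopt an encoder-decoder framework. The framework has two main components: an image encoder, responsible for extracting image features, and a transformer-based decoder, used for generating captions.
Taking inspiration from the analysis of neural networks' robustness against adversarial perturbations, we propose a novel gray-box algorithm for creating adversarial examples in image-to-text models.
Unlike image classification tasks, which have a finite set of class labels, a captioning system allows a virtually infinite space of possible captions, so finding visually similar adversarial examples in an image-to-text task is a greater challenge.
In this paper, we present a gray-box adversarial attack on image-to-text models, in both untargeted and targeted forms.
We formulate the process of discovering adversarial perturbations as an optimization problem that uses only the image-encoder component, which means the proposed attack is language-model agnostic.
Through experiments on the ViT-GPT2 model, the most-used image-to-text model on Hugging Face, and the Flickr30k dataset, we demonstrate that the proposed attack successfully generates visually similar adversarial examples for both untargeted and targeted captions.
Notably, our attack operates in a gray-box manner and requires no knowledge of the decoder module.
We also show that our attacks fool the popular open-source platform Hugging Face.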
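
The encoder-only formulation described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the abstract does not state the loss, the norm, or the optimizer, so this sketch uses a squared L2 distance between encoder features, an L-infinity budget `eps`, and projected sign-gradient descent. The random linear map `W` is a hypothetical stand-in for a real image encoder such as the ViT in ViT-GPT2, and the names `encode`, `feature_loss`, and `targeted_attack` are invented here.

```python
import random


def encode(W, x):
    """Toy linear 'image encoder' (assumption): features = W @ x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def feature_loss(W, x, target_feat):
    """Squared L2 distance between the features of x and the target features."""
    return sum((f - t) ** 2 for f, t in zip(encode(W, x), target_feat))


def targeted_attack(W, x, x_target, eps=0.1, step=0.01, iters=200):
    """Projected sign-gradient descent on a perturbation delta.

    Only the encoder is queried: no decoder and no caption enters the loss.
    Returns the best perturbed image seen during the search.
    """
    target_feat = encode(W, x_target)
    delta = [0.0] * len(x)
    best_delta = delta[:]
    best_loss = feature_loss(W, x, target_feat)
    for _ in range(iters):
        x_adv = [a + d for a, d in zip(x, delta)]
        resid = [f - t for f, t in zip(encode(W, x_adv), target_feat)]
        # Gradient of the squared distance w.r.t. the input: 2 * W^T @ resid.
        grad = [2 * sum(W[i][j] * resid[i] for i in range(len(W)))
                for j in range(len(x))]
        for j, g in enumerate(grad):
            d = delta[j] - step * ((g > 0) - (g < 0))  # signed step
            d = max(-eps, min(eps, d))                 # stay in the L-inf ball
            d = max(-x[j], min(1.0 - x[j], d))         # keep pixels in [0, 1]
            delta[j] = d
        loss = feature_loss(W, [a + d for a, d in zip(x, delta)], target_feat)
        if loss < best_loss:
            best_loss, best_delta = loss, delta[:]
    return [a + d for a, d in zip(x, best_delta)]


# Demo on random toy data (16 "pixels", 8 features), seeded for repeatability.
rng = random.Random(0)
n_pixels, n_feats = 16, 8
W = [[rng.gauss(0, 1) for _ in range(n_pixels)] for _ in range(n_feats)]
x = [rng.random() for _ in range(n_pixels)]         # clean image
x_target = [rng.random() for _ in range(n_pixels)]  # image with the desired caption
eps = 0.1

target_feat = encode(W, x_target)
x_adv = targeted_attack(W, x, x_target, eps=eps)
loss_clean = feature_loss(W, x, target_feat)
loss_adv = feature_loss(W, x_adv, target_feat)
print("feature distance, clean vs. adversarial:", loss_clean, loss_adv)
```

Under the same assumptions, an untargeted variant would instead push the features of the perturbed image away from those of the clean image, for example by flipping the sign of the step. With a real encoder, the hand-written gradient would be replaced by automatic differentiation.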

Summary (original)

Modern image-to-text systems typically adopt the encoder-decoder framework, which comprises two main components: an image encoder, responsible for extracting image features, and a transformer-based decoder, used for generating captions. Taking inspiration from the analysis of neural networks’ robustness against adversarial perturbations, we propose a novel gray-box algorithm for creating adversarial examples in image-to-text models. Unlike image classification tasks that have a finite set of class labels, finding visually similar adversarial examples in an image-to-text task poses greater challenges because the captioning system allows for a virtually infinite space of possible captions. In this paper, we present a gray-box adversarial attack on image-to-text, both untargeted and targeted. We formulate the process of discovering adversarial perturbations as an optimization problem that uses only the image-encoder component, meaning the proposed attack is language-model agnostic. Through experiments conducted on the ViT-GPT2 model, which is the most-used image-to-text model in Hugging Face, and the Flickr30k dataset, we demonstrate that our proposed attack successfully generates visually similar adversarial examples, both with untargeted and targeted captions. Notably, our attack operates in a gray-box manner, requiring no knowledge about the decoder module. We also show that our attacks fool the popular open-source platform Hugging Face.

arXiv information

Authors	Raz Lapid, Moshe Sipper
Published	2023-07-19 12:04:59+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
