Instance-level Trojan Attacks on Visual Question Answering via Adversarial Learning in Neuron Activation Space

Summary

Title: Instance-level Trojan attacks on visual question answering using adversarial learning in neuron activation space

Summary:
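– Trojan attacks are malicious perturbations embedded in input data that can cause neural networks to misbehave.
– However, fine-tuning a model, which involves transferring knowledge from a large pretrained model such as a visual question answering (VQA) model to the target model, reduces the impact of a Trojan attack.
– To mitigate the effects of a Trojan attack, multiple layers of the pretrained model can be replaced and fine-tuned.
– This research focuses on three challenges: sample efficiency, stealthiness and variation, and robustness to model fine-tuning.
– To address these challenges, the authors propose an instance-level Trojan attack that generates diverse Trojans across input samples and modalities.
– Adversarial learning establishes a correlation between a specified perturbation layer and the misbehavior of the fine-tuned model.
– Extensive experiments were conducted on the VQA-v2 dataset using a range of metrics.
– The results show that the proposed method can effectively adapt to a fine-tuned model with minimal samples.
– Specifically, a model with a single fine-tuning layer can be compromised with a single shot of adversarial samples, while a model with more fine-tuning layers can be compromised with only a few shots.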
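The bullets above stay at the level of the abstract. As a rough illustration of what searching for a trigger "in neuron activation space" can look like, here is a minimal sketch, assuming a frozen two-layer block, a single target neuron whose pre-activation we want to drive up, and a perturbation bounded in the L-infinity norm. It uses plain projected sign-gradient ascent (a generic PGD-style search), swapped in for illustration; it is not the authors' adversarial learning procedure. The names `forward`, `grad_wrt_input`, `optimize_trigger`, the toy weights, and the parameters `eps`, `lr`, and `steps` are all hypothetical.

```python
# Hypothetical sketch (NOT the paper's algorithm): find a small, input-specific
# perturbation that drives one chosen neuron in a frozen layer to fire.
# Assumptions: toy frozen weights, one target neuron, L-infinity budget eps.

def sign(v):
    return 1.0 if v > 0 else -1.0 if v < 0 else 0.0

def forward(W1, b1, w2, b2, x):
    """Frozen block: h = ReLU(W1 x + b1); target pre-activation z = w2 . h + b2."""
    pre1 = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W1, b1)]
    h = [max(0.0, p) for p in pre1]
    z = sum(a * hk for a, hk in zip(w2, h)) + b2
    return pre1, z

def grad_wrt_input(W1, w2, pre1):
    """dz/dx = sum_k w2[k] * 1[pre1[k] > 0] * W1[k] (ReLU subgradient taken as 0 at 0)."""
    g = [0.0] * len(W1[0])
    for row, a, p in zip(W1, w2, pre1):
        if p > 0:
            for i, w in enumerate(row):
                g[i] += a * w
    return g

def optimize_trigger(W1, b1, w2, b2, x, eps=0.5, lr=0.1, steps=50):
    """Projected sign-gradient ascent on delta with |delta_i| <= eps; keeps the best delta seen."""
    delta = [0.0] * len(x)
    best_delta, best_z = list(delta), forward(W1, b1, w2, b2, x)[1]
    for _ in range(steps):
        pre1, _ = forward(W1, b1, w2, b2, [xi + di for xi, di in zip(x, delta)])
        g = grad_wrt_input(W1, w2, pre1)
        # Ascent step, then project back into the L-infinity ball of radius eps.
        delta = [max(-eps, min(eps, d + lr * sign(gi))) for d, gi in zip(delta, g)]
        _, z = forward(W1, b1, w2, b2, [xi + di for xi, di in zip(x, delta)])
        if z > best_z:
            best_delta, best_z = list(delta), z
    return best_delta, best_z

# Toy frozen weights and one clean input (all values made up for illustration).
W1 = [[1.0, -0.5], [-0.3, 0.8], [0.6, 0.6]]
b1 = [0.0, 0.1, -0.2]
w2 = [0.7, -0.4, 0.9]
b2 = -0.5
x = [0.3, 0.2]

_, z_before = forward(W1, b1, w2, b2, x)
delta, z_after = optimize_trigger(W1, b1, w2, b2, x)
print("target activation before:", max(0.0, z_before))
print("target activation after: ", max(0.0, z_after))
```

Because the ReLU gates depend on `x`, the gradient, and hence the perturbation, differs from input to input, which loosely mirrors the instance-level idea. The objective touches only frozen layers and never a task head, which is one plausible intuition for why a trigger aimed at an intermediate layer could survive replacement and fine-tuning of later layers; the abstract does not spell out the exact mechanism.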

Summary (original)

Malicious perturbations embedded in input data, known as Trojan attacks, can cause neural networks to misbehave. However, the impact of a Trojan attack is reduced during fine-tuning of the model, which involves transferring knowledge from a pretrained large-scale model like visual question answering (VQA) to the target model. To mitigate the effects of a Trojan attack, replacing and fine-tuning multiple layers of the pretrained model is possible. This research focuses on sample efficiency, stealthiness and variation, and robustness to model fine-tuning. To address these challenges, we propose an instance-level Trojan attack that generates diverse Trojans across input samples and modalities. Adversarial learning establishes a correlation between a specified perturbation layer and the misbehavior of the fine-tuned model. We conducted extensive experiments on the VQA-v2 dataset using a range of metrics. The results show that our proposed method can effectively adapt to a fine-tuned model with minimal samples. Specifically, we found that a model with a single fine-tuning layer can be compromised using a single shot of adversarial samples, while a model with more fine-tuning layers can be compromised using only a few shots.

arXiv information

Authors	Yuwei Sun, Hideya Ochiai, Jun Sakuma
Published	2023-04-02 03:03:21+00:00

Source, services used

arxiv.jp, OpenAI
