Transferable Adversarial Attacks on Black-Box Vision-Language Models

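Abstract

Vision Large Language Models (VLLMs) are increasingly deployed to provide advanced capabilities on inputs that combine text and images. Prior work has shown that adversarial attacks can transfer from open-source models to proprietary black-box models in text-only and vision-only settings, but the extent and effectiveness of such vulnerabilities remain underexplored for VLLMs. We present a comprehensive analysis showing that targeted adversarial examples are highly transferable to widely used proprietary VLLMs such as GPT-4o, Claude, and Gemini. We show that attackers can craft perturbations that induce specific, attacker-chosen interpretations of visual information: misidentifying hazardous content as safe, overlooking sensitive or restricted material, or generating detailed but incorrect responses aligned with the attacker's intent. Furthermore, we find that universal perturbations (modifications applicable to a wide set of images) can consistently induce these misinterpretations across multiple proprietary VLLMs. Our experimental results on object recognition, visual question answering, and image captioning show that this vulnerability is common across current state-of-the-art models and underscore the urgent need for robust mitigations to ensure the safe and secure deployment of VLLMs.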

Abstract (original)

Vision Large Language Models (VLLMs) are increasingly deployed to offer advanced capabilities on inputs comprising both text and images. While prior research has shown that adversarial attacks can transfer from open-source to proprietary black-box models in text-only and vision-only contexts, the extent and effectiveness of such vulnerabilities remain underexplored for VLLMs. We present a comprehensive analysis demonstrating that targeted adversarial examples are highly transferable to widely-used proprietary VLLMs such as GPT-4o, Claude, and Gemini. We show that attackers can craft perturbations to induce specific attacker-chosen interpretations of visual information, such as misinterpreting hazardous content as safe, overlooking sensitive or restricted material, or generating detailed incorrect responses aligned with the attacker’s intent. Furthermore, we discover that universal perturbations — modifications applicable to a wide set of images — can consistently induce these misinterpretations across multiple proprietary VLLMs. Our experimental results on object recognition, visual question answering, and image captioning show that this vulnerability is common across current state-of-the-art models, and underscore an urgent need for robust mitigations to ensure the safe and secure deployment of VLLMs.

arXiv information

Authors	Kai Hu, Weichen Yu, Li Zhang, Alexander Robey, Andy Zou, Chengming Xu, Haoqi Hu, Matt Fredrikson
Published	2025-05-02 06:51:11+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
