Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment

Abstract

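Multimodal large language models (MLLMs) remain vulnerable to transferable adversarial examples.
Existing methods typically achieve targeted attacks by aligning global features, such as CLIP's [CLS] token, between adversarial and target samples, but they often overlook the rich local information encoded in patch tokens.
This leads to suboptimal alignment and limited transferability, particularly for closed-source models.
To address this limitation, we propose FOA-Attack, a targeted transferable adversarial attack method based on feature optimal alignment, to improve adversarial transferability.
Specifically, at the global level, we introduce a global feature loss based on cosine similarity to align the coarse-grained features of adversarial samples with those of target samples.
At the local level, given the rich local representations within Transformers, we leverage clustering techniques to extract compact local patterns and reduce redundancy among local features.
We then formulate local feature alignment between adversarial and target samples as an optimal transport (OT) problem and propose a local clustering optimal transport loss to refine fine-grained feature alignment.
Additionally, we propose a dynamic ensemble model weighting strategy that adaptively balances the influence of multiple models during adversarial example generation, further improving transferability.
Extensive experiments across various models demonstrate the superiority of the proposed method, which outperforms state-of-the-art methods, especially when transferring to closed-source MLLMs.
The code is released at https://github.com/jiaxiaojunQAQ/FOA-Attack.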
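
The two feature-alignment losses described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the clustering algorithm, the OT solver, or the cost function, so plain k-means, entropy-regularized Sinkhorn iterations, and a cosine-distance cost are stand-ins chosen here. All function names and parameters (`kmeans_centers`, `sinkhorn_plan`, `k`, `reg`) are hypothetical.

```python
import numpy as np

def global_feature_loss(adv_cls, tgt_cls, eps=1e-8):
    """1 - cosine similarity between two global (e.g. [CLS]) feature vectors."""
    cos = adv_cls @ tgt_cls / (np.linalg.norm(adv_cls) * np.linalg.norm(tgt_cls) + eps)
    return float(1.0 - cos)

def kmeans_centers(tokens, k, iters=10, seed=0):
    """Plain k-means over patch tokens (assumed stand-in for the clustering step)."""
    rng = np.random.default_rng(seed)
    centers = tokens[rng.choice(len(tokens), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign every token to its nearest center (squared Euclidean distance).
        dists = ((tokens[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = tokens[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers

def sinkhorn_plan(cost, reg=0.1, iters=200):
    """Entropy-regularized OT plan between two uniform distributions."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # uniform mass on adversarial cluster centers
    b = np.full(m, 1.0 / m)  # uniform mass on target cluster centers
    kernel = np.exp(-cost / reg)
    u = np.ones(n)
    for _ in range(iters):
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    return u[:, None] * kernel * v[None, :]

def local_ot_loss(adv_tokens, tgt_tokens, k=4, eps=1e-8):
    """Transport cost between clustered local features (cosine-distance cost)."""
    adv_c = kmeans_centers(adv_tokens, k)
    tgt_c = kmeans_centers(tgt_tokens, k)
    adv_n = adv_c / (np.linalg.norm(adv_c, axis=1, keepdims=True) + eps)
    tgt_n = tgt_c / (np.linalg.norm(tgt_c, axis=1, keepdims=True) + eps)
    cost = 1.0 - adv_n @ tgt_n.T
    plan = sinkhorn_plan(cost)
    return float((plan * cost).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    adv_patches = rng.normal(size=(16, 8))  # 16 hypothetical patch tokens, dim 8
    tgt_patches = rng.normal(size=(16, 8))
    print(global_feature_loss(adv_patches.mean(0), tgt_patches.mean(0)))
    print(local_ot_loss(adv_patches, tgt_patches))
```

The sketch covers only the forward computation. An actual attack would need gradients of these losses with respect to the adversarial image, which requires an automatic differentiation framework rather than NumPy.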

Abstract (original)

Multimodal large language models (MLLMs) remain vulnerable to transferable adversarial examples. While existing methods typically achieve targeted attacks by aligning global features, such as CLIP’s [CLS] token, between adversarial and target samples, they often overlook the rich local information encoded in patch tokens. This leads to suboptimal alignment and limited transferability, particularly for closed-source models. To address this limitation, we propose a targeted transferable adversarial attack method based on feature optimal alignment, called FOA-Attack, to improve adversarial transfer capability. Specifically, at the global level, we introduce a global feature loss based on cosine similarity to align the coarse-grained features of adversarial samples with those of target samples. At the local level, given the rich local representations within Transformers, we leverage clustering techniques to extract compact local patterns to alleviate redundant local features. We then formulate local feature alignment between adversarial and target samples as an optimal transport (OT) problem and propose a local clustering optimal transport loss to refine fine-grained feature alignment. Additionally, we propose a dynamic ensemble model weighting strategy to adaptively balance the influence of multiple models during adversarial example generation, thereby further improving transferability. Extensive experiments across various models demonstrate the superiority of the proposed method, outperforming state-of-the-art methods, especially in transferring to closed-source MLLMs. The code is released at https://github.com/jiaxiaojunQAQ/FOA-Attack.
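
The abstract does not spell out the dynamic ensemble weighting rule, so the following is only a hedged sketch of one way such a rule could look. It assumes a loss-ratio softmax in the style of dynamic weight averaging from multi-task learning: a surrogate model whose loss has stalled receives a larger weight than one whose loss is dropping quickly. The function names and the `temperature` parameter are hypothetical, and the paper's actual strategy may differ.

```python
import numpy as np

def dynamic_weights(prev_losses, curr_losses, temperature=1.0, eps=1e-8):
    """Assumed rule: softmax over per-model loss ratios, rescaled to sum to K."""
    prev = np.asarray(prev_losses, dtype=float)
    curr = np.asarray(curr_losses, dtype=float)
    # Ratio near 1 means the loss barely moved; well below 1 means fast progress.
    ratio = curr / (prev + eps)
    logits = ratio / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    w = np.exp(logits)
    return len(w) * w / w.sum()

def ensemble_loss(losses, weights):
    """Weighted sum of per-model alignment losses."""
    return float(np.dot(weights, np.asarray(losses, dtype=float)))

if __name__ == "__main__":
    prev = [1.0, 1.0, 1.0]   # hypothetical per-model losses at step t-1
    curr = [0.5, 0.9, 0.7]   # hypothetical per-model losses at step t
    w = dynamic_weights(prev, curr)
    print(w, ensemble_loss(curr, w))
```

Rescaling the weights to sum to the number of models keeps the ensemble loss on the same scale as a plain unweighted sum, so the step size of the attack does not need retuning when the weights change.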

arXiv information

Authors	Xiaojun Jia, Sensen Gao, Simeng Qin, Tianyu Pang, Chao Du, Yihao Huang, Xinfeng Li, Yiming Li, Bo Li, Yang Liu
Published	2025-05-27 17:56:57+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
