Jailbreak Attacks and Defenses against Multimodal Generative Models: A Survey

Summary

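Rapid progress in multimodal foundation models has substantially advanced cross-modal understanding and generation across diverse modalities, including text, images, audio, and video.
However, these models remain vulnerable to jailbreak attacks, which can bypass built-in safety mechanisms and induce the generation of potentially harmful content.
Understanding jailbreak attack methods and existing defense mechanisms is therefore essential for deploying multimodal generative models safely in real-world scenarios, particularly in security-sensitive applications.
To provide comprehensive insight into this topic, this survey reviews jailbreak attacks and defenses for multimodal generative models.
First, following the generalized lifecycle of a multimodal jailbreak, it systematically examines attacks and the corresponding defense strategies at four levels: input, encoder, generator, and output.
Building on this analysis, it presents a detailed taxonomy of attack methods, defense mechanisms, and evaluation frameworks specific to multimodal generative models.
It also covers a wide range of input-output configurations within generative systems, including Any-to-Text, Any-to-Vision, and Any-to-Any.
Finally, it highlights open research challenges and proposes potential directions for future work.
The open-source repository accompanying this work is available at https://github.com/liuxuannan/Awesome-Multimodal-Jailbreak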

Summary (original)

The rapid evolution of multimodal foundation models has led to significant advancements in cross-modal understanding and generation across diverse modalities, including text, images, audio, and video. However, these models remain susceptible to jailbreak attacks, which can bypass built-in safety mechanisms and induce the production of potentially harmful content. Consequently, understanding the methods of jailbreak attacks and existing defense mechanisms is essential to ensure the safe deployment of multimodal generative models in real-world scenarios, particularly in security-sensitive applications. To provide comprehensive insight into this topic, this survey reviews jailbreak and defense in multimodal generative models. First, given the generalized lifecycle of multimodal jailbreak, we systematically explore attacks and corresponding defense strategies across four levels: input, encoder, generator, and output. Based on this analysis, we present a detailed taxonomy of attack methods, defense mechanisms, and evaluation frameworks specific to multimodal generative models. Additionally, we cover a wide range of input-output configurations, including modalities such as Any-to-Text, Any-to-Vision, and Any-to-Any within generative systems. Finally, we highlight current research challenges and propose potential directions for future research. The open-source repository corresponding to this work can be found at https://github.com/liuxuannan/Awesome-Multimodal-Jailbreak.

arXiv information

Authors	Xuannan Liu, Xing Cui, Peipei Li, Zekun Li, Huaibo Huang, Shuhan Xia, Miaoxuan Zhang, Yueying Zou, Ran He
Published	2024-12-09 14:22:14+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google

