Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation

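Summary

Multimodal large language models (MLLMs) show impressive reasoning abilities, but they are also more vulnerable to jailbreak attacks than their LLM predecessors.
MLLMs can still detect unsafe responses, yet we observe that the safety mechanisms of the pre-aligned LLMs inside them are easily bypassed once image features are introduced.
To build robust MLLMs, we propose ECSO (Eyes Closed, Safety On), a novel training-free protection approach that exploits the inherent safety awareness of MLLMs: it adaptively transforms unsafe images into text, which activates the intrinsic safety mechanism of the pre-aligned LLMs in MLLMs and yields safer responses.
Experiments on five state-of-the-art (SoTA) MLLMs show that ECSO significantly improves model safety (e.g., a 37.6% improvement on MM-SafetyBench (SD+OCR) and 71.3% on VLSafe for LLaVA-1.5-7B) while consistently maintaining utility on common MLLM benchmarks.
Furthermore, we show that ECSO can serve as a data engine that generates supervised fine-tuning (SFT) data for MLLM alignment without extra human intervention.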
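
The flow described in the summary (respond, self-check the response, convert the image to text, then respond again without the image) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the `generate(prompt, image)` callable is a hypothetical stand-in for any MLLM, the three prompt strings are illustrative wording rather than the paper's prompts, and passing the image to the self-check step is also an assumption.

```python
from typing import Callable, Optional

# Hypothetical model interface (an assumption, not an API from the paper):
# generate(prompt, image) returns the model's text reply; image=None means
# the model sees text only.
Generate = Callable[[str, Optional[object]], str]

# Illustrative prompt wording only; the paper's actual prompts may differ.
JUDGE_PROMPT = (
    "Is the following response harmful or unsafe? Answer yes or no.\n"
    "Response: {response}"
)
CAPTION_PROMPT = (
    "Describe the image in detail, focusing on what is relevant to this "
    "request: {query}"
)
TEXT_ONLY_PROMPT = "Image description: {caption}\nRequest: {query}"


def ecso_respond(generate: Generate, query: str, image: object) -> str:
    """Answer a query, falling back to a text-only pass if the first
    answer is judged unsafe by the model itself."""
    # Step 1: answer normally, with the image visible.
    first = generate(query, image)

    # Step 2: ask the same model to judge its own response.
    verdict = generate(JUDGE_PROMPT.format(response=first), image)
    if not verdict.strip().lower().startswith("yes"):
        return first  # judged safe: keep the original answer

    # Step 3: "close the eyes" by converting the image into text.
    caption = generate(CAPTION_PROMPT.format(query=query), image)

    # Step 4: regenerate from text alone, with no image features.
    return generate(TEXT_ONLY_PROMPT.format(caption=caption, query=query), None)
```

Because the fallback only runs when the self-check flags the first answer, queries judged safe keep their original image-conditioned response, which is consistent with the summary's claim that utility on common benchmarks is maintained.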

Abstract (original)

Multimodal large language models (MLLMs) have shown impressive reasoning abilities, which, however, are also more vulnerable to jailbreak attacks than their LLM predecessors. Although still capable of detecting unsafe responses, we observe that safety mechanisms of the pre-aligned LLMs in MLLMs can be easily bypassed due to the introduction of image features. To construct robust MLLMs, we propose ECSO(Eyes Closed, Safety On), a novel training-free protecting approach that exploits the inherent safety awareness of MLLMs, and generates safer responses via adaptively transforming unsafe images into texts to activate intrinsic safety mechanism of pre-aligned LLMs in MLLMs. Experiments on five state-of-the-art (SoTA) MLLMs demonstrate that our ECSO enhances model safety significantly (e.g., a 37.6% improvement on the MM-SafetyBench (SD+OCR), and 71.3% on VLSafe for the LLaVA-1.5-7B), while consistently maintaining utility results on common MLLM benchmarks. Furthermore, we show that ECSO can be used as a data engine to generate supervised-finetuning (SFT) data for MLLM alignment without extra human intervention.

arXiv information

Authors	Yunhao Gou, Kai Chen, Zhili Liu, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-Yan Yeung, James T. Kwok, Yu Zhang
Published	2024-03-14 17:03:04+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
