Exploring Typographic Visual Prompts Injection Threats in Cross-Modality Generation Models

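Summary

Current cross-modality generation models (GMs) show remarkable capabilities across a range of generative tasks.
Because visual inputs are ubiquitous and information-rich in real-world scenarios, cross-vision tasks, which cover vision-language perception (VLP) and image-to-image (I2I), have attracted significant attention.
Large vision-language models (LVLMs) and I2I GMs are used to handle VLP and I2I tasks, respectively.
Previous research has shown that printing typographic words onto input images strongly induces LVLMs and I2I GMs to generate disruptive outputs that are semantically related to those words.
In addition, visual prompts, a more sophisticated form of typography, have also been shown to pose security risks to various applications of VLP tasks when injected into images.
In this paper, we comprehensively investigate the performance impact of Typographic Visual Prompt Injection (TVPI) on various LVLMs and I2I GMs.
To better observe the performance changes and characteristics of this threat, we also introduce the TVPI Dataset.
Through extensive exploration, we deepen the understanding of the underlying causes of the TVPI threat in various GMs and offer valuable insights into its potential origins.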

Abstract (original)

Current Cross-Modality Generation Models (GMs) demonstrate remarkable capabilities in various generative tasks. Given the ubiquity and information richness of vision modality inputs in real-world scenarios, Cross-vision, encompassing Vision-Language Perception (VLP) and Image-to-Image (I2I), tasks have attracted significant attention. Large Vision Language Models (LVLMs) and I2I GMs are employed to handle VLP and I2I tasks, respectively. Previous research indicates that printing typographic words into input images significantly induces LVLMs and I2I GMs to generate disruptive outputs semantically related to those words. Additionally, visual prompts, as a more sophisticated form of typography, are also revealed to pose security risks to various applications of VLP tasks when injected into images. In this paper, we comprehensively investigate the performance impact induced by Typographic Visual Prompt Injection (TVPI) in various LVLMs and I2I GMs. To better observe performance modifications and characteristics of this threat, we also introduce the TVPI Dataset. Through extensive explorations, we deepen the understanding of the underlying causes of the TVPI threat in various GMs and offer valuable insights into its potential origins.

arXiv information

Authors	Hao Cheng, Erjia Xiao, Yichi Wang, Kaidi Xu, Mengshu Sun, Jindong Gu, Renjing Xu
Published	2025-03-14 15:42:42+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
