AIGCs Confuse AI Too: Investigating and Explaining Synthetic Image-induced Hallucinations in Large Vision-Language Models

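Summary

The evolution of AI-generated content (AIGC) is moving toward ever higher quality. Growing interaction with AIGC poses a new challenge for the data-driven AI community: although AI-generated content plays an important role across a wide range of AI models, the hidden risks it may introduce have not been thoroughly examined. Beyond human-oriented forgery detection, AI-generated content raises potential problems for AI models that were originally designed to process natural data. This study shows that AI-synthesized images exacerbate hallucination in large vision-language models (LVLMs). The results reveal a consistent AIGC hallucination bias: object hallucinations induced by synthetic images are more numerous and more uniformly distributed in position, even though the synthetic images show no unrealistic or additional visual features compared with natural images. Furthermore, investigations of the Q-former and the Linear projector show that synthetic images may exhibit token deviations after visual projection, which can amplify the hallucination bias.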

Abstract (original)

The evolution of Artificial Intelligence Generated Contents (AIGCs) is advancing towards higher quality. The growing interactions with AIGCs present a new challenge to the data-driven AI community: While AI-generated contents have played a crucial role in a wide range of AI models, the potential hidden risks they introduce have not been thoroughly examined. Beyond human-oriented forgery detection, AI-generated content poses potential issues for AI models originally designed to process natural data. In this study, we underscore the exacerbated hallucination phenomena in Large Vision-Language Models (LVLMs) caused by AI-synthetic images. Remarkably, our findings shed light on a consistent AIGC hallucination bias: the object hallucinations induced by synthetic images are characterized by a greater quantity and a more uniform position distribution, even though these synthetic images do not manifest unrealistic or additional relevant visual features compared to natural images. Moreover, our investigations on Q-former and Linear projector reveal that synthetic images may present token deviations after visual projection, thereby amplifying the hallucination bias.

arXiv information

Authors	Yifei Gao, Jiaqi Wang, Zhiyu Lin, Jitao Sang
Published	2024-09-03 01:53:19+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, DeepL
