AIGCs Confuse AI Too: Investigating and Explaining Synthetic Image-induced Hallucinations in Large Vision-Language Models

Summary

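Artificial Intelligence Generated Content (AIGC) is steadily advancing toward higher quality.
Growing interaction with AIGC poses a new challenge for the data-driven AI community: although AI-generated content plays an important role in a wide range of AI models, the hidden risks it may introduce have not been thoroughly examined.
Beyond forgery detection aimed at human viewers, AI-generated content can cause problems for AI models that were originally designed to process natural data.
This study highlights how AI-synthesized images exacerbate hallucination in Large Vision-Language Models (LVLMs).
Notably, the findings reveal a consistent AIGC hallucination bias: object hallucinations induced by synthetic images are more numerous and more uniformly distributed in position, even though these synthetic images show no unrealistic or additional relevant visual features compared with natural images.
Furthermore, an investigation of the Q-former and linear projector shows that synthetic images may exhibit token deviations after visual projection, thereby amplifying the hallucination bias.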

Summary (original)

The evolution of Artificial Intelligence Generated Contents (AIGCs) is advancing towards higher quality. The growing interactions with AIGCs present a new challenge to the data-driven AI community: While AI-generated contents have played a crucial role in a wide range of AI models, the potential hidden risks they introduce have not been thoroughly examined. Beyond human-oriented forgery detection, AI-generated content poses potential issues for AI models originally designed to process natural data. In this study, we underscore the exacerbated hallucination phenomena in Large Vision-Language Models (LVLMs) caused by AI-synthetic images. Remarkably, our findings shed light on a consistent AIGC \textbf{hallucination bias}: the object hallucinations induced by synthetic images are characterized by a greater quantity and a more uniform position distribution, even these synthetic images do not manifest unrealistic or additional relevant visual features compared to natural images. Moreover, our investigations on Q-former and Linear projector reveal that synthetic images may present token deviations after visual projection, thereby amplifying the hallucination bias.

arXiv information

Authors	Yifei Gao, Jiaqi Wang, Zhiyu Lin, Jitao Sang
Published	2024-03-13 13:56:34+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
