Skip \n: A Simple Method to Reduce Hallucination in Large Vision-Language Models

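Summary

Recent advances in large vision-language models (LVLMs) have demonstrated a strong ability to understand visual information through human language.
Despite these advances, LVLMs still struggle with multimodal hallucination, for example generating text descriptions of objects that are not present in the visual input.
However, the underlying causes of multimodal hallucination have not yet been thoroughly investigated.
In this paper, we propose a new perspective, suggesting that biases inherent in LVLMs may be a key factor in hallucination.
Specifically, we systematically identify a semantic shift bias associated with paragraph breaks (\n\n): in the training data, the content before and after ‘\n\n’ frequently differs substantially in meaning.
Because of this pattern, the model infers that content following ‘\n\n’ should differ clearly from the preceding content, which contains fewer hallucinated descriptions, and so hallucinated descriptions become more likely after ‘\n\n’.
We validated this hypothesis on multiple publicly available LVLMs.
In addition, we found that deliberately inserting ‘\n\n’ into a generated description can induce more hallucinations.
We propose a simple method that effectively mitigates LVLM hallucination by skipping the output of ‘\n’.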

Summary (original)

Recent advancements in large vision-language models (LVLMs) have demonstrated impressive capability in visual information understanding with human language. Despite these advances, LVLMs still face challenges with multimodal hallucination, such as generating text descriptions of objects that are not present in the visual information. However, the underlying fundamental reasons of multimodal hallucinations remain poorly explored. In this paper, we propose a new perspective, suggesting that the inherent biases in LVLMs might be a key factor in hallucinations. Specifically, we systematically identify a semantic shift bias related to paragraph breaks (\n\n), where the content before and after ‘\n\n’ in the training data frequently exhibit significant semantic changes. This pattern leads the model to infer that the contents following ‘\n\n’ should be obviously different from the preceding contents with less hallucinatory descriptions, thereby increasing the probability of hallucinatory descriptions subsequent to the ‘\n\n’. We have validated this hypothesis on multiple publicly available LVLMs. Besides, we find that deliberately inserting ‘\n\n’ at the generated description can induce more hallucinations. A simple method is proposed to effectively mitigate the hallucination of LVLMs by skipping the output of ‘\n’.
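The abstract does not spell out how the output of ‘\n’ is skipped. One plausible reading is to suppress the newline token at decoding time so it can never be selected. The sketch below illustrates that idea on a toy four-token vocabulary with greedy decoding. The token ids, the `toy_next_logits` stand-in model, and the helper names are all hypothetical and are not taken from the paper.

```python
import math

NEWLINE_ID = 2  # hypothetical id of the "\n" token in a toy vocabulary
EOS_ID = 3      # hypothetical end-of-sequence id


def suppress_tokens(logits, banned_ids):
    """Return a copy of logits with every banned token set to -inf."""
    masked = list(logits)
    for token_id in banned_ids:
        masked[token_id] = -math.inf
    return masked


def greedy_decode(next_logits, banned_ids, eos_id, max_steps=20):
    """Greedy decoding that never emits a token listed in banned_ids."""
    tokens = []
    for _ in range(max_steps):
        logits = suppress_tokens(next_logits(tokens), banned_ids)
        next_id = max(range(len(logits)), key=lambda i: logits[i])
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens


def toy_next_logits(prefix):
    """Stand-in for a model: after two tokens it prefers a newline."""
    if len(prefix) < 2:
        return [1.0, 2.0, 0.0, -1.0]
    return [0.0, 0.5, 3.0, 2.0]


# With the newline banned, decoding ends at EOS instead of starting a new paragraph.
with_skip = greedy_decode(toy_next_logits, {NEWLINE_ID}, EOS_ID)
# Without the ban, the toy model keeps emitting newlines until max_steps.
without_skip = greedy_decode(toy_next_logits, set(), EOS_ID, max_steps=4)
```

In this toy run the banned newline is replaced by the next-best choice, which here is the end-of-sequence token, so the description stops where a paragraph break would otherwise have begun. A softer variant would subtract a fixed penalty from the newline logit instead of setting it to negative infinity.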

arXiv information

Authors	Zongbo Han, Zechen Bai, Haiyang Mei, Qianli Xu, Changqing Zhang, Mike Zheng Shou
Published	2024-02-12 13:53:20+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
