ODE: Open-Set Evaluation of Hallucinations in Multimodal Large Language Models

Summary

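Hallucination remains a persistent challenge for multimodal large language models (MLLMs).
However, existing benchmarks for evaluating hallucination are generally static, so they can overlook the potential risk of data contamination.
To address this problem, we propose ODE, an open-set, dynamic protocol designed to evaluate object hallucination in MLLMs at both the existence level and the attribute level.
ODE uses a graph-based structure to represent real-world object concepts, their attributes, and the distributional associations among objects.
This structure makes it easy to extract concept combinations according to diverse distributional criteria, yielding varied samples for structured queries that evaluate hallucination in both generative and discriminative tasks.
By generating new samples, combining concepts dynamically, and varying distribution frequencies, ODE reduces the risk of data contamination and broadens the scope of evaluation.
The protocol applies to both general and specialized scenarios, including those where data is limited.
Experimental results demonstrate the protocol's effectiveness: MLLMs show higher hallucination rates when evaluated on ODE-generated samples, which suggests possible data contamination.
In addition, the generated samples help in analyzing hallucination patterns and in fine-tuning models, offering an effective approach to mitigating hallucination in MLLMs.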

Abstract (original)

Hallucination poses a persistent challenge for multimodal large language models (MLLMs). However, existing benchmarks for evaluating hallucinations are generally static, which may overlook the potential risk of data contamination. To address this issue, we propose ODE, an open-set, dynamic protocol designed to evaluate object hallucinations in MLLMs at both the existence and attribute levels. ODE employs a graph-based structure to represent real-world object concepts, their attributes, and the distributional associations between them. This structure facilitates the extraction of concept combinations based on diverse distributional criteria, generating varied samples for structured queries that evaluate hallucinations in both generative and discriminative tasks. Through the generation of new samples, dynamic concept combinations, and varied distribution frequencies, ODE mitigates the risk of data contamination and broadens the scope of evaluation. This protocol is applicable to both general and specialized scenarios, including those with limited data. Experimental results demonstrate the effectiveness of our protocol, revealing that MLLMs exhibit higher hallucination rates when evaluated with ODE-generated samples, which indicates potential data contamination. Furthermore, these generated samples aid in analyzing hallucination patterns and fine-tuning models, offering an effective approach to mitigating hallucinations in MLLMs.
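
The abstract describes a graph of object concepts whose edges carry distributional associations, from which concept combinations are extracted under different criteria and turned into structured queries. The following is a minimal sketch of that idea, not the authors' implementation. The toy annotations, the function names (build_cooccurrence_graph, sample_pairs, make_queries), the three criterion labels, and the prompt wording are all assumptions made for illustration. Producing the actual test sample for each combination is outside the scope of this sketch.

```python
import itertools
import random
from collections import Counter

# Hypothetical toy annotations: each inner list names objects seen together in one image.
IMAGE_OBJECTS = [
    ["dog", "frisbee", "grass"],
    ["dog", "grass"],
    ["dog", "sofa"],
    ["cat", "sofa"],
    ["cat", "sofa", "lamp"],
    ["surfboard", "wave"],
]


def build_cooccurrence_graph(images):
    """Return (nodes, edges); edges maps a sorted object pair to its co-occurrence count."""
    nodes = sorted({obj for objs in images for obj in objs})
    edges = Counter()
    for objs in images:
        for pair in itertools.combinations(sorted(set(objs)), 2):
            edges[pair] += 1
    return nodes, edges


def sample_pairs(nodes, edges, criterion, k, rng):
    """Pick up to k object pairs under an illustrative (assumed) distribution criterion."""
    all_pairs = list(itertools.combinations(nodes, 2))
    if criterion == "frequent":
        # Highest co-occurrence count first; ties are broken alphabetically.
        return sorted(edges, key=lambda p: (-edges[p], p))[:k]
    if criterion == "unseen":
        # Pairs that never co-occur in the annotations (Counter returns 0 for them).
        pool = [p for p in all_pairs if edges[p] == 0]
        return rng.sample(pool, min(k, len(pool)))
    if criterion == "random":
        return rng.sample(all_pairs, min(k, len(all_pairs)))
    raise ValueError(f"unknown criterion: {criterion}")


def make_queries(pair):
    """Build hypothetical generative and discriminative prompts for one object pair."""
    return {
        "generative": "Please describe this image in detail.",
        # Each discriminative item is (question, expected answer) for an object
        # that is assumed to be present in the test sample.
        "discriminative": [(f"Is there a {obj} in this image?", "yes") for obj in pair],
    }


if __name__ == "__main__":
    nodes, edges = build_cooccurrence_graph(IMAGE_OBJECTS)
    rng = random.Random(0)
    for criterion in ("frequent", "unseen", "random"):
        for pair in sample_pairs(nodes, edges, criterion, 2, rng):
            print(criterion, pair, make_queries(pair)["discriminative"])
```

Sampling the "unseen" pool is one way to obtain combinations that a model is unlikely to have met during training, which is the motivation the abstract gives for dynamic concept combinations. Attribute-level queries would need attribute nodes as well, which this sketch leaves out.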

arXiv information

Authors	Yahan Tu, Rui Hu, Jitao Sang
Published	2024-12-02 08:51:09+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
