HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding

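Abstract

Large vision-language models (LVLMs) have demonstrated strong capabilities in interpreting multimodal contexts, but they consistently suffer from object hallucination (OH).
We introduce HALC, a new decoding algorithm designed to mitigate OH in LVLMs.
HALC exploits distinct, fine-grained optimal visual information in vision-language tasks and operates on local and global contexts at the same time.
Specifically, HALC combines a robust auto-focal grounding mechanism (local), which corrects hallucinated tokens on the fly, with a specialized beam search algorithm (global), which significantly reduces OH while preserving text generation quality.
In addition, HALC can be integrated into LVLMs as a plug-and-play module without additional training.
Extensive experiments demonstrate HALC's effectiveness in reducing OH, outperforming state-of-the-art methods across four benchmarks.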

Abstract (original)

While large vision-language models (LVLMs) have demonstrated impressive capabilities in interpreting multi-modal contexts, they invariably suffer from object hallucinations (OH). We introduce HALC, a novel decoding algorithm designed to mitigate OH in LVLMs. HALC leverages distinct fine-grained optimal visual information in vision-language tasks and operates on both local and global contexts simultaneously. Specifically, HALC integrates a robust auto-focal grounding mechanism (locally) to correct hallucinated tokens on the fly, and a specialized beam search algorithm (globally) to significantly reduce OH while preserving text generation quality. Additionally, HALC can be integrated into any LVLMs as a plug-and-play module without extra training. Extensive experimental studies demonstrate the effectiveness of HALC in reducing OH, outperforming state-of-the-arts across four benchmarks.
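The abstract does not spell out how the focal-contrast step works, so the following is only a generic illustration of contrastive decoding over two visual contexts, not HALC's actual algorithm. It assumes we already have next-token logits computed once from the whole image and once from a zoomed-in crop; the function names, the `alpha` weight, and the toy vocabulary and logits are all hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def contrast_scores(focused_logits, broad_logits, alpha=1.0):
    """Generic contrastive score (an assumption, not taken from the paper):
    (1 + alpha) * log p_focused - alpha * log p_broad.
    Tokens that gain probability under the focused view are boosted."""
    lp_f = log_softmax(focused_logits)
    lp_b = log_softmax(broad_logits)
    return [(1 + alpha) * f - alpha * b for f, b in zip(lp_f, lp_b)]

def pick_token(vocab, focused_logits, broad_logits, alpha=1.0):
    """Return the token with the highest contrastive score."""
    scores = contrast_scores(focused_logits, broad_logits, alpha)
    best = max(range(len(vocab)), key=scores.__getitem__)
    return vocab[best]

# Hypothetical toy values, chosen only to show the mechanics.
vocab = ["dog", "cat", "frisbee"]
broad = [2.0, 1.9, 0.1]    # whole image: "dog" is slightly ahead
focused = [1.0, 2.5, 0.1]  # zoomed-in crop: evidence favors "cat"

print(pick_token(vocab, focused, broad))
```

In this toy setup, greedy decoding on the whole-image logits alone would pick "dog", while the contrastive score prefers the token supported by the focused view. How HALC chooses the visual contexts and combines this with its beam search is described in the paper itself.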

arXiv information

Authors	Zhaorun Chen, Zhuokai Zhao, Hongyin Luo, Huaxiu Yao, Bo Li, Jiawei Zhou
Published	2024-06-10 15:21:41+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
