Whispers that Shake Foundations: Analyzing and Mitigating False Premise Hallucinations in Large Language Models

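Summary

Large language models (LLMs) show impressive capabilities but still suffer from hallucinations.
An important type of this problem is the false premise hallucination, defined as the phenomenon in which an LLM generates hallucinated text when confronted with a question built on a false premise.
In this paper, we perform a comprehensive analysis of false premise hallucinations and elucidate their internal working mechanism: a small subset of attention heads (which we designate as false premise heads) disturbs the knowledge extraction process, leading to false premise hallucinations.
Based on this analysis, we propose \textbf{FAITH} (\textbf{F}alse premise \textbf{A}ttention head constra\textbf{I}ning for mi\textbf{T}igating \textbf{H}allucinations), a novel and effective method for mitigating false premise hallucinations.
It constrains the false premise attention heads during model inference.
Extensive experiments demonstrate that constraining only approximately $1\%$ of the attention heads in the model yields an increase of nearly $20\%$ in model performance.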
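
The idea of constraining a small set of attention heads at inference time can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it assumes the heads to constrain have already been identified, and it assumes that "constraining" means scaling a head's output contribution (here to zero). The abstract does not specify how FAITH selects or constrains heads, so the function names `multi_head_attention` and `constrain_heads` and the `head_scale` parameter are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(x, w_q, w_k, w_v, w_o, head_scale):
    """Causal multi-head self-attention with a per-head output scale.

    x:          (seq, d_model) input activations
    w_q/w_k/w_v: (n_heads, d_model, d_head) per-head projections
    w_o:        (n_heads, d_head, d_model) per-head output projections
    head_scale: (n_heads,) multiplier applied to each head's output
                (hypothetical constraint knob: 1.0 = untouched, 0.0 = removed)
    """
    q = np.einsum("sd,hde->hse", x, w_q)
    k = np.einsum("sd,hde->hse", x, w_k)
    v = np.einsum("sd,hde->hse", x, w_v)

    d_head = q.shape[-1]
    scores = np.einsum("hse,hte->hst", q, k) / np.sqrt(d_head)

    # Causal mask: position s may only attend to positions t <= s.
    seq = x.shape[0]
    future = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    attn = softmax(scores, axis=-1)
    per_head = np.einsum("hst,hte->hse", attn, v)
    per_head_out = np.einsum("hse,hed->hsd", per_head, w_o)

    # Sum the head contributions, scaling each head individually.
    return (head_scale[:, None, None] * per_head_out).sum(axis=0)


def constrain_heads(n_heads, heads_to_constrain, scale=0.0):
    """Build a per-head scale vector that damps the selected heads."""
    head_scale = np.ones(n_heads)
    head_scale[list(heads_to_constrain)] = scale
    return head_scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_heads, seq, d_model, d_head = 4, 5, 8, 2
    x = rng.normal(size=(seq, d_model))
    w_q, w_k, w_v = (rng.normal(size=(n_heads, d_model, d_head)) for _ in range(3))
    w_o = rng.normal(size=(n_heads, d_head, d_model))

    # Constrain head 1 only (an arbitrary choice for illustration).
    scale = constrain_heads(n_heads, [1])
    out = multi_head_attention(x, w_q, w_k, w_v, w_o, scale)
    print(out.shape)
```

Because each head's contribution is summed into the output independently, scaling one head to zero removes exactly that head's contribution and leaves the others unchanged. In a real model the same effect would typically be applied inside each transformer layer, and the set of heads would come from an analysis step that this sketch does not cover.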

Abstract (original)

Large Language Models (LLMs) have shown impressive capabilities but still suffer from the issue of hallucinations. A significant type of this issue is the false premise hallucination, which we define as the phenomenon when LLMs generate hallucinated text when confronted with false premise questions. In this paper, we perform a comprehensive analysis of the false premise hallucination and elucidate its internal working mechanism: a small subset of attention heads (which we designate as false premise heads) disturb the knowledge extraction process, leading to the occurrence of false premise hallucination. Based on our analysis, we propose \textbf{FAITH} (\textbf{F}alse premise \textbf{A}ttention head constra\textbf{I}ning for mi\textbf{T}igating \textbf{H}allucinations), a novel and effective method to mitigate false premise hallucinations. It constrains the false premise attention heads during the model inference process. Impressively, extensive experiments demonstrate that constraining only approximately $1\%$ of the attention heads in the model yields a notable increase of nearly $20\%$ of model performance.

arXiv information

Authors	Hongbang Yuan, Pengfei Cao, Zhuoran Jin, Yubo Chen, Daojian Zeng, Kang Liu, Jun Zhao
Published	2024-02-29 12:35:45+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
