ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability

Summary

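Retrieval-Augmented Generation (RAG) models are designed to incorporate external knowledge, which reduces hallucinations caused by insufficient parametric (internal) knowledge.
However, even when the retrieved content is accurate and relevant, a RAG model can still hallucinate by generating output that conflicts with the retrieved information.
Detecting such hallucinations requires disentangling how Large Language Models (LLMs) use external knowledge and parametric knowledge.
Current detection methods often focus on only one of these mechanisms, or do not decouple their intertwined effects, which makes accurate detection difficult.
In this paper, we investigate the internal mechanisms behind hallucinations in RAG scenarios.
We find that hallucinations occur when the Knowledge FFNs in an LLM overemphasize parametric knowledge in the residual stream while the Copying Heads fail to effectively retain or integrate external knowledge from the retrieved content.
Based on these findings, we propose ReDeEP, a novel method that detects hallucinations by decoupling the LLM's use of external context from its use of parametric knowledge.
Our experiments show that ReDeEP significantly improves RAG hallucination detection accuracy.
In addition, we introduce AARF, which mitigates hallucinations by modulating the contributions of the Knowledge FFNs and Copying Heads.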
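
The decoupling idea described above can be illustrated with a toy sketch. It assumes that two per-token signals have already been measured separately: one for how strongly a token draws on the retrieved content, and one for how strongly it draws on parametric knowledge. The names `TokenSignals`, `hallucination_score`, `alpha`, `beta`, and `threshold`, as well as the linear combination itself, are assumptions made for illustration. The abstract does not specify how ReDeEP computes or combines its signals, so this should not be read as the authors' implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TokenSignals:
    # Both fields are hypothetical, pre-computed signals in [0, 1].
    external_context: float      # reliance on the retrieved content
    parametric_knowledge: float  # reliance on internal (parametric) knowledge


def hallucination_score(
    signals: List[TokenSignals], alpha: float = 1.0, beta: float = 1.0
) -> float:
    """Mean of alpha * parametric - beta * external over all tokens.

    Illustrative assumption only: a higher value means the response leans
    more on parametric knowledge than on the retrieved context.
    """
    if not signals:
        raise ValueError("signals must not be empty")
    total = sum(
        alpha * s.parametric_knowledge - beta * s.external_context
        for s in signals
    )
    return total / len(signals)


def is_hallucination(
    signals: List[TokenSignals],
    threshold: float = 0.0,
    alpha: float = 1.0,
    beta: float = 1.0,
) -> bool:
    """Flag a response whose score exceeds a (hypothetical) threshold."""
    return hallucination_score(signals, alpha, beta) > threshold


# Made-up example values, not measurements from any model.
grounded = [TokenSignals(0.9, 0.1), TokenSignals(0.8, 0.2)]
ungrounded = [TokenSignals(0.1, 0.9), TokenSignals(0.2, 0.8)]
```

Keeping the two signals as separate inputs, rather than a single confidence value, is the point of the sketch: a response can be flagged because it relies too heavily on parametric knowledge, too little on the retrieved context, or both. In practice `alpha`, `beta`, and `threshold` would need to be tuned on labeled data.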

Abstract (original)

Retrieval-Augmented Generation (RAG) models are designed to incorporate external knowledge, reducing hallucinations caused by insufficient parametric (internal) knowledge. However, even with accurate and relevant retrieved content, RAG models can still produce hallucinations by generating outputs that conflict with the retrieved information. Detecting such hallucinations requires disentangling how Large Language Models (LLMs) utilize external and parametric knowledge. Current detection methods often focus on one of these mechanisms or without decoupling their intertwined effects, making accurate detection difficult. In this paper, we investigate the internal mechanisms behind hallucinations in RAG scenarios. We discover hallucinations occur when the Knowledge FFNs in LLMs overemphasize parametric knowledge in the residual stream, while Copying Heads fail to effectively retain or integrate external knowledge from retrieved content. Based on these findings, we propose ReDeEP, a novel method that detects hallucinations by decoupling LLM’s utilization of external context and parametric knowledge. Our experiments show that ReDeEP significantly improves RAG hallucination detection accuracy. Additionally, we introduce AARF, which mitigates hallucinations by modulating the contributions of Knowledge FFNs and Copying Heads.

arXiv information

Authors	Zhongxiang Sun, Xiaoxue Zang, Kai Zheng, Yang Song, Jun Xu, Xiao Zhang, Weijie Yu, Yang Song, Han Li
Published	2025-01-21 16:05:30+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
