Prompt-Guided Internal States for Hallucination Detection of Large Language Models

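Summary

Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks in different domains.
However, they sometimes generate responses that are logically coherent but factually incorrect or misleading, a phenomenon known as LLM hallucination.
Data-driven supervised methods train hallucination detectors on the internal states of LLMs, but a detector trained on one domain often generalizes poorly to others.
This paper aims to improve the cross-domain performance of supervised detectors using only in-domain data.
The authors propose PRISM, a new framework of prompt-guided internal states for LLM hallucination detection.
PRISM uses appropriate prompts to guide changes in the structure within the LLM's internal states that relates to text truthfulness, making that structure more salient and more consistent across texts from different domains.
The authors integrated the framework with existing hallucination detection methods and ran experiments on datasets from different domains.
The experimental results indicate that the framework significantly improves the cross-domain generalization of existing hallucination detection methods.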

Abstract (original)

Large Language Models (LLMs) have demonstrated remarkable capabilities across a variety of tasks in different domains. However, they sometimes generate responses that are logically coherent but factually incorrect or misleading, which is known as LLM hallucinations. Data-driven supervised methods train hallucination detectors by leveraging the internal states of LLMs, but detectors trained on specific domains often struggle to generalize well to other domains. In this paper, we aim to enhance the cross-domain performance of supervised detectors with only in-domain data. We propose a novel framework, prompt-guided internal states for hallucination detection of LLMs, namely PRISM. By utilizing appropriate prompts to guide changes in the structure related to text truthfulness within the LLM’s internal states, we make this structure more salient and consistent across texts from different domains. We integrated our framework with existing hallucination detection methods and conducted experiments on datasets from different domains. The experimental results indicate that our framework significantly enhances the cross-domain generalization of existing hallucination detection methods.
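
The pipeline the abstract describes (wrap a text in a prompt, read the LLM's internal state, feed it to a supervised detector) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the prompt wording in `PROMPT_TEMPLATE`, the `get_hidden_state` callable (a stand-in for reading a hidden-layer vector from a real LLM), and the logistic-regression probe are hypothetical and are not taken from the paper, which does not specify them in the abstract.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical prompt wording; the abstract does not give the actual prompt.
PROMPT_TEMPLATE = (
    "Is the following statement true or false?\n"
    "Statement: {text}\n"
    "Answer:"
)


def build_prompt(text: str, template: str = PROMPT_TEMPLATE) -> str:
    """Wrap a statement in a truthfulness-oriented prompt."""
    return template.format(text=text)


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Probability that the text behind internal state x is hallucinated."""
    return _sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_probe(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Fit a logistic-regression probe with batch gradient descent.

    features: one internal-state vector per training text.
    labels:   1 = hallucinated / false, 0 = truthful (assumed convention).
    """
    dim = len(features[0])
    n = len(features)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = predict(weights, bias, x) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def detect(
    text: str,
    get_hidden_state: Callable[[str], Sequence[float]],
    weights: Sequence[float],
    bias: float,
    threshold: float = 0.5,
) -> bool:
    """Flag text as likely hallucinated using the prompt-guided state."""
    state = get_hidden_state(build_prompt(text))
    return predict(weights, bias, state) >= threshold


if __name__ == "__main__":
    # Toy 2-D "internal states"; real ones would come from an LLM layer.
    toy_states = [[2.0, 0.1], [1.5, -0.2], [-2.0, 0.0], [-1.8, 0.3]]
    toy_labels = [1, 1, 0, 0]
    w, b = train_probe(toy_states, toy_labels)
    print(build_prompt("The sky is green."))
    print(detect("The sky is green.", lambda prompt: [2.0, 0.1], w, b))
```

In a real setup, `get_hidden_state` would run the prompt through an LLM and return a hidden-layer activation, and the probe would be trained on states extracted from one domain and evaluated on another. The toy vectors above only exercise the code path.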

arXiv information

Authors	Fujie Zhang, Peiqi Yu, Biao Yi, Baolei Zhang, Tong Li, Zheli Liu
Published	2024-11-07 16:33:48+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
