Triggering Hallucinations in LLMs: A Quantitative Study of Prompt-Induced Hallucination in Large Language Models

Abstract (original)

Hallucinations in large language models (LLMs) present a growing challenge across real-world applications, from healthcare to law, where factual reliability is essential. Despite advances in alignment and instruction tuning, LLMs can still generate outputs that are fluent yet fundamentally untrue. Understanding the cognitive dynamics that underlie these hallucinations remains an open problem. In this study, we propose a prompt-based framework to systematically trigger and quantify hallucination: a Hallucination-Inducing Prompt (HIP), which synthetically fuses semantically distant concepts (e.g., periodic table of elements and tarot divination) in a misleading way, and a Hallucination Quantifying Prompt (HQP), which scores the plausibility, confidence, and coherence of the output. Controlled experiments across multiple LLMs revealed that HIPs consistently produced less coherent and more hallucinated responses than their null-fusion controls. These effects varied across models, with reasoning-oriented LLMs showing distinct profiles from general-purpose ones. Our framework provides a reproducible testbed for studying hallucination vulnerability, and opens the door to developing safer, more introspective LLMs that can detect and self-regulate the onset of conceptual instability.
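The abstract does not give the actual prompt wording, the scoring scale, or how the HQP output is parsed. The following is therefore only a minimal sketch, assuming a 0-10 scale and three hypothetical templates: `HIP_TEMPLATE`, `NULL_FUSION_TEMPLATE` and `HQP_TEMPLATE`. It shows one way to pair each HIP with a null-fusion control for the same concept pair and to average HQP scores per condition. These are illustrative names, not the author's prompts. LLM calls are deliberately left out, so any model client could be plugged in.

```python
import re
from statistics import mean

# Hypothetical templates: the paper's real prompt wording is not given in the abstract.
# The HIP fuses two semantically distant concepts in a misleading way.
HIP_TEMPLATE = (
    "Explain how {a} and {b} are fundamentally the same system, "
    "and describe the principle that unifies them."
)
# The null-fusion control mentions the same concepts without fusing them.
NULL_FUSION_TEMPLATE = "Briefly describe {a}. Separately, briefly describe {b}."

# Hypothetical HQP: asks an evaluator model to score an answer (0-10 scale assumed).
HQP_TEMPLATE = (
    "Rate the following answer on a 0-10 scale for plausibility, "
    "confidence, and coherence. Reply exactly as:\n"
    "plausibility: <n>\nconfidence: <n>\ncoherence: <n>\n\nAnswer:\n{answer}"
)

CRITERIA = ("plausibility", "confidence", "coherence")


def make_prompts(a: str, b: str) -> dict[str, str]:
    """Build a HIP and its null-fusion control for one concept pair."""
    return {
        "hip": HIP_TEMPLATE.format(a=a, b=b),
        "control": NULL_FUSION_TEMPLATE.format(a=a, b=b),
    }


def make_hqp(answer: str) -> str:
    """Wrap a model answer in the (hypothetical) HQP scoring prompt."""
    return HQP_TEMPLATE.format(answer=answer)


def parse_hqp_scores(reply: str) -> dict[str, float]:
    """Extract the three HQP scores from an evaluator reply; raise if any is missing."""
    scores = {}
    for name in CRITERIA:
        m = re.search(rf"{name}\s*:\s*(\d+(?:\.\d+)?)", reply, re.IGNORECASE)
        if m is None:
            raise ValueError(f"missing score for {name!r}")
        scores[name] = float(m.group(1))
    return scores


def mean_scores(replies: list[str]) -> dict[str, float]:
    """Average each HQP criterion over all evaluator replies for one condition."""
    parsed = [parse_hqp_scores(r) for r in replies]
    return {name: mean(p[name] for p in parsed) for name in CRITERIA}


if __name__ == "__main__":
    prompts = make_prompts("the periodic table of elements", "tarot divination")
    print(prompts["hip"])
    print(prompts["control"])
    # Placeholder evaluator reply, for illustration only (not data from the paper).
    print(parse_hqp_scores("plausibility: 2\nconfidence: 9\ncoherence: 3"))
```

In a full run, the model under test would answer both the `hip` and `control` prompts for many concept pairs. Each answer would then be wrapped with `make_hqp` and scored by an evaluator model. Finally, `mean_scores` would be compared across the two conditions. Keeping prompt construction and score parsing separate from the model client makes the comparison easy to rerun against different LLMs.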

arXiv information

Author	Makoto Sato
Published	2025-05-01 14:33:47+00:00
arXiv site	arxiv_id(pdf)

Provider, services used

arxiv.jp, Google