How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit Misinformation

Summary

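As large language models (LLMs) are widely deployed across diverse scenarios, the extent to which they could tacitly spread misinformation emerges as a critical safety concern.
Current research primarily evaluates LLMs on explicit false statements, overlooking that misinformation often appears subtly as an unchallenged premise in real-world interactions.
We curated EchoMist, the first comprehensive benchmark for implicit misinformation, in which false assumptions are embedded in queries to LLMs.
EchoMist targets circulated, harmful, and ever-evolving implicit misinformation drawn from diverse sources, including realistic human-AI conversations and social media interactions.
Through extensive empirical studies of 15 state-of-the-art LLMs, we find that current models perform alarmingly poorly on this task, often failing to detect false premises and instead generating counterfactual explanations.
We also investigate two mitigation methods, Self-Alert and RAG, for strengthening LLMs' ability to counter implicit misinformation.
Our findings indicate that EchoMist remains a persistent challenge and underscore the critical need to safeguard against the risk of implicit misinformation.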

Summary (Original)

As Large Language Models (LLMs) are widely deployed in diverse scenarios, the extent to which they could tacitly spread misinformation emerges as a critical safety concern. Current research primarily evaluates LLMs on explicit false statements, overlooking how misinformation often manifests subtly as unchallenged premises in real-world interactions. We curated EchoMist, the first comprehensive benchmark for implicit misinformation, where false assumptions are embedded in the query to LLMs. EchoMist targets circulated, harmful, and ever-evolving implicit misinformation from diverse sources, including realistic human-AI conversations and social media interactions. Through extensive empirical studies on 15 state-of-the-art LLMs, we find that current models perform alarmingly poorly on this task, often failing to detect false premises and generating counterfactual explanations. We also investigate two mitigation methods, i.e., Self-Alert and RAG, to enhance LLMs’ capability to counter implicit misinformation. Our findings indicate that EchoMist remains a persistent challenge and underscore the critical need to safeguard against the risk of implicit misinformation.

arXiv Information

Authors	Ruohao Guo, Wei Xu, Alan Ritter
Published	2025-05-27 16:40:26+00:00
arXiv site	arxiv_id(pdf)

Source and Services Used

arxiv.jp, Google
