Do Language Models Know When They’re Hallucinating References?

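Summary

State-of-the-art language models (LMs) are known to be prone to generating hallucinated information.
Such inaccurate outputs undermine the reliability of these models, limit their use, and raise serious concerns about misinformation and propaganda.
This work focuses on hallucinated book and article references and presents them as the "model organism" of language model hallucination research, because they occur frequently and are easy to discern.
The authors posit that if a language model cites a particular reference in its output, it should ideally possess sufficient information about that reference's authors and content, among other relevant details.
Building on this basic insight, they show that hallucinated references can be identified without consulting any external resources, by asking the language model a set of direct or indirect queries about the references.
These queries can be regarded as "consistency checks."
The findings highlight that LMs, including GPT-4, often produce inconsistent author lists for hallucinated references, while they often accurately recall the authors of real references.
In this sense, the LM can be said to "know" when it is hallucinating references.
The findings also show how hallucinated references can be dissected to shed light on their nature.
Replication code and results are available at https://github.com/microsoft/hallucinated-references.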
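
The consistency-check idea described above can be illustrated with a small sketch. This is not the code from the paper's repository: it assumes a hypothetical `ask_model` callable that sends a prompt to a language model and returns its text answer, and it scores agreement between repeated author answers with a simple Jaccard overlap on surnames, which is a stand-in chosen here for illustration. The prompt wording, function names, and the number of samples are likewise assumptions.

```python
import itertools
import re
from typing import Callable, Set


def normalize_authors(answer: str) -> Set[str]:
    """Reduce a free-text author answer to a set of lowercase surnames."""
    # Split on commas, semicolons, ampersands, and the standalone word "and".
    parts = re.split(r",|;|&|\band\b", answer, flags=re.IGNORECASE)
    names = set()
    for part in parts:
        tokens = re.findall(r"[^\W\d_]+", part.lower())
        if tokens:
            names.add(tokens[-1])  # rough heuristic: last token is the surname
    return names


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two sets; two empty sets count as no agreement."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def consistency_score(
    title: str,
    ask_model: Callable[[str], str],  # hypothetical LM wrapper (assumption)
    n_samples: int = 3,
) -> float:
    """Ask for the authors of `title` several times and average the pairwise overlap."""
    if n_samples < 2:
        raise ValueError("n_samples must be at least 2")
    prompt = f'Who are the authors of the reference "{title}"? List names only.'
    samples = [normalize_authors(ask_model(prompt)) for _ in range(n_samples)]
    pairs = list(itertools.combinations(samples, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

Under these assumptions, a score near 1 means the model repeats the same author list across samples, while a score near 0 means the lists disagree, which the summary associates with hallucinated references. Any threshold for flagging a reference would need to be chosen empirically.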

Abstract (original)

State-of-the-art language models (LMs) are notoriously susceptible to generating hallucinated information. Such inaccurate outputs not only undermine the reliability of these models but also limit their use and raise serious concerns about misinformation and propaganda. In this work, we focus on hallucinated book and article references and present them as the ‘model organism’ of language model hallucination research, due to their frequent and easy-to-discern nature. We posit that if a language model cites a particular reference in its output, then it should ideally possess sufficient information about its authors and content, among other relevant details. Using this basic insight, we illustrate that one can identify hallucinated references without ever consulting any external resources, by asking a set of direct or indirect queries to the language model about the references. These queries can be considered as ‘consistency checks.’ Our findings highlight that while LMs, including GPT-4, often produce inconsistent author lists for hallucinated references, they also often accurately recall the authors of real references. In this sense, the LM can be said to ‘know’ when it is hallucinating references. Furthermore, these findings show how hallucinated references can be dissected to shed light on their nature. Replication code and results can be found at https://github.com/microsoft/hallucinated-references.

arXiv information

Authors	Ayush Agrawal, Mirac Suzgun, Lester Mackey, Adam Tauman Kalai
Published	2024-03-20 13:12:48+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
