SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models

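Summary

Generative Large Language Models (LLMs) such as GPT-3 can generate highly fluent responses to a wide variety of user prompts.
However, LLMs are known to hallucinate facts and make non-factual statements.
Existing fact-checking approaches require either access to token-level output probability distributions (which may not be available for systems such as ChatGPT) or access to external databases interfaced through separate, often complex, modules.
In this work, we propose "SelfCheckGPT", a simple sampling-based approach that can fact-check black-box models in a zero-resource fashion, that is, without an external database.
SelfCheckGPT builds on a simple idea: if an LLM has knowledge of a given concept, its sampled responses are likely to be similar and to contain consistent facts.
For hallucinated facts, by contrast, stochastically sampled responses are likely to diverge and contradict one another.
To investigate this approach, we use GPT-3 to generate passages about individuals from the WikiBio dataset and manually annotate the factuality of the generated passages.
We show that SelfCheckGPT can: i) detect non-factual and factual sentences; and ii) rank passages in terms of factuality.
We compare our approach with several existing baselines and show that, in sentence-level hallucination detection, it achieves AUC-PR scores comparable to grey-box methods, while SelfCheckGPT performs best at passage factuality assessment.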
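
The sampling-consistency idea can be illustrated with a minimal Python sketch. It is not the authors' scoring method, which the abstract does not specify. A plain token-overlap measure is swapped in as a stand-in, and the function names (`support`, `inconsistency_score`, `passage_score`) and the hand-written example strings are assumptions made for illustration. In practice, the samples would be drawn stochastically from the same LLM for the same prompt.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support(sentence: str, sample: str) -> float:
    """Fraction of the sentence's tokens that also appear in one sample.

    Assumption: token overlap is a crude proxy for "the sample agrees with
    the sentence". It is a stand-in, not the measure used in the paper.
    """
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    return len(tokens & tokenize(sample)) / len(tokens)


def inconsistency_score(sentence: str, samples: List[str]) -> float:
    """Return 1 minus the mean support over all sampled responses.

    Higher values mean the samples agree less with the sentence, which the
    sampling-consistency idea treats as a sign of possible hallucination.
    """
    if not samples:
        raise ValueError("need at least one sampled response")
    mean_support = sum(support(sentence, s) for s in samples) / len(samples)
    return 1.0 - mean_support


def passage_score(sentences: List[str], samples: List[str]) -> float:
    """Average the sentence-level scores to get one score per passage."""
    if not sentences:
        raise ValueError("need at least one sentence")
    return sum(inconsistency_score(s, samples) for s in sentences) / len(sentences)


if __name__ == "__main__":
    # Hand-written stand-ins for stochastically sampled LLM responses.
    samples = [
        "Alan Turing was born in London in 1912.",
        "Turing was born in 1912 in London, England.",
        "Alan Turing, born 1912 in London, was a mathematician.",
    ]
    consistent = "Alan Turing was born in London in 1912."
    divergent = "Alan Turing was born in Paris in 1905."
    print("consistent:", round(inconsistency_score(consistent, samples), 3))
    print("divergent: ", round(inconsistency_score(divergent, samples), 3))
```

In this toy setup, the sentence whose details do not recur in the samples receives the higher score. Sorting passages by `passage_score` would give a factuality ranking in the same spirit, although token overlap cannot recognize paraphrases or contradictions expressed with shared vocabulary.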

Abstract (original)

Generative Large Language Models (LLMs) such as GPT-3 are capable of generating highly fluent responses to a wide variety of user prompts. However, LLMs are known to hallucinate facts and make non-factual statements which can undermine trust in their output. Existing fact-checking approaches either require access to token-level output probability distribution (which may not be available for systems such as ChatGPT) or external databases that are interfaced via separate, often complex, modules. In this work, we propose ‘SelfCheckGPT’, a simple sampling-based approach that can be used to fact-check black-box models in a zero-resource fashion, i.e. without an external database. SelfCheckGPT leverages the simple idea that if a LLM has knowledge of a given concept, sampled responses are likely to be similar and contain consistent facts. However, for hallucinated facts, stochastically sampled responses are likely to diverge and contradict one another. We investigate this approach by using GPT-3 to generate passages about individuals from the WikiBio dataset, and manually annotate the factuality of the generated passages. We demonstrate that SelfCheckGPT can: i) detect non-factual and factual sentences; and ii) rank passages in terms of factuality. We compare our approach to several existing baselines and show that in sentence hallucination detection, our approach has AUC-PR scores comparable to grey-box methods, while SelfCheckGPT is best at passage factuality assessment.

arXiv information

Authors	Potsawee Manakul, Adian Liusie, Mark J. F. Gales
Published	2023-03-15 19:31:21+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
