Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation

Abstract

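Large language models (large LMs) can generate text that contains hallucinated content.
An important instance of this problem is self-contradiction, where the LM produces two contradictory sentences within the same context.
This work presents a comprehensive investigation of self-contradiction in various instruction-tuned LMs, covering evaluation, detection, and mitigation.
The analysis shows that self-contradictions are prevalent when LMs generate text on open-domain topics, appearing, for example, in 17.7% of all sentences produced by ChatGPT.
Self-contradiction detection also complements retrieval-based methods, because a large portion of self-contradictions (e.g., 35.8% for ChatGPT) cannot be verified using Wikipedia.
The authors then propose a novel prompting-based framework designed to detect and mitigate self-contradictions effectively.
The detector achieves high accuracy, e.g., an F1 score of around 80% when prompting ChatGPT.
The mitigation algorithm iteratively refines the generated text to remove contradictory information while preserving fluency and informativeness.
Importantly, the entire framework applies to black-box LMs and requires no external grounded knowledge.
The approach is effective in practice and has been released as a push-button tool for the public, available at https://chatprotect.ai/.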
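
To make the detect-then-refine idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a hypothetical black-box interface `LM` (a function from prompt text to generated text), and the prompt wording, the re-sampling step, and the `max_rounds` limit are all illustrative assumptions rather than details taken from the abstract.

```python
from typing import Callable, List

# Hypothetical black-box interface: prompt text in, generated text out.
LM = Callable[[str], str]


def contradicts(lm: LM, context: str, first: str, second: str) -> bool:
    """Ask the LM whether two sentences written for the same context conflict."""
    prompt = (
        f"Context: {context}\n"
        f"Sentence A: {first}\n"
        f"Sentence B: {second}\n"
        "Do sentences A and B contradict each other? Answer yes or no."
    )
    return lm(prompt).strip().lower().startswith("yes")


def mitigate(lm: LM, context: str, sentences: List[str], max_rounds: int = 3) -> List[str]:
    """Rewrite each sentence while a re-sampled alternative contradicts it."""
    result = []
    for sentence in sentences:
        for _ in range(max_rounds):
            # Assumption: a second sentence sampled for the same context
            # serves as the comparison point for detection.
            alternative = lm(
                f"Context: {context}\nWrite one sentence that continues this context."
            )
            if not contradicts(lm, context, sentence, alternative):
                break
            # Ask the LM to drop the conflicting detail, then check again.
            sentence = lm(
                f"Context: {context}\n"
                f"Sentence A: {sentence}\n"
                f"Sentence B: {alternative}\n"
                "Rewrite sentence A, removing the information that conflicts with sentence B."
            )
        result.append(sentence)
    return result
```

Because the sketch only calls `lm` with text prompts, it needs no access to model internals or external knowledge sources, which mirrors the black-box setting the abstract describes.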

Abstract (original)

Large language models (large LMs) are susceptible to producing text that contains hallucinated content. An important instance of this problem is self-contradiction, where the LM generates two contradictory sentences within the same context. In this work, we present a comprehensive investigation into self-contradiction for various instruction-tuned LMs, covering evaluation, detection, and mitigation. Our analysis reveals the prevalence of self-contradictions when LMs generate text for open-domain topics, e.g., in 17.7% of all sentences produced by ChatGPT. Self-contradiction also complements retrieval-based methods, as a large portion of them (e.g., 35.8% for ChatGPT) cannot be verified using Wikipedia. We then propose a novel prompting-based framework designed to effectively detect and mitigate self-contradictions. Our detector achieves high accuracy, e.g., around 80% F1 score when prompting ChatGPT. The mitigation algorithm iteratively refines the generated text to remove contradictory information while preserving text fluency and informativeness. Importantly, our entire framework is applicable to black-box LMs and does not require external grounded knowledge. Our approach is practically effective and has been released as a push-button tool to benefit the public, available at https://chatprotect.ai/.

arXiv information

Authors	Niels Mündler, Jingxuan He, Slobodan Jenko, Martin Vechev
Published	2023-10-01 07:22:39+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
