Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation

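Summary

Large language models (LLMs) show increasingly human-like abilities, yet they often produce factual inaccuracies, known as "hallucinations", even when they hold the relevant knowledge.
Current approaches to these hallucinations typically require high-quality human factuality annotations.
This work explores self-alignment for factuality: the LLM's own self-evaluation capability supplies the training signal that steers the model toward factuality.
Specifically, a self-evaluation component, Self-Eval, prompts the LLM to verify the factuality of its own generated responses based solely on its internal knowledge.
In addition, Self-Knowledge Tuning (SK-Tuning) strengthens the LLM's self-evaluation ability by improving the model's confidence estimation and calibration.
These self-annotated responses are then used to fine-tune the model with the Direct Preference Optimization algorithm.
Across three key knowledge-intensive tasks on TruthfulQA and BioGEN, the proposed self-alignment approach substantially improves factual accuracy over Llama family models.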

Summary (original)

Despite showing increasingly human-like abilities, large language models (LLMs) often struggle with factual inaccuracies, i.e. ‘hallucinations’, even when they hold relevant knowledge. To address these hallucinations, current approaches typically necessitate high-quality human factuality annotations. In this work, we explore Self-Alignment for Factuality, where we leverage the self-evaluation capability of an LLM to provide training signals that steer the model towards factuality. Specifically, we incorporate Self-Eval, a self-evaluation component, to prompt an LLM to validate the factuality of its own generated responses solely based on its internal knowledge. Additionally, we design Self-Knowledge Tuning (SK-Tuning) to augment the LLM’s self-evaluation ability by improving the model’s confidence estimation and calibration. We then utilize these self-annotated responses to fine-tune the model via Direct Preference Optimization algorithm. We show that the proposed self-alignment approach substantially enhances factual accuracy over Llama family models across three key knowledge-intensive tasks on TruthfulQA and BioGEN.
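The pipeline described above (score the model's own responses, then fine-tune on the resulting preferences) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how confidence scores are turned into preference pairs, so `build_preference_pairs` and its `min_margin` threshold are hypothetical, and the confidence values are made-up illustrative numbers. `dpo_loss` is the standard Direct Preference Optimization loss for a single pair, computed from sequence log-probabilities under the policy and a frozen reference model.

```python
import math
from itertools import combinations


def build_preference_pairs(scored_responses, min_margin=0.2):
    """Turn (response, confidence) tuples into (chosen, rejected) pairs.

    `confidence` stands in for a self-evaluation score, e.g. the model's own
    estimate that a response is factual. `min_margin` is a hypothetical
    threshold for illustration, not a value taken from the paper.
    """
    pairs = []
    for (resp_a, conf_a), (resp_b, conf_b) in combinations(scored_responses, 2):
        if abs(conf_a - conf_b) < min_margin:
            continue  # scores too close to express a preference
        if conf_a > conf_b:
            pairs.append((resp_a, resp_b))
        else:
            pairs.append((resp_b, resp_a))
    return pairs


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair."""
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)), split by sign so exp() never overflows
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Illustrative self-evaluation scores (assumed, not from the paper)
    scored = [
        ("Paris is the capital of France.", 0.95),
        ("Lyon is the capital of France.", 0.10),
        ("The capital of France is Paris.", 0.90),
    ]
    for chosen, rejected in build_preference_pairs(scored):
        print("chosen:", chosen, "| rejected:", rejected)
    # Illustrative log-probabilities for one pair
    print(dpo_loss(-1.0, -5.0, -2.0, -2.0))
```

In this sketch, pairs whose scores differ by less than `min_margin` are skipped, so the two near-equivalent correct answers are never ranked against each other. The loss falls below log 2 once the policy favors the chosen response more strongly than the reference model does.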

arXiv information

Authors	Xiaoying Zhang, Baolin Peng, Ye Tian, Jingyan Zhou, Lifeng Jin, Linfeng Song, Haitao Mi, Helen Meng
Published	2024-02-14 15:52:42+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
