Hallucination Detox: Sensitivity Dropout (SenD) for Large Language Model Training

Summary

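As large language models (LLMs) are deployed across more industries, concerns about their reliability are growing, particularly over hallucinations: outputs that are factually inaccurate or irrelevant to the user's input.
We investigate the relationship between the training process and the emergence of hallucinations, addressing a key gap in existing research, which focuses mainly on post hoc detection and mitigation strategies.
Using models from the Pythia suite (70M to 12B parameters) and several hallucination detection metrics, we analyze hallucination trends throughout training and examine the internal dynamics of LLMs.
We introduce Sensitivity Dropout (SenD), a new training protocol designed to mitigate hallucinations by reducing variance during training.
SenD does this by deterministically dropping the embedding indices with high variability, called Sensitive Embedding Indices.
We also develop Efficient EigenScore (EES), an unsupervised hallucination detection metric that approximates the traditional EigenScore at twice the speed.
Because this efficient metric is integrated into the protocol, SenD is both computationally scalable and effective at reducing hallucinations.
Our empirical evaluation shows that the approach improves LLM reliability at test time by up to 40% compared with normal training, while also providing an efficient way to improve factual accuracy when adapting LLMs to the Wikipedia, Medical, and LegalBench domains.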

Abstract (original)

As large language models (LLMs) are increasingly deployed across various industries, concerns regarding their reliability, particularly due to hallucinations – outputs that are factually inaccurate or irrelevant to user input – have grown. Our research investigates the relationship between the training process and the emergence of hallucinations to address a key gap in existing research that focuses primarily on post hoc detection and mitigation strategies. Using models from the Pythia suite (70M – 12B parameters) and several hallucination detection metrics, we analyze hallucination trends throughout training and explore LLM internal dynamics. We introduce Sensitivity Dropout (SenD), a novel training protocol designed to mitigate hallucinations by reducing variance during training. SenD achieves this by deterministically dropping embedding indices with significant variability, referred to as Sensitive Embedding Indices. In addition, we develop an unsupervised hallucination detection metric, Efficient EigenScore (EES), which approximates the traditional EigenScore at 2x speed. This efficient metric is integrated into our protocol, allowing SenD to be both computationally scalable and effective at reducing hallucinations. Our empirical evaluation demonstrates that our approach improves LLM reliability at test time by up to 40% compared to normal training while also providing an efficient method to improve factual accuracy when adapting LLMs to Wikipedia, Medical, and LegalBench domains.
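
The core idea of SenD, deterministically dropping the embedding indices that vary the most, can be illustrated with a minimal sketch. The abstract does not say how variability is measured, which representation is used, or how many indices are dropped, so everything here is an assumption for illustration: variability is taken to be the variance of each embedding index across training checkpoints, and the names `sensitive_embedding_indices`, `drop_indices`, and `top_fraction` are hypothetical. This is not the authors' implementation.

```python
import numpy as np


def sensitive_embedding_indices(checkpoint_embeddings, top_fraction=0.2):
    """Return the indices whose values vary most across checkpoints.

    checkpoint_embeddings: array of shape (num_checkpoints, hidden_dim),
    one embedding vector per checkpoint for the same input (assumed setup).
    top_fraction: share of indices to flag as sensitive (assumed parameter).
    """
    emb = np.asarray(checkpoint_embeddings, dtype=float)
    # Variance of each embedding index across checkpoints (assumed measure).
    variability = emb.var(axis=0)
    k = max(1, int(round(top_fraction * emb.shape[1])))
    # Stable sort keeps the selection deterministic even when values tie.
    top = np.argsort(variability, kind="stable")[-k:]
    return np.sort(top)


def drop_indices(embedding, indices):
    """Zero the given indices; unlike random dropout, the choice is fixed."""
    out = np.array(embedding, dtype=float, copy=True)
    out[..., indices] = 0.0
    return out


# Toy example: 5 checkpoints, hidden size 8, constant except two indices.
history = np.full((5, 8), 0.5)
history[:, 3] += np.array([-2.0, 1.0, 3.0, -1.0, 2.0])  # strongly fluctuating
history[:, 6] += np.array([1.0, -1.0, 1.0, -1.0, 1.0])  # mildly fluctuating

sensitive = sensitive_embedding_indices(history, top_fraction=0.25)
dropped = drop_indices(history[-1], sensitive)
print(sensitive)  # indices 3 and 6 are selected in this toy example
print(dropped)
```

The contrast with standard dropout is that the mask here is chosen from observed variability rather than sampled at random, so the same indices are removed every time the same history is supplied.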

arXiv information

Authors	Shahrad Mohammadzadeh, Juan David Guerra, Marco Bonizzato, Reihaneh Rabbany, Golnoosh Farnadi
Published	2025-01-07 14:56:42+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
