Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning

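Summary

Large language models (LLMs) have achieved great success across a wide range of tasks, but they are prone to hallucination.
We introduce Truth Forest, a method that improves the truthfulness of LLMs by uncovering hidden truth representations with multi-dimensional orthogonal probes.
Specifically, it builds multiple orthogonal bases for modeling truth by adding orthogonality constraints to the probes.
We also introduce Random Peek, a systematic technique that considers a wider range of positions within the sequence, narrowing the gap between discerning and generating truth features in LLMs.
With this approach, the truthfulness of Llama-2-7B on TruthfulQA improves from 40.8% to 74.5%.
Significant improvements are likewise observed in fine-tuned models.
We also conduct a thorough analysis of truth features using the probes.
Our visualizations show that the orthogonal probes capture complementary truth-related features and form well-defined clusters that reveal the inherent structure of the dataset.
Code: https://github.com/jongjyh/trfr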

Abstract (original)

Despite the great success of large language models (LLMs) in various tasks, they suffer from generating hallucinations. We introduce Truth Forest, a method that enhances truthfulness in LLMs by uncovering hidden truth representations using multi-dimensional orthogonal probes. Specifically, it creates multiple orthogonal bases for modeling truth by incorporating orthogonal constraints into the probes. Moreover, we introduce Random Peek, a systematic technique considering an extended range of positions within the sequence, reducing the gap between discerning and generating truth features in LLMs. By employing this approach, we improved the truthfulness of Llama-2-7B from 40.8% to 74.5% on TruthfulQA. Likewise, significant improvements are observed in fine-tuned models. We conducted a thorough analysis of truth features using probes. Our visualization results show that orthogonal probes capture complementary truth-related features, forming well-defined clusters that reveal the inherent structure of the dataset. Code: https://github.com/jongjyh/trfr
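
The abstract does not spell out how the orthogonal constraints or Random Peek are implemented, so the following is only a rough sketch of the two ideas under stated assumptions, not the authors' code. It assumes each probe is a single direction vector, that orthogonality is encouraged with a penalty on the pairwise cosine similarities of the probes, and that Random Peek amounts to reading the hidden state at a uniformly sampled token position instead of always the last one. The names orthogonality_penalty and random_peek are hypothetical.

```python
import math
import random


def orthogonality_penalty(probes):
    """Sum of squared deviations of the probes' cosine-similarity matrix
    from the identity. Zero when all probes are mutually orthogonal.
    (Assumed form of the constraint; the abstract does not specify it.)"""
    unit = []
    for probe in probes:
        norm = math.sqrt(sum(x * x for x in probe))
        unit.append([x / norm for x in probe])

    penalty = 0.0
    for i, u in enumerate(unit):
        for j, v in enumerate(unit):
            cosine = sum(a * b for a, b in zip(u, v))
            target = 1.0 if i == j else 0.0
            penalty += (cosine - target) ** 2
    return penalty


def random_peek(hidden_states, rng):
    """Pick one token position uniformly at random and return it together
    with the hidden state at that position. (Assumed reading of Random Peek.)"""
    position = rng.randrange(len(hidden_states))
    return position, hidden_states[position]


# Three mutually orthogonal probe directions in a toy 4-dimensional space.
orthogonal_probes = [[1.0, 0.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0]]
# Three identical directions: the worst case for the penalty.
collinear_probes = [[1.0, 1.0, 1.0, 1.0] for _ in range(3)]

penalty_orthogonal = orthogonality_penalty(orthogonal_probes)
penalty_collinear = orthogonality_penalty(collinear_probes)

# Toy "sequence" of 5 token positions, each with a 4-dimensional hidden state.
rng = random.Random(0)
toy_hidden_states = [[float(t)] * 4 for t in range(5)]
position, state = random_peek(toy_hidden_states, rng)
```

In a training setup, a penalty of this kind would typically be added to the probes' classification loss so that each probe is pushed toward a direction the others do not already cover.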

arXiv information

Authors	Zhongzhi Chen, Xingwu Sun, Xianfeng Jiao, Fengzong Lian, Zhanhui Kang, Di Wang, Cheng-Zhong Xu
Published	2023-12-29 06:08:18+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
