Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data

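Summary

To trust the fluent text generated by large language models (LLMs), humans must be able to verify its correctness against trusted external sources.
Recent efforts, such as providing citations via retrieved documents or post-hoc provenance, improve verifiability but still provide no guarantees of correctness.
To address these limitations, we approach the verifiability goal with a different philosophy: trivializing the verification process by developing models that quote statements verbatim from trusted sources in their pre-training data.
We propose Quote-Tuning and demonstrate that it is feasible to align LLMs to provide quoted statements from data memorized during pre-training.
At the core of Quote-Tuning is a fast membership inference function (Marone and Van Durme, 2023) that efficiently verifies text against a trusted corpus.
We use this tool to design a reward function that quantifies quoting in model responses, and then use that function to create a dataset for preference learning.
Experimental results show that Quote-Tuning increases verbatim quotes from high-quality pre-training documents by 55% to 130% relative to un-tuned models while maintaining response quality.
Quote-Tuning also generalizes quoting to out-of-domain data, applies to different tasks, and provides additional benefits for truthfulness.
Our method not only serves as a hassle-free way to increase quoting but also opens avenues for improving LLM trustworthiness through better verifiability.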
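
The quoting reward mentioned in the summary can be illustrated with a small sketch. This is not the authors' implementation: it assumes a plain Python set of character n-grams as a stand-in for the fast membership inference function, and the names `build_ngram_index` and `quote_score`, the n-gram width, and the toy corpus are all hypothetical choices made for illustration.

```python
from typing import Iterable, Set


def build_ngram_index(corpus: Iterable[str], n: int) -> Set[str]:
    """Collect every character n-gram of the trusted corpus into a set.

    Assumption: an in-memory set stands in for a fast membership
    function; it is only practical for toy-sized corpora.
    """
    index: Set[str] = set()
    for doc in corpus:
        for i in range(len(doc) - n + 1):
            index.add(doc[i:i + n])
    return index


def quote_score(response: str, index: Set[str], n: int) -> float:
    """Fraction of characters in `response` covered by a corpus n-gram."""
    if len(response) < n:
        return 0.0
    covered = [False] * len(response)
    for i in range(len(response) - n + 1):
        if response[i:i + n] in index:
            # Mark every character of the matching window as quoted.
            for j in range(i, i + n):
                covered[j] = True
    return sum(covered) / len(response)


# Toy trusted corpus (hypothetical data).
corpus = ["the quick brown fox jumps over the lazy dog"]
index = build_ngram_index(corpus, n=8)

verbatim = quote_score("the quick brown fox", index, n=8)
novel = quote_score("a fast auburn canine", index, n=8)
print(verbatim, novel)
```

A response copied verbatim from the corpus scores high and a paraphrase scores low, which is the kind of signal a quoting reward needs. A real system would replace the set with a membership structure that scales to the size of a pre-training corpus.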

Abstract (original)

To trust the fluent generations of large language models (LLMs), humans must be able to verify their correctness against trusted, external sources. Recent efforts, such as providing citations via retrieved documents or post-hoc provenance, enhance verifiability but still provide no guarantees on their correctness. To address these limitations, we tackle the verifiability goal with a different philosophy: trivializing the verification process by developing models that quote verbatim statements from trusted sources in pre-training data. We propose Quote-Tuning, and demonstrate it is feasible to align LLMs to provide quoted statements from data memorized during pre-training. The core of Quote-Tuning is a fast membership inference function (Marone and Van Durme, 2023) that efficiently verifies text against a trusted corpus. We leverage this tool to design a reward function to quantify quotes in model responses, which is then used to create a dataset for preference learning. Experimental results show that Quote-Tuning significantly increases verbatim quotes from high-quality pre-training documents by 55% to 130% relative to un-tuned models while maintaining response quality. Quote-Tuning also generalizes quoting to out-of-domain data, is applicable in different tasks, and provides additional benefits to truthfulness. Our method not only serves as a hassle-free method to increase quoting but also opens up avenues for improving LLM trustworthiness through better verifiability.
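
As a rough illustration of how a reward function could be "used to create a dataset for preference learning", the sketch below ranks several sampled responses to one prompt and keeps the highest- and lowest-scoring ones as a (chosen, rejected) pair. It is a sketch under stated assumptions, not the procedure from the paper: the function `make_preference_pair`, the `min_gap` threshold, and the toy scorer are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def make_preference_pair(
    prompt: str,
    responses: List[str],
    score: Callable[[str], float],
    min_gap: float = 0.1,
) -> Optional[Tuple[str, str, str]]:
    """Return (prompt, chosen, rejected), or None if no clear preference.

    `score` is any quoting reward; `min_gap` (an assumed threshold)
    drops pairs whose scores are too close to be informative.
    """
    if len(responses) < 2:
        return None
    # Score each response once, then sort from lowest to highest reward.
    scored = sorted(((score(r), r) for r in responses), key=lambda p: p[0])
    low_score, rejected = scored[0]
    high_score, chosen = scored[-1]
    if high_score - low_score < min_gap:
        return None
    return (prompt, chosen, rejected)


# Toy stand-in for a quoting reward: 1.0 if the whole response appears
# verbatim in a trusted text, else 0.0 (hypothetical data and scorer).
trusted = "water boils at 100 degrees celsius at sea level"


def toy_score(response: str) -> float:
    return 1.0 if response in trusted else 0.0


pair = make_preference_pair(
    "When does water boil?",
    ["water boils at 100 degrees celsius", "H2O gets bubbly when it is hot"],
    toy_score,
)
print(pair)
```

Pairs in this (prompt, chosen, rejected) shape are a common input format for preference-learning methods; which method and which sampling setup to use are left open here.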

arXiv information

Authors	Jingyu Zhang, Marc Marone, Tianjian Li, Benjamin Van Durme, Daniel Khashabi
Published	2024-08-21 15:23:28+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
