Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data

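Summary

For people to trust the fluent text that large language models (LLMs) generate, they must be able to check its correctness against trusted external sources.
Recent efforts, such as supplying citations through retrieved documents or post-hoc provenance, improve verifiability but offer no guarantee that those citations are correct.
To address these limitations, the authors approach verifiability with a different philosophy: making verification trivial by developing models that quote statements verbatim from trusted sources in their pre-training data.
They propose Quote-Tuning, which demonstrates that aligning models to quote is feasible.
At the core of Quote-Tuning is a fast membership inference function that efficiently verifies text against trusted corpora.
Using this tool, they design a reward function that quantifies how much of a model response is quoted, and they curate datasets for preference learning.
In experiments, Quote-Tuning increases verbatim quotes from high-quality documents by up to 130% relative to base models while maintaining response quality.
Quote-Tuning applies to different tasks, generalizes to out-of-domain data and diverse model families, and provides additional benefits to truthfulness.
The method not only serves as a hassle-free way to increase quoting but also opens up avenues for improving LLM trustworthiness through better verifiability.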

Summary (original)

To trust the fluent generations of large language models (LLMs), humans must be able to verify their correctness against trusted, external sources. Recent efforts, such as providing citations via retrieved documents or post-hoc provenance, enhance verifiability but provide no guarantees on their correctness. To address these limitations, we tackle the verifiability goal with a different philosophy: trivializing the verification process by developing models that quote verbatim statements from trusted sources in their pre-training data. We propose Quote-Tuning, which demonstrates the feasibility of aligning models to quote. The core of Quote-Tuning is a fast membership inference function that efficiently verifies text against trusted corpora. We leverage this tool to design a reward function to quantify quotes in model responses, and curate datasets for preference learning. Experiments show that Quote-Tuning significantly increases verbatim quotes from high-quality documents by up to 130% relative to base models while maintaining response quality. Quote-Tuning is applicable in different tasks, generalizes to out-of-domain data and diverse model families, and provides additional benefits to truthfulness. Our method not only serves as a hassle-free method to increase quoting but also opens up avenues for improving LLM trustworthiness through better verifiability.
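The pipeline the abstract describes (a membership check against a trusted corpus, a reward that quantifies quoting, and preference pairs built from that reward) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: the names `CorpusIndex`, `quote_score`, and `build_preference_pair` are hypothetical, a plain Python set of word 4-grams stands in for the paper's fast membership inference function, and the n-gram size and the best-versus-worst pairing rule are choices made here for illustration.

```python
from typing import Dict, List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class CorpusIndex:
    """Toy exact-match index over word n-grams of a trusted corpus.

    Assumption: an in-memory set stands in for a fast, scalable
    membership structure; n=4 is an arbitrary illustrative choice.
    """

    def __init__(self, documents: List[str], n: int = 4) -> None:
        self.n = n
        self.grams = set()
        for doc in documents:
            self.grams.update(ngrams(doc.lower().split(), n))

    def contains(self, gram: Tuple[str, ...]) -> bool:
        return gram in self.grams


def quote_score(response: str, index: CorpusIndex) -> float:
    """Fraction of the response's n-grams found verbatim in the corpus."""
    grams = ngrams(response.lower().split(), index.n)
    if not grams:
        return 0.0  # response shorter than n tokens: nothing to verify
    hits = sum(index.contains(g) for g in grams)
    return hits / len(grams)


def build_preference_pair(
    prompt: str, candidates: List[str], index: CorpusIndex
) -> Dict[str, str]:
    """Pair the most-quoting candidate (chosen) with the least-quoting one."""
    ranked = sorted(candidates, key=lambda r: quote_score(r, index))
    return {"prompt": prompt, "chosen": ranked[-1], "rejected": ranked[0]}
```

In use, several responses would be sampled from the base model for each prompt, scored with `quote_score`, and the resulting pairs passed to a preference-learning method. A real system would also need to guard response quality, for example by filtering on length or fluency, which this sketch leaves out.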

arXiv information

Authors	Jingyu Zhang, Marc Marone, Tianjian Li, Benjamin Van Durme, Daniel Khashabi
Publication date	2024-11-14 18:27:39+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
