Improving Context-Aware Preference Modeling for Language Models

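Summary

Finetuning language models from pairwise preferences has proven highly effective, but the underspecified nature of natural language poses serious challenges.
Direct preference feedback is uninterpretable, hard to provide when multidimensional criteria may apply, and often inconsistent, either because it rests on incomplete instructions or because it comes from diverse principals.
To address these challenges, we consider a two-step preference modeling procedure that first resolves the under-specification by selecting a context and then evaluates preference with respect to the chosen context.
We decompose reward modeling error according to these two steps, which suggests that supervising context in addition to context-specific preference may be a viable approach to aligning models with diverse human preferences.
For this to work, the ability of models to evaluate context-specific preference is critical.
To this end, we contribute context-conditioned preference datasets and accompanying experiments that investigate the ability of language models to evaluate context-specific preference.
We use our datasets to (1) show that existing preference models benefit from, but fail to fully consider, added context, (2) finetune a context-aware reward model whose context-specific performance exceeds that of GPT-4 and Llama 3 70B on tested datasets, and (3) investigate the value of context-aware preference modeling.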

Abstract (original)

While finetuning language models from pairwise preferences has proven remarkably effective, the underspecified nature of natural language presents critical challenges. Direct preference feedback is uninterpretable, difficult to provide where multidimensional criteria may apply, and often inconsistent, either because it is based on incomplete instructions or provided by diverse principals. To address these challenges, we consider the two-step preference modeling procedure that first resolves the under-specification by selecting a context, and then evaluates preference with respect to the chosen context. We decompose reward modeling error according to these two steps, which suggests that supervising context in addition to context-specific preference may be a viable approach to aligning models with diverse human preferences. For this to work, the ability of models to evaluate context-specific preference is critical. To this end, we contribute context-conditioned preference datasets and accompanying experiments that investigate the ability of language models to evaluate context-specific preference. We use our datasets to (1) show that existing preference models benefit from, but fail to fully consider, added context, (2) finetune a context-aware reward model with context-specific performance exceeding that of GPT-4 and Llama 3 70B on tested datasets, and (3) investigate the value of context-aware preference modeling.
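The two-step procedure described in the abstract can be illustrated with a toy sketch. This is not the authors' method or code: the names `Comparison`, `select_context`, `context_reward`, and `two_step_preference` are hypothetical, the two contexts ("concise" and "detailed") are invented for illustration, and a word-count heuristic stands in for a learned context-aware reward model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    """A pairwise comparison: one prompt and two candidate responses."""
    prompt: str
    response_a: str
    response_b: str


def select_context(prompt: str) -> str:
    """Step 1 (toy assumption): resolve under-specification by picking a context."""
    return "concise" if "brief" in prompt.lower() else "detailed"


def context_reward(context: str, response: str) -> float:
    """Step 2 helper (toy assumption): score a response under the chosen context."""
    n_words = len(response.split())
    # Under "concise" shorter is better; under "detailed" longer is better.
    return -float(n_words) if context == "concise" else float(n_words)


def two_step_preference(comparison: Comparison) -> str:
    """Return "a" or "b": select a context, then compare context-specific rewards."""
    context = select_context(comparison.prompt)
    score_a = context_reward(context, comparison.response_a)
    score_b = context_reward(context, comparison.response_b)
    return "a" if score_a >= score_b else "b"


short = "Finetuning guided by human feedback."
long = ("RLHF finetunes a language model using a reward model "
        "trained on human preference comparisons.")

brief_case = Comparison("Give a brief answer: what is RLHF?", short, long)
deep_case = Comparison("Explain in depth: what is RLHF?", short, long)
```

With the same pair of responses, `two_step_preference(brief_case)` picks the short response and `two_step_preference(deep_case)` picks the long one. The preference between two fixed responses changes once the context is made explicit, which is why the abstract treats evaluating context-specific preference as a separate step.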

arXiv information

Authors	Silviu Pitis, Ziang Xiao, Nicolas Le Roux, Alessandro Sordoni
Published	2024-11-06 16:11:18+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
