Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment

Abstract

Large Language Models (LLMs) are often aligned using contrastive alignment objectives and preference pair datasets. The interaction between model, paired data, and objective makes alignment a complicated procedure, sometimes producing subpar results. We study this and find that (i) preference data gives a better learning signal when the underlying responses are contrastive, and (ii) alignment objectives lead to better performance when they specify more control over the model during training. Based on these insights, we introduce Contrastive Learning from AI Revisions (CLAIR), a data-creation method which leads to more contrastive preference pairs, and Anchored Preference Optimization (APO), a controllable and more stable alignment objective. We align Llama-3-8B-Instruct using various comparable datasets and alignment objectives and measure MixEval-Hard scores, which correlate highly with human judgments. The CLAIR preferences lead to the strongest performance out of all datasets, and APO consistently outperforms less controllable objectives. Our best model, trained on 32K CLAIR preferences with APO, improves Llama-3-8B-Instruct by 7.65%, closing the gap with GPT4-turbo by 45%. Our code is available at https://github.com/ContextualAI/CLAIR_and_APO.
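The abstract's point that an objective should "specify more control over the model" can be illustrated with a small numeric sketch. The abstract does not state any formulas, so everything below is an assumption: `dpo_loss` is the widely used DPO-style margin loss, and `apo_zero_style_loss` is an anchored loss in the spirit of APO that scores the winning and losing rewards separately. The function names, the `beta` default, and the exact loss forms are illustrative and are not taken from the authors' code.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def implicit_reward(logp_policy: float, logp_ref: float, beta: float = 0.1) -> float:
    """Implicit reward: scaled log-ratio of policy to reference likelihood.

    beta=0.1 is an illustrative default, not a value from the abstract.
    """
    return beta * (logp_policy - logp_ref)


def dpo_loss(r_w: float, r_l: float) -> float:
    """Margin-only contrastive loss: depends only on r_w - r_l."""
    return -math.log(sigmoid(r_w - r_l))


def apo_zero_style_loss(r_w: float, r_l: float) -> float:
    """Anchored loss (assumed form): push the winner's reward up and the
    loser's reward down, each measured against the reference model."""
    return (1.0 - sigmoid(r_w)) + sigmoid(r_l)


# Two hypothetical reward pairs with the same margin (r_w - r_l = 2):
#   case_up:   the winner became more likely than under the reference model
#   case_down: both responses became less likely, the loser more so
case_up = (1.0, -1.0)
case_down = (-1.0, -3.0)

dpo_up, dpo_down = dpo_loss(*case_up), dpo_loss(*case_down)
apo_up, apo_down = apo_zero_style_loss(*case_up), apo_zero_style_loss(*case_down)

print(f"margin-only loss: up={dpo_up:.4f} down={dpo_down:.4f}")
print(f"anchored loss:    up={apo_up:.4f} down={apo_down:.4f}")
```

Under these assumptions the margin-only loss cannot tell the two cases apart, while the anchored loss prefers the case in which the winning response actually gained likelihood. That difference is one way to read the "underspecification" named in the title.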

arXiv information

Authors	Karel D’Oosterlinck, Winnie Xu, Chris Develder, Thomas Demeester, Amanpreet Singh, Christopher Potts, Douwe Kiela, Shikib Mehri
Published	2024-08-12 16:24:51+00:00
