AMPO: Active Multi-Preference Optimization

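Summary

Multi-preference optimization extends language-model alignment beyond pairwise preferences by contrasting entire sets of helpful and undesired responses, which gives large language models a richer training signal.
During self-play alignment, these models often generate many candidate answers per query, so including every response in the training objective is computationally infeasible.
This work proposes $\textit{Active Multi-Preference Optimization}$ (AMPO), an approach that combines on-policy generation, a multi-preference group-contrastive loss, and active subset selection.
Specifically, a large pool of candidate responses is scored and embedded, and a small but informative subset covering the reward extremes and distinct semantic clusters is then selected for preference optimization.
The contrastive training scheme can identify not only the best and worst answers but also subtle, underexplored modes that are crucial for robust alignment.
Theoretically, the authors provide guarantees for expected reward maximization under the active selection method; empirically, AMPO achieves state-of-the-art results on $\textit{AlpacaEval}$ using Llama 8B.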
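
To make the phrase "group-contrastive loss" concrete, here is one generic set-level contrastive objective. It is an illustrative assumption, not the loss defined in the paper: the symbols $Y^{+}$ (selected helpful responses), $Y^{-}$ (selected undesired responses), $\pi_\theta$ (policy), $\pi_{\mathrm{ref}}$ (reference model), and $\beta$ (temperature) are all introduced here for the sketch.

$$
\mathcal{L}(\theta) = -\log \frac{\sum_{y \in Y^{+}} \exp\big(s_\theta(x, y)\big)}{\sum_{y \in Y^{+} \cup Y^{-}} \exp\big(s_\theta(x, y)\big)}, \qquad s_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
$$

With a single response in each set, this form reduces to $-\log \sigma\big(s_\theta(x, y^{+}) - s_\theta(x, y^{-})\big)$, the familiar pairwise case, which is why it is a natural way to picture contrasting whole groups instead of pairs.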

Abstract (original)

Multi-preference optimization enriches language-model alignment beyond pairwise preferences by contrasting entire sets of helpful and undesired responses, thereby enabling richer training signals for large language models. During self-play alignment, these models often produce numerous candidate answers per query, rendering it computationally infeasible to include all responses in the training objective. In this work, we propose $\textit{Active Multi-Preference Optimization}$ (AMPO), a novel approach that combines on-policy generation, a multi-preference group-contrastive loss, and active subset selection. Specifically, we score and embed large candidate pools of responses and then select a small, yet informative, subset that covers reward extremes and distinct semantic clusters for preference optimization. Our contrastive training scheme is capable of identifying not only the best and worst answers but also subtle, underexplored modes that are crucial for robust alignment. Theoretically, we provide guarantees for expected reward maximization using our active selection method, and empirically, AMPO achieves state-of-the-art results on $\textit{AlpacaEval}$ using Llama 8B.
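
The selection step described in the abstract (score and embed a candidate pool, then keep a small subset that covers the reward extremes and distinct semantic regions) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' algorithm: the function name `select_active_subset` is hypothetical, rewards and embeddings are assumed to be precomputed, and greedy farthest-point selection in embedding space is used here as a simple stand-in for semantic clustering.

```python
import math


def select_active_subset(rewards, embeddings, k):
    """Return k candidate indices: reward extremes first, then diverse fill-ins.

    Hypothetical helper for illustration only. `rewards` is a list of floats and
    `embeddings` a list of equal-length float vectors, one per candidate.
    """
    n = len(rewards)
    if len(embeddings) != n:
        raise ValueError("rewards and embeddings must describe the same candidates")
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= number of candidates")

    # Step 1: cover the reward extremes (highest- and lowest-scored responses).
    best = max(range(n), key=lambda i: rewards[i])
    worst = min(range(n), key=lambda i: rewards[i])
    selected = [best] if best == worst else [best, worst]

    # min_dist[i] is the distance from candidate i to its nearest selected candidate.
    min_dist = [
        min(math.dist(embeddings[i], embeddings[j]) for j in selected)
        for i in range(n)
    ]

    # Step 2 (assumed stand-in for clustering): repeatedly add the candidate that
    # lies farthest from everything already selected, so new semantic regions
    # are covered before near-duplicates.
    while len(selected) < k:
        remaining = [i for i in range(n) if i not in selected]
        nxt = max(remaining, key=lambda i: min_dist[i])
        selected.append(nxt)
        for i in range(n):
            min_dist[i] = min(min_dist[i], math.dist(embeddings[i], embeddings[nxt]))
    return selected


# Toy pool of five candidates with made-up rewards and 2-D embeddings.
rewards = [0.9, 0.1, 0.5, 0.4, 0.6]
embeddings = [[0.0, 0.0], [1.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.9, 0.1]]
subset = select_active_subset(rewards, embeddings, 3)
print(subset)  # [0, 1, 3]
```

In the toy pool, candidates 0 and 1 are the best and worst by reward, and candidate 3 is picked next because it sits far from both in embedding space, while candidates 2 and 4 are near-duplicates of ones already chosen. A real pipeline would obtain rewards from a reward model and embeddings from a sentence encoder, both of which are outside the scope of this sketch.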

arXiv information

Authors	Taneesh Gupta, Rahul Madhavan, Xuchao Zhang, Chetan Bansal, Saravan Rajmohan
Published	2025-02-25 15:29:51+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
