Plug-and-Play Training Framework for Preference Optimization

Summary

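Recently, preference optimization methods such as DPO have significantly improved large language models (LLMs) across a wide range of tasks, including dialogue and question answering.
However, current methods do not account for the varying difficulty of training samples during preference optimization, which leads to mediocre performance on tasks that demand high accuracy, particularly mathematical reasoning.
To address this limitation, we propose a new training framework that uses multiple sampling to analyze output distributions, assigns different weights to samples, and incorporates these weights into the preference optimization process.
This plug-and-play approach lets LLMs prioritize difficult samples during training, improving learning efficiency.
Experimental results show that our framework integrates seamlessly with various preference optimization methods and achieves consistent improvements on mathematical reasoning tasks.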

Summary (original)

Recently, preference optimization methods such as DPO have significantly enhanced large language models (LLMs) in wide tasks including dialogue and question-answering. However, current methods fail to account for the varying difficulty levels of training samples during preference optimization, leading to mediocre performance in tasks with high accuracy requirements, particularly in mathematical reasoning. To address this limitation, we propose a novel training framework, which employs multiple sampling to analyze output distributions, assign different weights to samples, and incorporate these weights into the preference optimization process. This plug-and-play approach enables LLMs to prioritize challenging examples during training, improving learning efficiency. Experimental results demonstrate that our framework integrates seamlessly with various preference optimization methods and achieves consistent improvements in mathematical reasoning tasks.
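The abstract describes the mechanism only at a high level: sample several outputs per prompt, derive a per-sample weight from that output distribution, and fold the weight into a preference-optimization loss. The sketch below illustrates one way this could look with DPO. It is not the authors' method. The weighting rule (weight = 1 minus the empirical accuracy over k samples, clipped at a floor), the normalized weighted average, and the names `difficulty_weight`, `dpo_loss`, `weighted_preference_loss` and `floor` are all hypothetical choices made for illustration. Only the per-pair loss is the standard DPO form, -log σ(β[(log π(y_w|x) - log π_ref(y_w|x)) - (log π(y_l|x) - log π_ref(y_l|x))]).

```python
import math
from typing import Sequence, Tuple

# (policy_chosen, policy_rejected, ref_chosen, ref_rejected) sequence log-probs
Pair = Tuple[float, float, float, float]


def difficulty_weight(correct_flags: Sequence[bool], floor: float = 0.1) -> float:
    """HYPOTHETICAL rule: prompts the model usually gets wrong weigh more.

    correct_flags holds one entry per sampled output (multiple sampling);
    the weight is 1 - empirical accuracy, clipped below at `floor`.
    """
    if not correct_flags:
        raise ValueError("need at least one sampled output")
    accuracy = sum(correct_flags) / len(correct_flags)
    return max(floor, 1.0 - accuracy)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard per-pair DPO loss: -log sigmoid(beta * margin)."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    return softplus(-margin)  # -log sigmoid(m) == log(1 + exp(-m))


def weighted_preference_loss(pairs: Sequence[Pair], weights: Sequence[float],
                             beta: float = 0.1) -> float:
    """ASSUMED aggregation: weight-normalized average of per-pair DPO losses."""
    if not pairs or len(pairs) != len(weights):
        raise ValueError("pairs and weights must be non-empty and equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    losses = [dpo_loss(*pair, beta=beta) for pair in pairs]
    return sum(w * l for w, l in zip(weights, losses)) / total_weight


if __name__ == "__main__":
    # Correctness of 4 sampled answers for each of two prompts (made-up flags).
    flags = [[True, True, True, False], [False, False, True, False]]
    weights = [difficulty_weight(f) for f in flags]  # [0.25, 0.75]
    # Made-up sequence log-probabilities for one preference pair per prompt.
    pairs = [(-5.0, -9.0, -6.0, -8.0), (-12.0, -10.0, -11.0, -10.5)]
    print(weights, weighted_preference_loss(pairs, weights))
```

Because the weighting sits outside `dpo_loss`, the same wrapper could in principle multiply any per-pair preference loss, which matches the plug-and-play framing in the abstract. How the paper actually computes and applies its weights should be checked against the full text.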

arXiv information

Authors	Jingyuan Ma, Rui Li, Zheng Li, Lei Sha, Zhifang Sui
Published	2024-12-30 15:01:48+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google