SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters

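Summary

Existing preference optimization objectives for language model alignment require additional hyperparameters that must be tuned extensively to reach optimal performance, which increases both the complexity and the time needed to fine-tune large language models. This paper proposes a simple yet effective hyperparameter-free preference optimization algorithm for alignment. The authors observe that promising performance can be achieved simply by optimizing inverse perplexity, which is calculated as the inverse of the exponentiated average log-likelihood of the chosen and rejected responses in the preference dataset. The resulting learning objective, SimPER, is easy to implement and removes the need for expensive hyperparameter tuning and for a reference model, making it both computationally and memory efficient. Extensive experiments on widely used real-world benchmarks, including MT-Bench, AlpacaEval 2, and 10 key benchmarks of the Open LLM Leaderboard with 5 base models, show that SimPER consistently and significantly outperforms existing approaches, even without any hyperparameters or a reference model. For example, despite its simplicity, SimPER outperforms state-of-the-art methods by up to 5.7 points on AlpacaEval 2 and achieves the highest average ranking across the 10 Open LLM Leaderboard benchmarks. The source code for SimPER is available at https://github.com/tengxiao1/SimPER.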

Summary (original)

Existing preference optimization objectives for language model alignment require additional hyperparameters that must be extensively tuned to achieve optimal performance, increasing both the complexity and time required for fine-tuning large language models. In this paper, we propose a simple yet effective hyperparameter-free preference optimization algorithm for alignment. We observe that promising performance can be achieved simply by optimizing inverse perplexity, which is calculated as the inverse of the exponentiated average log-likelihood of the chosen and rejected responses in the preference dataset. The resulting simple learning objective, SimPER, is easy to implement and eliminates the need for expensive hyperparameter tuning and a reference model, making it both computationally and memory efficient. Extensive experiments on widely used real-world benchmarks, including MT-Bench, AlpacaEval 2, and 10 key benchmarks of the Open LLM Leaderboard with 5 base models, demonstrate that SimPER consistently and significantly outperforms existing approaches, even without any hyperparameters or a reference model. For example, despite its simplicity, SimPER outperforms state-of-the-art methods by up to 5.7 points on AlpacaEval 2 and achieves the highest average ranking across 10 benchmarks on the Open LLM Leaderboard. The source code for SimPER is publicly available at: https://github.com/tengxiao1/SimPER.
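To make the idea of optimizing inverse perplexity concrete, here is a minimal sketch in plain Python. It uses the standard definition of perplexity, exp of the negative average per-token log-likelihood, so inverse perplexity is exp of the average per-token log-likelihood. The abstract does not spell out how the chosen and rejected terms are combined; the sign convention below (raise the chosen response's inverse perplexity, lower the rejected one's) and the names `inverse_perplexity` and `simper_style_loss` are assumptions for illustration, not the authors' implementation, which is in the linked repository.

```python
import math
from typing import Sequence


def inverse_perplexity(token_logprobs: Sequence[float]) -> float:
    """Return exp(mean per-token log-probability), i.e. 1 / perplexity.

    For log-probabilities (all <= 0) the result lies in (0, 1].
    """
    if not token_logprobs:
        raise ValueError("a response must contain at least one token")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def simper_style_loss(
    chosen_logprobs: Sequence[float],
    rejected_logprobs: Sequence[float],
) -> float:
    """Assumed combination: minimize rejected minus chosen inverse perplexity.

    No temperature, margin, or reference-model term appears, which is the
    sense in which such an objective is hyperparameter-free.
    """
    return inverse_perplexity(rejected_logprobs) - inverse_perplexity(chosen_logprobs)


# Hypothetical per-token log-probabilities from the policy model.
chosen = [-0.1, -0.2, -0.3]
rejected = [-1.0, -2.0]
loss = simper_style_loss(chosen, rejected)
```

Because each inverse perplexity is bounded in (0, 1], this loss stays within (-1, 1), and the length normalization keeps long and short responses on the same scale. In a real training loop the per-token log-probabilities would come from the model being fine-tuned and the loss would be averaged over a batch of preference pairs.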

arXiv information

Authors	Teng Xiao,Yige Yuan,Zhengyu Chen,Mingxiao Li,Shangsong Liang,Zhaochun Ren,Vasant G Honavar
Published	2025-02-04 16:02:53+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
