Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization

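Summary

This study addresses the challenge of noise in the training datasets used for Direct Preference Optimization (DPO), a method for aligning large language models (LLMs) with human preferences.
We classify noise into pointwise noise, meaning low-quality data points, and pairwise noise, meaning erroneous data pairings that corrupt preference rankings.
Using Distributionally Robust Optimization (DRO), we strengthen DPO's resilience to both kinds of noise.
Our theoretical analysis shows that DPO inherently embeds DRO principles, which gives it robustness to pointwise noise, with the regularization coefficient $\beta$ playing a critical role in that noise resistance.
Extending this framework, we introduce Distributionally Robustifying DPO (Dr. DPO), which adds pairwise robustness by optimizing against worst-case pairwise scenarios.
A new hyperparameter $\beta'$ in Dr. DPO gives fine-grained control over how much each data pair is trusted, providing a strategic balance between exploration and exploitation in noisy training environments.
Empirical evaluations show that Dr. DPO substantially improves the quality of generated text and the response accuracy on preference datasets, with gains in both noisy and noise-free settings.
The code is available at https://github.com/junkangwu/Dr_DPO.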

Summary (original)

This study addresses the challenge of noise in training datasets for Direct Preference Optimization (DPO), a method for aligning Large Language Models (LLMs) with human preferences. We categorize noise into pointwise noise, which includes low-quality data points, and pairwise noise, which encompasses erroneous data pair associations that affect preference rankings. Utilizing Distributionally Robust Optimization (DRO), we enhance DPO’s resilience to these types of noise. Our theoretical insights reveal that DPO inherently embeds DRO principles, conferring robustness to pointwise noise, with the regularization coefficient $\beta$ playing a critical role in its noise resistance. Extending this framework, we introduce Distributionally Robustifying DPO (Dr. DPO), which integrates pairwise robustness by optimizing against worst-case pairwise scenarios. The novel hyperparameter $\beta'$ in Dr. DPO allows for fine-tuned control over data pair reliability, providing a strategic balance between exploration and exploitation in noisy training environments. Empirical evaluations demonstrate that Dr. DPO substantially improves the quality of generated text and response accuracy in preference datasets, showcasing enhanced performance in both noisy and noise-free settings. The code is available at https://github.com/junkangwu/Dr_DPO.
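
The abstract does not state the Dr. DPO objective, so the following is only a minimal sketch under stated assumptions. It assumes the standard per-pair DPO loss, $-\log\sigma(\beta \cdot (\text{policy margin} - \text{reference margin}))$, and it assumes that the pairwise-robust aggregation takes a log-mean-exp form controlled by $\beta'$. The function names `dpo_pair_losses` and `dr_dpo_loss`, the default values, and the aggregation formula itself are illustrative assumptions; the linked repository is the place to check the authors' actual implementation.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_pair_losses(policy_margins, ref_margins, beta=0.1):
    """Per-pair DPO loss.

    Each margin is log p(chosen | x) - log p(rejected | x), under the
    policy and the reference model respectively.
    """
    return [
        -log_sigmoid(beta * (p - r))
        for p, r in zip(policy_margins, ref_margins)
    ]


def dr_dpo_loss(pair_losses, beta_prime=1.0):
    """ASSUMED aggregation: -beta' * log(mean(exp(-loss_i / beta'))).

    This form is an assumption, not taken from the abstract.
    The max is subtracted before exponentiating for numerical stability.
    """
    scaled = [-loss / beta_prime for loss in pair_losses]
    m = max(scaled)
    log_mean_exp = m + math.log(
        sum(math.exp(s - m) for s in scaled) / len(scaled)
    )
    return -beta_prime * log_mean_exp


# Hypothetical batch: two well-fit pairs and one high-loss pair
# that might carry a flipped preference label.
losses = [0.2, 0.3, 3.0]
print(dr_dpo_loss(losses, beta_prime=1.0))
print(dr_dpo_loss(losses, beta_prime=1e4))
```

Under this assumed form, a very large `beta_prime` recovers the plain batch mean, while a smaller `beta_prime` pulls the aggregate toward the low-loss pairs, so a pair with an unusually high loss contributes less. That is one way to read the abstract's description of $\beta'$ as a control over data pair reliability.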

arXiv information

Authors	Junkang Wu, Yuexiang Xie, Zhengyi Yang, Jiancan Wu, Jiawei Chen, Jinyang Gao, Bolin Ding, Xiang Wang, Xiangnan He
Published	2024-07-10 17:48:25+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
