Preference-Oriented Supervised Fine-Tuning: Favoring Target Model Over Aligned Large Language Models

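Summary

Alignment, which endows a pre-trained large language model (LLM) with the ability to follow instructions, is crucial for real-world applications.
Conventional supervised fine-tuning (SFT) methods typically formalize alignment as causal language modeling with a cross-entropy objective, which requires a large number of high-quality instruction-response pairs.
However, the quality of widely used SFT datasets cannot be guaranteed, because creating and maintaining them in practice is costly and labor-intensive.
To overcome the limitations associated with the quality of SFT datasets, the authors introduce a new preference-oriented supervised fine-tuning approach, named PoFT.
The intuition is to boost SFT by imposing a particular preference: favoring the target model over aligned LLMs on the same SFT data. This preference encourages the target model to predict a higher likelihood than the one predicted by the aligned LLMs, which incorporates assessment information on data quality (i.e., the likelihood predicted by the aligned LLMs) into the training process.
Extensive experiments were conducted, and the results validate the effectiveness of the proposed method.
PoFT achieves stable and consistent improvements over the SFT baselines across different training datasets and base models.
Moreover, the authors show that PoFT can be integrated with existing SFT data filtering methods to achieve better performance, and that it can be improved further by following it with a preference optimization procedure such as DPO.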

Abstract (original)

Alignment, endowing a pre-trained large language model (LLM) with the ability to follow instructions, is crucial for its real-world applications. Conventional supervised fine-tuning (SFT) methods formalize it as causal language modeling, typically with a cross-entropy objective, requiring a large amount of high-quality instruction-response pairs. However, the quality of widely used SFT datasets cannot be guaranteed due to the high cost and intensive labor of their creation and maintenance in practice. To overcome the limitations associated with the quality of SFT datasets, we introduce a novel preference-oriented supervised fine-tuning approach, namely PoFT. The intuition is to boost SFT by imposing a particular preference: favoring the target model over aligned LLMs on the same SFT data. This preference encourages the target model to predict a higher likelihood than that predicted by the aligned LLMs, incorporating assessment information on data quality (i.e., predicted likelihood by the aligned LLMs) into the training process. Extensive experiments are conducted, and the results validate the effectiveness of the proposed method. PoFT achieves stable and consistent improvements over the SFT baselines across different training datasets and base models. Moreover, we prove that PoFT can be integrated with existing SFT data filtering methods to achieve better performance, and further improved by following preference optimization procedures, such as DPO.
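
The abstract states the preference only in words, so the following Python snippet is a minimal sketch under assumptions rather than the paper's objective. It assumes a Bradley-Terry-style loss, -log sigmoid(margin), where the margin is the target model's sequence log-likelihood minus the mean log-likelihood assigned by the aligned LLMs to the same instruction-response pair. The function names, the averaging over aligned LLMs, and the example numbers are all hypothetical.

```python
import math
from typing import Sequence


def sft_loss(target_logp: float) -> float:
    """Conventional SFT: negative log-likelihood of the response."""
    return -target_logp


def preference_loss(target_logp: float, aligned_logps: Sequence[float]) -> float:
    """Assumed preference-style loss: -log(sigmoid(margin)).

    margin = target log-likelihood minus the mean log-likelihood that the
    aligned LLMs assign to the same sample (an assumption, not taken from
    the abstract).
    """
    reference = sum(aligned_logps) / len(aligned_logps)
    margin = target_logp - reference
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-likelihoods for one instruction-response pair.
target = -12.0
aligned = [-10.0, -11.0]
print(sft_loss(target))
print(preference_loss(target, aligned))
```

Under this assumed form, a sample that the aligned LLMs already consider unlikely (a possible sign of low quality) yields a smaller loss at the same target likelihood, which is one way the data-quality signal could enter training.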

arXiv information

Authors	Yuchen Fan, Yuzhong Hong, Qiushi Wang, Junwei Bao, Hongfei Jiang, Yang Song
Published	2024-12-17 12:49:14+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
