Summary
[Title] RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
[Summary]
– Generative foundation models are susceptible to implicit biases arising from their extensive unsupervised training data.
– Such biases can produce suboptimal samples, skewed outcomes, and unfairness, with potentially significant repercussions.
– RAFT is a framework for aligning generative models: using a reward model and a sufficient number of samples, it selects high-quality samples, discards those exhibiting undesired behavior, and assembles a streaming dataset for fine-tuning (see the sketch after this list).
– RAFT's sample generation process is gradient-free, making it compatible with black-box generators.
– Extensive experiments demonstrate that the RAFT algorithm performs strongly on both large language models and diffusion models.
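As a rough illustration of the loop the bullets describe, here is a minimal Python sketch. The functions `generate`, `reward`, and `finetune` are hypothetical stand-ins, not the paper's implementation; only the sample, rank, filter, and fine-tune structure follows the summary above.

```python
# Minimal sketch of one RAFT-style iteration (toy stand-ins, not the
# paper's code): sample k candidates per prompt, rank them with a reward
# model, keep the best, and fine-tune on the filtered batch.
import random

def generate(prompt: str, k: int) -> list[str]:
    # Hypothetical stand-in for a (possibly black-box) generator.
    # No generator gradients are needed here, which is why the sampling
    # stage of RAFT is gradient-free.
    return [f"{prompt} -> candidate {i}" for i in range(k)]

def reward(sample: str) -> float:
    # Hypothetical stand-in for a learned human-preference reward model.
    return random.random()

def finetune(model_state: dict, batch: list[str]) -> dict:
    # Hypothetical stand-in for a standard supervised fine-tuning step
    # on the selected high-reward samples.
    return {**model_state, "steps": model_state.get("steps", 0) + 1}

def raft_iteration(model_state: dict, prompts: list[str], k: int = 8) -> dict:
    selected = []
    for p in prompts:
        candidates = generate(p, k)         # 1) sample k candidates per prompt
        best = max(candidates, key=reward)  # 2) rank by reward, keep the best
        selected.append(best)               # 3) grow the streaming dataset
    return finetune(model_state, selected)  # 4) fine-tune on high-reward data

if __name__ == "__main__":
    state = {}
    for _ in range(3):                      # online setting: repeat the loop
        state = raft_iteration(state, ["Explain RAFT.", "Summarize alignment."])
    print(state)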
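```

In the online setting this loop repeats with fresh samples from the updated model; in the offline setting a single pass over a fixed sample pool would be used instead.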
Summary (original)
Generative foundation models are susceptible to implicit biases that can arise from extensive unsupervised training data. Such biases can produce suboptimal samples, skewed outcomes, and unfairness, with potentially significant repercussions. Consequently, aligning these models with human ethics and preferences is an essential step toward ensuring their responsible and effective deployment in real-world applications. Prior research has primarily employed Reinforcement Learning from Human Feedback (RLHF) as a means of addressing this problem, wherein generative models are fine-tuned using RL algorithms guided by a human-feedback-informed reward model. However, the inefficiencies and instabilities associated with RL algorithms frequently present substantial obstacles to the successful alignment of generative models, necessitating the development of a more robust and streamlined approach. To this end, we introduce a new framework, Reward rAnked FineTuning (RAFT), designed to align generative models more effectively. Utilizing a reward model and a sufficient number of samples, our approach selects the high-quality samples, discarding those that exhibit undesired behavior, and subsequently assembles a streaming dataset. This dataset serves as the basis for aligning the generative model and can be employed under both offline and online settings. Notably, the sample generation process within RAFT is gradient-free, rendering it compatible with black-box generators. Through extensive experiments, we demonstrate that our proposed algorithm exhibits strong performance in the context of both large language models and diffusion models.
arXiv information
Authors | Hanze Dong, Wei Xiong, Deepanshu Goyal, Rui Pan, Shizhe Diao, Jipeng Zhang, Kashun Shum, Tong Zhang |
Published | 2023-04-13 18:22:40+00:00 |
arXiv page | arxiv_id(pdf) |
Source, services used
arxiv.jp, OpenAI