Weak-to-Strong Extrapolation Expedites Alignment

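Summary

Although the capabilities of large language models (LLMs) ideally scale with more data and compute, in practice they are inevitably constrained by limited resources.
Given a moderately trained LLM (for example, one trained to align with human preference), can its potential be exploited further to obtain a stronger model cheaply?
This paper proposes a simple method called ExPO for boosting the alignment of LLMs with human preference.
ExPO assumes that a medium-aligned model can be interpolated between a less-aligned (weaker) model, such as the initial SFT model, and a better-aligned (stronger) one.
The stronger model is then obtained directly by extrapolating from the weights of the former two, relatively weaker, models.
On the AlpacaEval 2.0 benchmark, the authors show that ExPO pushes models trained with less preference data (e.g., 10% or 20%) to reach and even surpass the fully trained model, without any additional training.
Furthermore, ExPO also significantly improves off-the-shelf DPO/RLHF models and exhibits decent scalability across model sizes from 7B to 70B.
The work demonstrates the efficacy of model extrapolation in exploiting LLMs' capabilities, suggesting a promising direction that deserves future exploration.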
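
The abstract does not spell out the update rule, so the following is only a minimal sketch of what weight extrapolation could look like under the stated interpolation assumption. If the medium-aligned weights satisfy medium = (1 - λ) · weak + λ · strong for some 0 < λ < 1, then solving for the stronger model gives strong = medium + α · (medium - weak) with α = 1/λ - 1. The function name `extrapolate`, the coefficient name `alpha`, and the toy parameter dictionaries are all hypothetical; a real model would hold tensors rather than Python lists, and how α is chosen is not covered here.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def extrapolate(weak: Weights, medium: Weights, alpha: float) -> Weights:
    """Return medium + alpha * (medium - weak), parameter by parameter.

    `alpha` is a hypothetical extrapolation coefficient; alpha = 0 returns
    the medium model unchanged.
    """
    if weak.keys() != medium.keys():
        raise ValueError("models must share the same parameter names")
    result: Weights = {}
    for name, weak_vals in weak.items():
        medium_vals = medium[name]
        if len(weak_vals) != len(medium_vals):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Step further along the direction from the weak to the medium model.
        result[name] = [m + alpha * (m - w) for w, m in zip(weak_vals, medium_vals)]
    return result


# Made-up numbers purely for illustration (not taken from the paper).
sft = {"layer.weight": [0.0, 1.0], "layer.bias": [0.5]}
aligned = {"layer.weight": [0.25, 1.5], "layer.bias": [0.5]}
stronger = extrapolate(sft, aligned, alpha=0.5)
print(stronger)  # {'layer.weight': [0.375, 1.75], 'layer.bias': [0.5]}
```

Because the operation only combines two existing sets of weights, it involves no gradient steps, which is consistent with the abstract's claim that no additional training is needed.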

Abstract (original)

Although the capabilities of large language models (LLMs) ideally scale up with increasing data and compute, they are inevitably constrained by limited resources in reality. Suppose we have a moderately trained LLM (e.g., trained to align with human preference) in hand, can we further exploit its potential and cheaply acquire a stronger model? In this paper, we propose a simple method called ExPO to boost LLMs’ alignment with human preference. ExPO assumes that a medium-aligned model can be interpolated between a less-aligned (weaker) model, e.g., the initial SFT model, and a better-aligned (stronger) one, thereby directly obtaining this stronger model by extrapolating from the weights of the former two relatively weaker models. On the AlpacaEval 2.0 benchmark, we show that ExPO pushes models trained with less preference data (e.g., 10% or 20%) to reach and even surpass the fully-trained one, without any additional training. Furthermore, ExPO also significantly improves off-the-shelf DPO/RLHF models and exhibits decent scalability across model sizes from 7B to 70B. Our work demonstrates the efficacy of model extrapolation in exploiting LLMs’ capabilities, suggesting a promising direction that deserves future exploration.

arXiv information

Authors	Chujie Zheng, Ziqi Wang, Heng Ji, Minlie Huang, Nanyun Peng
Published	2024-04-25 17:39:50+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
