Transferable Post-training via Inverse Value Learning

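Summary

As post-training relies on ever larger datasets and base models keep growing in size, the computational demands and implementation challenges of existing algorithms are rising sharply.
This paper proposes modeling the changes that post-training produces at the logit level with a separate neural network, the value network.
Once this network has been trained from demonstrations on a small base model, it can be attached to other pre-trained models at inference time and gives them similar capability gains.
The authors systematically investigate best practices for this paradigm in terms of pre-training weights and connection schemes.
They show that the resulting value network transfers broadly: across pre-trained models of different parameter sizes within the same family, across models undergoing continuous pre-training within the same family, and across families whose models use different vocabularies.
In certain cases it achieves performance comparable to full-parameter fine-tuning.
They also explore methods that improve the transferability of the value model and prevent it from overfitting to the base model used during training.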
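
The idea of applying a logit-level correction at inference time can be illustrated with a small sketch. The abstract does not state how the value network's output is combined with the base model's logits, so the simple additive rule below is an assumption, as are the function names, the `alpha` scaling factor, and the toy numbers. It only shows what "integrating" a separately trained network at the logit level could look like for a single decoding step.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_logits(base_logits, value_logits, alpha=1.0):
    """Add a logit-level correction to the base model's logits.

    Assumption: the correction is additive. `alpha` is a hypothetical
    scaling knob that the abstract does not mention.
    """
    if len(base_logits) != len(value_logits):
        raise ValueError("both networks must score the same vocabulary")
    return [b + alpha * v for b, v in zip(base_logits, value_logits)]


def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=values.__getitem__)


# Toy 4-token vocabulary; every number here is made up for illustration.
base_logits = [2.0, 1.0, 0.5, -1.0]    # from a frozen pre-trained model
value_logits = [-1.5, 2.0, 0.0, 0.0]   # from the separately trained value network

adjusted = combine_logits(base_logits, value_logits)
probs_before = softmax(base_logits)
probs_after = softmax(adjusted)

print("preferred token before:", argmax(probs_before))
print("preferred token after:", argmax(probs_after))
```

In this sketch the base model's parameters are never touched: only its output distribution is shifted, which is why the same correction network could in principle be paired with a different base model, provided the two score a compatible vocabulary.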

Summary (original)

As post-training processes utilize increasingly large datasets and base models continue to grow in size, the computational demands and implementation challenges of existing algorithms are escalating significantly. In this paper, we propose modeling the changes at the logits level during post-training using a separate neural network (i.e., the value network). After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference, enabling them to achieve similar capability enhancements. We systematically investigate the best practices for this paradigm in terms of pre-training weights and connection schemes. We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes within the same family, models undergoing continuous pre-training within the same family, and models with different vocabularies across families. In certain cases, it can achieve performance comparable to full-parameter fine-tuning. Furthermore, we explore methods to enhance the transferability of the value model and prevent overfitting to the base model used during training.

arXiv information

Authors	Xinyu Lu, Xueru Wen, Yaojie Lu, Bowen Yu, Hongyu Lin, Haiyang Yu, Le Sun, Xianpei Han, Yongbin Li
Published	2024-10-28 13:48:43+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
