LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging

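Summary

Large pre-trained models show strong zero-shot performance across a wide range of tasks, but fine-tuning often causes catastrophic forgetting: gains on the target domain come at the cost of generalization on other tasks.
To address this, the authors introduce LiNeS (Layer-increasing Network Scaling), a post-training editing technique designed to preserve pre-trained generalization while improving performance on the fine-tuned task.
LiNeS scales parameter updates linearly with layer depth, keeping shallow layers close to their pre-trained values to preserve general features while letting deeper layers retain task-specific representations.
The approach is further extended to multi-task model merging, where layer-wise scaling of the merged parameters reduces negative task interference.
LiNeS yields significant improvements in both single-task and multi-task settings across a variety of vision and natural language processing benchmarks.
It mitigates forgetting, improves out-of-distribution generalization, and integrates with existing multi-task model merging baselines, improving their performance across benchmarks and model sizes. It can also boost generalization when merging LLM policies aligned with different rewards via RLHF.
Importantly, the method is simple to implement and complementary to many existing techniques.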

Abstract (original)

Large pre-trained models exhibit impressive zero-shot performance across diverse tasks, but fine-tuning often leads to catastrophic forgetting, where improvements on a target domain degrade generalization on other tasks. To address this challenge, we introduce LiNeS, Layer-increasing Network Scaling, a post-training editing technique designed to preserve pre-trained generalization while enhancing fine-tuned task performance. LiNeS scales parameter updates linearly based on their layer depth within the network, maintaining shallow layers close to their pre-trained values to preserve general features while allowing deeper layers to retain task-specific representations. We further extend this approach to multi-task model merging scenarios, where layer-wise scaling of merged parameters reduces negative task interference. LiNeS demonstrates significant improvements in both single-task and multi-task settings across various benchmarks in vision and natural language processing. It mitigates forgetting, enhances out-of-distribution generalization, integrates seamlessly with existing multi-task model merging baselines improving their performance across benchmarks and model sizes, and can boost generalization when merging LLM policies aligned with different rewards via RLHF. Importantly, our method is simple to implement and complementary to many existing techniques.
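
The depth-dependent scaling described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact schedule, so the linear form `alpha + beta * layer / (num_layers - 1)`, the names `alpha`, `beta`, `layer_scale`, and `scale_updates_by_depth`, and the representation of a model as a dictionary of per-layer parameter lists (indexed 0 to L-1) are all hypothetical choices made for this example.

```python
from typing import Dict, List

# Hypothetical model representation: layer index (0 = shallowest) -> flat parameter list.
Params = Dict[int, List[float]]


def layer_scale(layer: int, num_layers: int, alpha: float, beta: float) -> float:
    """Factor that grows linearly from alpha (first layer) to alpha + beta (last layer).

    The linear schedule is an assumption made for this sketch.
    """
    if num_layers == 1:
        return alpha + beta
    return alpha + beta * layer / (num_layers - 1)


def scale_updates_by_depth(pretrained: Params, finetuned: Params,
                           alpha: float = 0.0, beta: float = 1.0) -> Params:
    """Rescale each layer's update (finetuned - pretrained) by its depth-dependent factor."""
    num_layers = len(pretrained)
    edited: Params = {}
    for layer in sorted(pretrained):
        scale = layer_scale(layer, num_layers, alpha, beta)
        # Shallow layers get a small scale and stay near the pre-trained values;
        # deep layers get a large scale and keep most of the fine-tuning update.
        edited[layer] = [
            p0 + scale * (p1 - p0)
            for p0, p1 in zip(pretrained[layer], finetuned[layer])
        ]
    return edited


if __name__ == "__main__":
    pretrained = {0: [1.0, 1.0], 1: [1.0, 1.0], 2: [1.0, 1.0]}
    finetuned = {0: [2.0, 0.0], 1: [2.0, 0.0], 2: [2.0, 0.0]}
    for layer, values in scale_updates_by_depth(pretrained, finetuned).items():
        print(layer, values)
```

With the default `alpha=0.0` and `beta=1.0`, the first layer is reset to its pre-trained values and the last layer keeps its full update. Under the same assumptions, the merging case could reuse the function by passing the merged multi-task weights in place of `finetuned`.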

arXiv information

Authors	Ke Wang, Nikolaos Dimitriadis, Alessandro Favero, Guillermo Ortiz-Jimenez, Francois Fleuret, Pascal Frossard
Published	2024-10-22 16:26:05+00:00

