ATM: Improving Model Merging by Alternating Tuning and Merging

Summary

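Model merging has recently emerged as a cost-efficient paradigm for multi-task learning.
Among current approaches, task arithmetic stands out for its simplicity and effectiveness.
In this paper, we explain why task vectors are effective by linking them to multi-task gradients.
We show that in a single-epoch scenario, task vectors are mathematically equivalent to the gradients obtained via gradient descent in a multi-task setting, and that they still approximate these gradients in subsequent epochs.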
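The single-epoch equivalence can be checked numerically. The sketch below is a toy illustration, not the authors' code: it assumes each task has a quadratic loss and that one epoch of fine-tuning is a single full-batch gradient-descent step with learning rate `eta` (the names `grad`, `tasks`, and `eta` are hypothetical). Under those assumptions, adding the task vectors to the pretrained weights `theta0` with a scaling coefficient of 1 gives exactly one multi-task gradient step on the summed loss.

```python
import numpy as np

# Toy setting (assumption): task t has loss L_t(theta) = 0.5 (theta - c_t)^T H_t (theta - c_t),
# and one "epoch" of fine-tuning is one full-batch gradient-descent step with rate eta.
def grad(theta, H, c):
    return H @ (theta - c)

rng = np.random.default_rng(0)
dim, n_tasks, eta = 4, 3, 0.1
theta0 = rng.normal(size=dim)  # stands in for the pretrained weights
tasks = []
for _ in range(n_tasks):
    A = rng.normal(size=(dim, dim))
    tasks.append((A @ A.T + np.eye(dim), rng.normal(size=dim)))  # (SPD Hessian, target)

# Task vector = fine-tuned weights minus pretrained weights, after one epoch per task.
task_vectors = [(theta0 - eta * grad(theta0, H, c)) - theta0 for H, c in tasks]

# Task arithmetic with scaling coefficient 1: add all task vectors to theta0.
merged = theta0 + sum(task_vectors)

# One multi-task gradient-descent step on the summed loss sum_t L_t.
multitask_step = theta0 - eta * sum(grad(theta0, H, c) for H, c in tasks)

print(np.allclose(merged, multitask_step))
```

With more than one epoch (or with minibatch SGD), each task's trajectory moves away from `theta0`, so later gradients are evaluated at different points and the equality becomes only an approximation.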
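Furthermore, we show that task vectors perform best when this equality holds, and that their effectiveness is largely driven by the first epoch's gradient.
Building on this insight, we propose viewing model merging as a single step in an iterative process that Alternates between Tuning and Merging (ATM).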
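A minimal sketch of the alternating loop, again on hypothetical quadratic toy losses rather than the paper's vision or NLP models. `finetune`, `merge`, and `atm` are illustrative names, and the scaling coefficient `alpha=1.0` and learning rate `eta=0.1` are assumptions. Both methods spend the same 100 fine-tuning epochs per task: one-shot task arithmetic uses them all in a single pass from `theta0`, while the ATM-style loop spends one epoch per task per round and restarts fine-tuning from the merged model each time.

```python
import numpy as np

# Toy setting (assumption): two tasks with quadratic losses
# L_t(theta) = 0.5 (theta - c_t)^T H_t (theta - c_t); one "epoch" is one full-batch GD step.
H = [np.diag([1.0, 0.2]), np.diag([0.2, 1.0])]
C = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

def total_loss(theta):
    return sum(0.5 * (theta - c) @ h @ (theta - c) for h, c in zip(H, C))

def finetune(theta, h, c, epochs, eta=0.1):
    # Hypothetical per-task fine-tuning: plain gradient descent on one task's loss.
    for _ in range(epochs):
        theta = theta - eta * h @ (theta - c)
    return theta

def merge(theta, epochs_per_task, alpha=1.0):
    # Task arithmetic: fine-tune each task from theta, then add alpha * sum of task vectors.
    taus = [finetune(theta, h, c, epochs_per_task) - theta for h, c in zip(H, C)]
    return theta + alpha * sum(taus)

def atm(theta, rounds, epochs_per_round):
    # Alternate tuning and merging: every round starts from the previously merged model.
    for _ in range(rounds):
        theta = merge(theta, epochs_per_round)
    return theta

theta0 = np.zeros(2)
budget = 100  # total fine-tuning epochs per task, identical for both methods
one_shot = merge(theta0, epochs_per_task=budget)
alternating = atm(theta0, rounds=budget, epochs_per_round=1)
theta_star = np.linalg.solve(sum(H), sum(h @ c for h, c in zip(H, C)))  # joint optimum

print(total_loss(one_shot), total_loss(alternating), total_loss(theta_star))
```

In this toy, each one-epoch ATM round reduces to a multi-task gradient step, so the alternating loop approaches the joint optimum `theta_star`, whereas one-shot merging of fully converged task vectors lands at a point with a higher joint loss.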
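This method acts as a bridge between model merging and multi-task gradient descent, achieving state-of-the-art results with the same data and computational requirements.
We extensively evaluate ATM across diverse settings, achieving up to 20% higher accuracy than the best baselines on computer vision and NLP tasks.
Finally, we provide both empirical and theoretical support for its effectiveness, demonstrating increased orthogonality between task vectors and proving that ATM minimizes an upper bound on the loss obtained by jointly fine-tuning all tasks.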
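Orthogonality between task vectors is commonly quantified with the cosine similarity of the flattened weight differences. The summary does not state the exact metric, so this choice and the helper name `pairwise_cosine` are assumptions. Off-diagonal entries near 0 indicate nearly orthogonal task vectors, which is usually read as less interference when the vectors are summed.

```python
import numpy as np

def pairwise_cosine(task_vectors):
    # Hypothetical helper: cosine similarity between every pair of flattened task vectors.
    V = np.stack([np.ravel(t) for t in task_vectors])
    V = V / np.linalg.norm(V, axis=1, keepdims=True)  # unit-normalize each row
    return V @ V.T  # (n_tasks, n_tasks) similarity matrix

# Toy vectors: the first and third point the same way, the second is orthogonal to both.
sims = pairwise_cosine([np.array([1.0, 0.0, 0.0]),
                        np.array([0.0, 2.0, 0.0]),
                        np.array([3.0, 0.0, 0.0])])
print(np.round(sims, 3))
```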

Summary (original)

Model merging has recently emerged as a cost-efficient paradigm for multi-task learning. Among current approaches, task arithmetic stands out for its simplicity and effectiveness. In this paper, we motivate the effectiveness of task vectors by linking them to multi-task gradients. We show that in a single-epoch scenario, task vectors are mathematically equivalent to the gradients obtained via gradient descent in a multi-task setting, and still approximate these gradients in subsequent epochs. Furthermore, we show that task vectors perform optimally when equality is maintained, and their effectiveness is largely driven by the first epoch’s gradient. Building on this insight, we propose viewing model merging as a single step in an iterative process that Alternates between Tuning and Merging (ATM). This method acts as a bridge between model merging and multi-task gradient descent, achieving state-of-the-art results with the same data and computational requirements. We extensively evaluate ATM across diverse settings, achieving up to 20% higher accuracy in computer vision and NLP tasks, compared to the best baselines. Finally, we provide both empirical and theoretical support for its effectiveness, demonstrating increased orthogonality between task vectors and proving that ATM minimizes an upper bound on the loss obtained by jointly finetuning all tasks.

arXiv information

Authors	Luca Zhou, Daniele Solombrino, Donato Crisostomi, Maria Sofia Bucarelli, Fabrizio Silvestri, Emanuele Rodolà
Published	2024-11-06 13:24:10+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
