MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation

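Summary

Model merging has emerged as an effective way to combine multiple single-task models, all fine-tuned from the same pre-trained model, into one multitask model.
The process typically computes a weighted average of the model parameters and requires no additional training.
Existing model-merging methods focus on improving average accuracy across tasks.
However, interference and conflicts between the objectives of different tasks can create trade-offs during merging.
In real-world applications, a set of solutions covering different trade-offs is more informative and helps practitioners decide according to their own preferences.
This paper introduces a new low-compute algorithm, Model Merging with Amortized Pareto Front (MAP).
MAP identifies a Pareto set of scaling coefficients for merging multiple models, so that the result reflects the trade-offs.
The core component of MAP approximates the evaluation metrics of each task with a quadratic surrogate model fitted on a pre-selected set of scaling coefficients, which enables amortized inference.
Experiments on vision and natural language processing tasks show that MAP can accurately identify the Pareto front.
To further reduce the computation MAP requires, the authors propose (1) a Bayesian adaptive sampling algorithm and (2) a nested merging scheme with multiple stages.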

Summary (original)

Model merging has emerged as an effective approach to combine multiple single-task models, fine-tuned from the same pre-trained model, into a multitask model. This process typically involves computing a weighted average of the model parameters without any additional training. Existing model-merging methods focus on enhancing average task accuracy. However, interference and conflicts between the objectives of different tasks can lead to trade-offs during model merging. In real-world applications, a set of solutions with various trade-offs can be more informative, helping practitioners make decisions based on diverse preferences. In this paper, we introduce a novel low-compute algorithm, Model Merging with Amortized Pareto Front (MAP). MAP identifies a Pareto set of scaling coefficients for merging multiple models to reflect the trade-offs. The core component of MAP is approximating the evaluation metrics of the various tasks using a quadratic approximation surrogate model derived from a pre-selected set of scaling coefficients, enabling amortized inference. Experimental results on vision and natural language processing tasks show that MAP can accurately identify the Pareto front. To further reduce the required computation of MAP, we propose (1) a Bayesian adaptive sampling algorithm and (2) a nested merging scheme with multiple stages.
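
The surrogate idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it only shows, under stated assumptions, how a quadratic surrogate might be fitted by least squares on a few sampled scaling coefficients and then queried on a dense grid to pick out non-dominated points. The function names and `toy_metrics` (a made-up stand-in for actually evaluating a merged model on two tasks) are hypothetical.

```python
import numpy as np

def quadratic_features(c):
    """Map coefficients c of shape (n, k) to [1, c_i, c_i * c_j for i <= j]."""
    c = np.atleast_2d(c)
    n, k = c.shape
    rows, cols = np.triu_indices(k)
    cross = (c[:, :, None] * c[:, None, :])[:, rows, cols]
    return np.hstack([np.ones((n, 1)), c, cross])

def fit_surrogate(c_samples, metric_values):
    """Least-squares fit of one quadratic surrogate per task metric."""
    X = quadratic_features(c_samples)
    coef, *_ = np.linalg.lstsq(X, metric_values, rcond=None)
    return coef  # shape (n_features, n_tasks)

def predict(coef, c):
    """Evaluate the fitted surrogates at coefficients c."""
    return quadratic_features(c) @ coef

def pareto_mask(scores):
    """True for rows that no other row dominates (higher is better)."""
    mask = np.ones(scores.shape[0], dtype=bool)
    for i in range(scores.shape[0]):
        at_least_as_good = np.all(scores >= scores[i], axis=1)
        strictly_better = np.any(scores > scores[i], axis=1)
        mask[i] = not np.any(at_least_as_good & strictly_better)
    return mask

def toy_metrics(c):
    """Hypothetical stand-in for evaluating a merged model on two tasks."""
    c = np.atleast_2d(c)
    m1 = 1.0 - (c[:, 0] - 0.8) ** 2 - 0.3 * c[:, 1] ** 2
    m2 = 1.0 - 0.3 * c[:, 0] ** 2 - (c[:, 1] - 0.8) ** 2
    return np.stack([m1, m2], axis=1)

# Fit on a handful of sampled coefficients (the expensive step in practice).
rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 1.0, size=(20, 2))
coef = fit_surrogate(samples, toy_metrics(samples))

# Query the cheap surrogate on a dense grid and keep non-dominated points.
axis = np.linspace(0.0, 1.0, 21)
grid = np.array([[a, b] for a in axis for b in axis])
front = grid[pareto_mask(predict(coef, grid))]
```

Here `front` holds the grid coefficients the surrogate predicts to be Pareto-optimal. In a real setting each row of `metric_values` would come from evaluating an actual merged model, which is why keeping the number of sampled coefficients small matters.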

arXiv information

Authors	Lu Li, Tianyu Zhang, Zhiqi Bu, Suyuchen Wang, Huan He, Jie Fu, Yonghui Wu, Jiang Bian, Yong Chen, Yoshua Bengio
Published	2024-06-11 17:55:25+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
