SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models

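Summary

Training deep models on large datasets is becoming prohibitively expensive, which has driven wide adoption of deep model fusion techniques that reuse knowledge from existing models.
From simple weight averaging to more sophisticated methods such as AdaMerging, model fusion improves model performance and accelerates the development of new models.
However, potential interference between the parameters of individual models and the lack of interpretability in the fusion process remain significant challenges.
Existing methods often try to resolve parameter interference by evaluating parameter attributes, such as magnitude or sign, or by pruning parameters.
In this study, we begin by examining the fine-tuning of linear layers through the lens of subspace analysis and explicitly define parameter interference as an optimization problem.
We then introduce a model fusion approach called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction, which upscales source models into a Mixture-of-Experts (MoE) model without extra data or further training.
Our approach rests on the observation that fine-tuning mostly preserves the important parts learned in pre-training and uses less significant or unused regions to adapt to new tasks.
In addition, parameter interference, which is intrinsically intractable in the original parameter space, can be managed by expanding the dimensions.
We conduct extensive experiments across diverse scenarios, such as image classification and text generalization tasks, using full fine-tuning and LoRA fine-tuning, and we apply our method to large language models (CLIP models, Flan-T5 models, and Mistral-7B models), highlighting the adaptability and scalability of SMILE.
Code is available at https://github.com/tanganke/fusion_bench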
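
The summary does not spell out how the low-rank experts are built or routed, so the following is only a minimal NumPy sketch of one plausible reading: each fine-tuned linear layer contributes the truncated SVD of its update (fine-tuned weight minus pre-trained weight) as an expert, and an input is routed to the expert whose input subspace it overlaps most. The function names `low_rank_expert` and `moe_forward`, the routing score, and the toy matrices are all assumptions made for illustration; they are not taken from the paper or from the fusion_bench repository.

```python
import numpy as np


def low_rank_expert(w_pre, w_ft, rank):
    """Return the rank-`rank` truncated SVD factors of the update w_ft - w_pre."""
    u, s, vt = np.linalg.svd(w_ft - w_pre, full_matrices=False)
    return u[:, :rank], s[:rank], vt[:rank, :]


def moe_forward(x, w_pre, experts, top_k=1):
    """Apply the shared pre-trained layer plus the top_k best-matching experts."""
    # Assumed routing score: norm of x projected onto each expert's input subspace.
    scores = np.array([np.linalg.norm(vt @ x) for _, _, vt in experts])
    chosen = np.argsort(scores)[::-1][:top_k]
    y = w_pre @ x
    for i in chosen:
        u, s, vt = experts[i]
        y = y + u @ (s * (vt @ x))  # low-rank update applied as U diag(s) V^T x
    return y, chosen


# Toy setup (hypothetical data): two fine-tuned copies of one linear layer whose
# rank-2 updates read from disjoint input directions.
rng = np.random.default_rng(0)
dim, rank = 8, 2
w_pre = rng.normal(size=(dim, dim))
eye = np.eye(dim)
delta_a = rng.normal(size=(dim, rank)) @ eye[0:2]  # reads input coordinates 0 and 1
delta_b = rng.normal(size=(dim, rank)) @ eye[2:4]  # reads input coordinates 2 and 3
w_ft_a, w_ft_b = w_pre + delta_a, w_pre + delta_b

experts = [low_rank_expert(w_pre, w, rank) for w in (w_ft_a, w_ft_b)]

# An input that lies in the subspace used by the first fine-tuned model.
x = np.array([1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y, chosen = moe_forward(x, w_pre, experts, top_k=1)

# Baseline for comparison: simple weight averaging of the two fine-tuned layers.
w_avg = (w_ft_a + w_ft_b) / 2

print("selected expert:", chosen.tolist())
print("routed output error vs. model A:", np.linalg.norm(y - w_ft_a @ x))
print("averaged output error vs. model A:", np.linalg.norm(w_avg @ x - w_ft_a @ x))
```

In this toy case the two updates occupy disjoint input directions, so the routed layer reproduces the first fine-tuned model on `x`, while weight averaging halves that model's update. Real fine-tuned updates overlap and are not exactly low rank, so the sketch should be read as an illustration of the idea, not as a measurement of the method.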

Abstract (original)

Deep model training on extensive datasets is increasingly becoming cost-prohibitive, prompting the widespread adoption of deep model fusion techniques to leverage knowledge from pre-existing models. From simple weight averaging to more sophisticated methods like AdaMerging, model fusion effectively improves model performance and accelerates the development of new models. However, potential interference between parameters of individual models and the lack of interpretability in the fusion progress remain significant challenges. Existing methods often try to resolve the parameter interference issue by evaluating attributes of parameters, such as their magnitude or sign, or by parameter pruning. In this study, we begin by examining the fine-tuning of linear layers through the lens of subspace analysis and explicitly define parameter interference as an optimization problem to shed light on this subject. Subsequently, we introduce an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction, which allows for the upscaling of source models into an MoE model without extra data or further training. Our approach relies on the observation that fine-tuning mostly keeps the important parts from the pre-training, but it uses less significant or unused areas to adapt to new tasks. Also, the issue of parameter interference, which is intrinsically intractable in the original parameter space, can be managed by expanding the dimensions. We conduct extensive experiments across diverse scenarios, such as image classification and text generalization tasks, using full fine-tuning and LoRA fine-tuning, and we apply our method to large language models (CLIP models, Flan-T5 models, and Mistral-7B models), highlighting the adaptability and scalability of SMILE. Code is available at https://github.com/tanganke/fusion_bench

arXiv information

Authors	Anke Tang, Li Shen, Yong Luo, Shuai Xie, Han Hu, Lefei Zhang, Bo Du, Dacheng Tao
Published	2024-08-19 17:32:15+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
