Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints

Summary

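Training large, deep neural networks to convergence can be prohibitively expensive.
As a result, only a small selection of popular dense models tends to be reused across different contexts and tasks.
Sparsely activated models, which aim to decouple model size from computation cost, are becoming an attractive alternative to dense models.
Sparse models are more efficient in terms of quality and computation cost, but training them from scratch in the large-scale regime remains data-hungry and expensive.
This work proposes sparse upcycling, a simple way to reuse sunk training costs by initializing a sparsely activated Mixture-of-Experts model from a dense checkpoint.
Sparsely upcycled T5 Base, Large, and XL language models and Vision Transformer Base and Large models significantly outperform their dense counterparts on SuperGLUE and ImageNet, respectively, using only about 50% of the initial dense pretraining sunk cost.
The upcycled models also outperform sparse models trained from scratch on 100% of the initial dense pretraining computation budget.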

Abstract (original)

Training large, deep neural networks to convergence can be prohibitively expensive. As a result, often only a small selection of popular, dense models are reused across different contexts and tasks. Increasingly, sparsely activated models, which seek to decouple model size from computation costs, are becoming an attractive alternative to dense models. Although more efficient in terms of quality and computation cost, sparse models remain data-hungry and costly to train from scratch in the large scale regime. In this work, we propose sparse upcycling — a simple way to reuse sunk training costs by initializing a sparsely activated Mixture-of-Experts model from a dense checkpoint. We show that sparsely upcycled T5 Base, Large, and XL language models and Vision Transformer Base and Large models, respectively, significantly outperform their dense counterparts on SuperGLUE and ImageNet, using only ~50% of the initial dense pretraining sunk cost. The upcycled models also outperform sparse models trained from scratch on 100% of the initial dense pretraining computation budget.
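The initialization step described in the abstract can be illustrated with a small NumPy sketch. It rests on several assumptions that the abstract does not spell out: each expert starts as an exact copy of the dense MLP weights, the router is freshly initialized at random, and the gate weights of the selected experts are renormalized to sum to 1. The names `dense_forward`, `upcycle_mlp`, and `moe_forward`, the ReLU activation, and the 0.02 router scale are hypothetical choices made for this illustration, not the authors' implementation.

```python
import numpy as np


def dense_forward(x, mlp):
    """Two-layer ReLU MLP standing in for one dense Transformer MLP block."""
    return np.maximum(x @ mlp["w_in"], 0.0) @ mlp["w_out"]


def upcycle_mlp(dense_mlp, num_experts, d_model, rng):
    """Build MoE-layer parameters from the parameters of one dense MLP.

    Assumption: every expert is an independent copy of the dense weights,
    and the router (absent from the dense checkpoint) is random.
    """
    experts = [
        {name: w.copy() for name, w in dense_mlp.items()}
        for _ in range(num_experts)
    ]
    router = rng.normal(0.0, 0.02, size=(d_model, num_experts))
    return {"experts": experts, "router": router}


def moe_forward(x, moe, top_k=2):
    """Token-choice top-k routing over the experts. x: (tokens, d_model)."""
    logits = x @ moe["router"]
    logits = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=-1, keepdims=True)

    # Indices of the top_k experts per token, and their renormalized gates.
    top = np.argsort(-probs, axis=-1)[:, :top_k]
    gates = np.take_along_axis(probs, top, axis=-1)
    gates = gates / gates.sum(axis=-1, keepdims=True)

    out = np.zeros_like(x)
    for e, params in enumerate(moe["experts"]):
        # Gate weight each token assigns to expert e (0 if not selected).
        weight = (gates * (top == e)).sum(axis=-1)
        chosen = weight > 0
        if chosen.any():
            y = dense_forward(x[chosen], params)
            out[chosen] += weight[chosen, None] * y
    return out


rng = np.random.default_rng(0)
dense = {
    "w_in": rng.normal(size=(8, 16)),
    "w_out": rng.normal(size=(16, 8)),
}
moe = upcycle_mlp(dense, num_experts=4, d_model=8, rng=rng)
x = rng.normal(size=(5, 8))
same = np.allclose(moe_forward(x, moe), dense_forward(x, dense))
```

Under these assumptions the upcycled layer computes the same function as the dense layer at initialization, because all experts are identical and the gates sum to 1. The experts only diverge once further training updates the copies independently.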

arXiv information

Authors	Aran Komatsuzaki, Joan Puigcerver, James Lee-Thorp, Carlos Riquelme Ruiz, Basil Mustafa, Joshua Ainslie, Yi Tay, Mostafa Dehghani, Neil Houlsby
Published	2023-02-17 17:54:50+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
