Single Parent Family: A Spectrum of Family Members from a Single Pre-Trained Foundation Model

Summary

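This paper introduces Progressive Low Rank Decomposition (PLRD), a new method tailored to compressing large language models.
The approach starts from a pre-trained model and incrementally decomposes it into smaller models using progressively lower ranks.
Because each subsequent model is derived from the original rather than retrained from scratch, the method substantially reduces computational overhead and energy consumption.
The authors detail an implementation of PLRD that strategically lowers tensor ranks to optimize the trade-off between model performance and resource usage.
Extensive experiments show that models trained with PLRD on only 1B tokens maintain performance comparable to conventionally trained models while using 0.1% of the tokens.
PLRD's versatility lies in its ability to generate multiple model sizes from a single foundation model, adapting flexibly to different compute and memory budgets.
The findings suggest that PLRD could set a new standard for the efficient scaling of LLMs, making advanced AI more feasible on diverse platforms.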
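
As a rough illustration of why lowering the rank shrinks a model, the sketch below counts parameters when an m x n weight matrix W is replaced by two factors A (m x r) and B (r x n), and derives a sequence of progressively lower ranks for a set of target sizes. The function names, the 4096 x 4096 layer shape, and the keep ratios are assumptions chosen for illustration; they are not taken from the paper, and the sketch does not reproduce the paper's decomposition or its training between steps.

```python
def low_rank_params(m: int, n: int, r: int) -> int:
    """Parameter count of the factorization W (m x n) ~= A (m x r) @ B (r x n)."""
    return r * (m + n)


def rank_for_ratio(m: int, n: int, keep_ratio: float) -> int:
    """Largest rank whose factorized size is at most keep_ratio * m * n."""
    # r * (m + n) <= keep_ratio * m * n  =>  r <= keep_ratio * m * n / (m + n)
    return max(1, int(keep_ratio * m * n / (m + n)))


def progressive_ranks(m: int, n: int, keep_ratios: list[float]) -> list[int]:
    """Ranks for a hypothetical family of progressively smaller layers."""
    # Sort ratios from largest to smallest so ranks only ever decrease.
    return [rank_for_ratio(m, n, k) for k in sorted(keep_ratios, reverse=True)]


if __name__ == "__main__":
    m = n = 4096  # illustrative layer shape (assumption)
    full = m * n
    for r in progressive_ranks(m, n, [0.5, 0.25, 0.125]):
        size = low_rank_params(m, n, r)
        print(f"rank={r:5d}  params={size:9d}  fraction={size / full:.3f}")
```

The factorized layer is smaller than the dense one only when r < m * n / (m + n), which is why each smaller family member needs a lower rank than the one before it.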

Abstract (original)

This paper introduces a novel method of Progressive Low Rank Decomposition (PLRD) tailored for the compression of large language models. Our approach leverages a pre-trained model, which is then incrementally decompressed to smaller sizes using progressively lower ranks. This method allows for significant reductions in computational overhead and energy consumption, as subsequent models are derived from the original without the need for retraining from scratch. We detail the implementation of PLRD, which strategically decreases the tensor ranks, thus optimizing the trade-off between model performance and resource usage. The efficacy of PLRD is demonstrated through extensive experiments showing that models trained with PLRD method on only 1B tokens maintain comparable performance with traditionally trained models while using 0.1% of the tokens. The versatility of PLRD is highlighted by its ability to generate multiple model sizes from a single foundational model, adapting fluidly to varying computational and memory budgets. Our findings suggest that PLRD could set a new standard for the efficient scaling of LLMs, making advanced AI more feasible on diverse platforms.

arXiv information

Authors	Habib Hajimolahoseini, Mohammad Hassanpour, Foozhan Ataiefard, Boxing Chen, Yang Liu
Published	2024-06-28 15:27:57+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
