Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients

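Summary

Training and finetuning large language models (LLMs) are often bottlenecked by limited GPU memory.
Existing projection-based optimization methods address this by projecting gradients into a lower-dimensional subspace to reduce optimizer state memory, but they typically rely on dense projection matrices, which can introduce computational and memory overheads.
This work proposes Grass (GRAdient Structured Sparsification), a new approach that uses sparse projections to transform gradients into structured sparse updates.
This design not only significantly reduces the memory used by optimizer states but also minimizes the gradient memory footprint, computation, and communication costs, leading to substantial throughput improvements.
Extensive experiments on pretraining and finetuning tasks show that Grass achieves performance competitive with full-rank training and with existing projection-based methods.
Notably, Grass enables half-precision pretraining of a 13B-parameter LLaMA model on a single 40GB A100 GPU, which was infeasible for previous methods, and yields up to a 2x throughput improvement on an 8-GPU system.
Code is available at https://github.com/aashiqmuhamed/GRASS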
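The idea of a sparse projection that yields structured sparse updates can be illustrated with a small NumPy sketch. It assumes the simplest possible sparse projection, a row-selection matrix, so that projecting is a row gather and projecting back is a row scatter. The selection rule (largest row norms), the function names `select_rows` and `adam_step`, and the layer shape are all hypothetical choices made for illustration; the summary does not specify how Grass builds its projection.

```python
import numpy as np


def select_rows(grad, r):
    """Return sorted indices of the r rows with the largest L2 norm (assumed rule)."""
    norms = np.linalg.norm(grad, axis=1)
    return np.sort(np.argsort(norms)[-r:])


def adam_step(g, m_state, v_state, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step on the compressed gradient; states have the compressed shape."""
    m_state = b1 * m_state + (1 - b1) * g
    v_state = b2 * v_state + (1 - b2) * g * g
    m_hat = m_state / (1 - b1 ** t)
    v_hat = v_state / (1 - b2 ** t)
    return -lr * m_hat / (np.sqrt(v_hat) + eps), m_state, v_state


rng = np.random.default_rng(0)
m, n, r = 8, 4, 2                      # hypothetical layer shape and subspace size
grad = rng.standard_normal((m, n))     # stand-in for a full weight gradient

idx = select_rows(grad, r)
g_small = grad[idx]                    # projection = row gather, shape (r, n)

m_state = np.zeros_like(g_small)       # optimizer states live in the (r, n) space
v_state = np.zeros_like(g_small)
step, m_state, v_state = adam_step(g_small, m_state, v_state, t=1)

update = np.zeros_like(grad)           # project back: only r rows are nonzero
update[idx] = step

print(g_small.shape, update.shape)
```

In this sketch the optimizer states hold r x n values instead of m x n, and the weight update touches only the selected rows, which is the sense in which the update is structured and sparse. A dense projection would instead require a matrix multiply in both directions and storage for the projection matrix itself.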

Summary (original)

Large language model (LLM) training and finetuning are often bottlenecked by limited GPU memory. While existing projection-based optimization methods address this by projecting gradients into a lower-dimensional subspace to reduce optimizer state memory, they typically rely on dense projection matrices, which can introduce computational and memory overheads. In this work, we propose Grass (GRAdient Structured Sparsification), a novel approach that leverages sparse projections to transform gradients into structured sparse updates. This design not only significantly reduces memory usage for optimizer states but also minimizes gradient memory footprint, computation, and communication costs, leading to substantial throughput improvements. Extensive experiments on pretraining and finetuning tasks demonstrate that Grass achieves performance competitive with full-rank training and existing projection-based methods. Notably, Grass enables half-precision pretraining of a 13B parameter LLaMA model on a single 40GB A100 GPU (a feat infeasible for previous methods) and yields up to a $2\times$ throughput improvement on an 8-GPU system. Code can be found at https://github.com/aashiqmuhamed/GRASS .
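A rough back-of-the-envelope calculation shows why the 13B-parameter, 40GB case is hard without reducing optimizer state. The sketch below assumes an Adam-style optimizer that keeps two state tensors per parameter, stores everything in 2-byte half precision, and ignores gradients, activations, and framework overhead. These assumptions are illustrative and are not a breakdown taken from the abstract.

```python
# Assumed setup: 13B parameters, 2 bytes per value, Adam-style optimizer
# with two state tensors (first and second moment) per parameter.
params = 13_000_000_000
bytes_per_value = 2
gpu_memory_gb = 40

weights_gb = params * bytes_per_value / 1e9           # 26.0
adam_states_gb = 2 * params * bytes_per_value / 1e9   # 52.0
total_gb = weights_gb + adam_states_gb                # 78.0

print(f"weights: {weights_gb} GB, optimizer states: {adam_states_gb} GB")
print(f"fits in {gpu_memory_gb} GB without compression: {total_gb <= gpu_memory_gb}")
```

Under these assumptions the weights alone take 26 GB, so the remaining budget on a 40GB device is far smaller than the 52 GB that full-size optimizer states would need, which is why shrinking the optimizer state and gradient footprint matters.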

arXiv information

Authors	Aashiq Muhamed, Oscar Li, David Woodruff, Mona Diab, Virginia Smith
Published	2024-06-25 15:50:32+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
