Practical offloading for fine-tuning LLM on commodity GPU via learned subspace projectors

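Summary

Fine-tuning large language models (LLMs) requires a large amount of memory, often more than a single GPU provides.
A common solution to this memory problem is to offload computation and data from the GPU to the CPU.
However, this approach is hampered by the limited bandwidth of commodity hardware, which restricts communication between the CPU and GPU.
In this paper, we present LSP_Offload, an offloading framework that enables LLM fine-tuning at near-native speed on commodity hardware through learned subspace projectors.
Our data-driven approach involves learning an efficient sparse compressor that minimizes communication with minimal loss of precision.
In addition, we introduce a new layer-wise communication schedule to maximize parallelism between communication and computation.
As a result, our framework can fine-tune a 1.3-billion-parameter model on a 4 GB laptop GPU and a 7-billion-parameter model on an NVIDIA RTX 4090 GPU with 24 GB of memory, with only a 31% slowdown compared to fine-tuning with unlimited memory.
Compared with state-of-the-art offloading frameworks, our approach increases fine-tuning throughput by up to 3.33x and reduces end-to-end fine-tuning time by 33.1% to 62.5% when converging to the same accuracy.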

Abstract (original)

Fine-tuning large language models (LLMs) requires significant memory, often exceeding the capacity of a single GPU. A common solution to this memory challenge is offloading compute and data from the GPU to the CPU. However, this approach is hampered by the limited bandwidth of commodity hardware, which constrains communication between the CPU and GPU. In this paper, we present an offloading framework, LSP_Offload, that enables near-native speed LLM fine-tuning on commodity hardware through learned subspace projectors. Our data-driven approach involves learning an efficient sparse compressor that minimizes communication with minimal precision loss. Additionally, we introduce a novel layer-wise communication schedule to maximize parallelism between communication and computation. As a result, our framework can fine-tune a 1.3 billion parameter model on a 4GB laptop GPU and a 7 billion parameter model on an NVIDIA RTX 4090 GPU with 24GB memory, achieving only a 31% slowdown compared to fine-tuning with unlimited memory. Compared to state-of-the-art offloading frameworks, our approach increases fine-tuning throughput by up to 3.33 times and reduces end-to-end fine-tuning time by 33.1%~62.5% when converging to the same accuracy.
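To make the idea of compressing gradients through subspace projectors more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not give the exact formulation, so the two-sided form `P^T G Q`, the helper names (`sparse_projector`, `compress`, `decompress`), and the toy sizes are all assumptions made for illustration. The projectors here are random stand-ins, whereas the paper learns them from data.

```python
import random


def sparse_projector(rows, cols, nnz_per_row, rng):
    """Build a rows x cols matrix with nnz_per_row random nonzeros per row.

    Hypothetical stand-in for a learned sparse projector.
    """
    mat = [[0.0] * cols for _ in range(rows)]
    for row in mat:
        for j in rng.sample(range(cols), nnz_per_row):
            row[j] = rng.gauss(0.0, 1.0) / nnz_per_row ** 0.5
    return mat


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def compress(grad, p, q):
    """Assumed compression step: (d x m)(m x n)(n x d) -> d x d."""
    return matmul(matmul(transpose(p), grad), q)


def decompress(small, p, q):
    """Assumed decompression step: (m x d)(d x d)(d x n) -> m x n."""
    return matmul(matmul(p, small), transpose(q))


rng = random.Random(0)
m, n, d = 64, 48, 8  # toy sizes, chosen only for illustration

# Pretend this is the gradient of one weight matrix, computed on the GPU.
grad = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]

p = sparse_projector(m, d, 2, rng)  # m x d
q = sparse_projector(n, d, 2, rng)  # n x d

small = compress(grad, p, q)        # this d x d matrix is what would cross the CPU-GPU link
approx = decompress(small, p, q)    # full-size m x n approximation after the round trip

ratio = (m * n) / (d * d)           # values sent without compression vs. with it
print(f"values sent: {d * d} instead of {m * n} ({ratio:.0f}x fewer)")
```

With these toy sizes, the compressed matrix holds 64 values instead of 3072, which is the kind of reduction in CPU-GPU traffic the abstract refers to. Random projectors lose a great deal of information; the premise of the paper is that learned projectors can keep the precision loss small.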

arXiv information

Authors	Siyuan Chen, Zelong Guan, Yudong Liu, Phillip B. Gibbons
Published	2024-06-14 16:59:11+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
