Memory-Efficient LLM Training with Online Subspace Descent

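Summary

Memory-efficient LLM training algorithms have recently become popular.
These methods exploit the low-rank structure of gradients, projecting optimizer states into a subspace with a projection matrix found by singular value decomposition (SVD).
However, the convergence of these algorithms depends heavily on the update rule for the projection matrix.
In this work, we provide the first convergence guarantee for arbitrary update rules of the projection matrix.
The guarantee applies generally to optimizers that can be analyzed with Hamiltonian Descent, including the most common ones such as LION and Adam.
Building on this theoretical understanding, we propose Online Subspace Descent, a new family of subspace descent optimizers that does not use SVD.
Instead of updating the projection matrix with eigenvectors, Online Subspace Descent updates it with online PCA.
Online Subspace Descent is flexible and adds only minimal overhead to training.
On the task of pretraining LLaMA models ranging from 60M to 7B parameters on the C4 dataset, we show that Online Subspace Descent achieves lower perplexity and better downstream-task performance than state-of-the-art low-rank training methods across different settings, and narrows the gap with full-rank baselines.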
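
As a rough illustration of the idea summarized above, the following toy sketch keeps a momentum state in an r x n subspace and refreshes the projection matrix with an Oja-style online PCA step instead of an SVD. It is not the paper's algorithm or code: the abstract does not specify the exact update rule, so the Oja step, the function names (`oja_step`, `subspace_step`, `run_demo`), the quadratic toy objective, and all hyperparameters are assumptions chosen for illustration. Plain Python lists are used so that it runs without dependencies; a real implementation would use a tensor library.

```python
# Toy sketch only: momentum descent with the optimizer state stored in a
# low-rank subspace, plus an Oja-style online PCA update of the projection.
# All names and hyperparameters are hypothetical, not taken from the paper.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def orthonormalize(p):
    """Gram-Schmidt on the columns of p (shape m x r)."""
    basis = []
    for v in transpose(p):
        for u in basis:
            dot = sum(x * y for x, y in zip(u, v))
            v = [x - dot * y for x, y in zip(v, u)]
        norm = sum(x * x for x in v) ** 0.5
        basis.append([x / norm for x in v])
    return transpose(basis)


def oja_step(p, g, eta):
    """Online PCA step: P <- orth(P + eta * G G^T P). No SVD is computed."""
    drive = matmul(g, matmul(transpose(g), p))  # m x r
    stepped = [[a + eta * b for a, b in zip(p_row, d_row)]
               for p_row, d_row in zip(p, drive)]
    return orthonormalize(stepped)


def subspace_step(w, g, p, state, lr=0.5, beta=0.5, eta=0.05):
    """One momentum step; `state` has shape r x n rather than m x n."""
    g_low = matmul(transpose(p), g)  # project the gradient: r x n
    state = [[beta * s + (1 - beta) * gl for s, gl in zip(s_row, g_row)]
             for s_row, g_row in zip(state, g_low)]
    update = matmul(p, state)  # map the update back to m x n
    w = [[wi - lr * ui for wi, ui in zip(w_row, u_row)]
         for w_row, u_row in zip(w, update)]
    return w, oja_step(p, g, eta), state


def loss(w, target):
    """Toy objective: 0.5 * squared Frobenius distance to `target`."""
    return 0.5 * sum((a - b) ** 2
                     for w_row, t_row in zip(w, target)
                     for a, b in zip(w_row, t_row))


def run_demo(steps=200):
    # Assumed toy problem with m=4, n=3, r=2.
    target = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
    w = [[0.0] * 3 for _ in range(4)]
    p = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
    state = [[0.0] * 3 for _ in range(2)]
    start = loss(w, target)
    for _ in range(steps):
        # Gradient of the toy objective is simply w - target.
        grad = [[a - b for a, b in zip(w_row, t_row)]
                for w_row, t_row in zip(w, target)]
        w, p, state = subspace_step(w, grad, p, state)
    return start, loss(w, target), p, state
```

Calling `run_demo()` returns the initial loss, the final loss, the final projection matrix, and the momentum state. The point of the sketch is the shapes: the state stays r x n throughout, and the projection matrix changes by a cheap incremental step per iteration rather than a periodic SVD. This toy gives no guarantee of reaching the full-rank optimum.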

Summary (original)

Recently, a wide range of memory-efficient LLM training algorithms have gained substantial popularity. These methods leverage the low-rank structure of gradients to project optimizer states into a subspace using projection matrix found by singular value decomposition (SVD). However, convergence of these algorithms is highly dependent on the update rules of their projection matrix. In this work, we provide the first convergence guarantee for arbitrary update rules of projection matrix. This guarantee is generally applicable to optimizers that can be analyzed with Hamiltonian Descent, including most common ones, such as LION, Adam. Inspired by our theoretical understanding, we propose Online Subspace Descent, a new family of subspace descent optimizer without SVD. Instead of updating the projection matrix with eigenvectors, Online Subspace Descent updates the projection matrix with online PCA. Online Subspace Descent is flexible and introduces only minimum overhead to training. We show that for the task of pretraining LLaMA models ranging from 60M to 7B parameters on the C4 dataset, Online Subspace Descent achieves lower perplexity and better downstream tasks performance than state-of-the-art low-rank training methods across different settings and narrows the gap with full-rank baselines.

arXiv information

Authors	Kaizhao Liang, Bo Liu, Lizhang Chen, Qiang Liu
Published	2024-08-23 05:54:53+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
