CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation

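Abstract

Full-size MLPs and the projection layers in attention account for the enormous size of large language models (LLMs) and place extremely heavy demands on computational resources during pre-training.
However, we empirically observe that the activations of pre-trained LLMs exhibit a low-rank property.
Motivated by this observation, we propose CoLA and its memory-efficient implementation, CoLA-M, which replace these full-size layers with compute-efficient auto-encoders that naturally enforce low-rank activations throughout training.
This fundamental architectural change eliminates activation redundancy and significantly boosts model capacity and training efficiency.
Experiments on LLaMA models with 60 million to 7 billion parameters show that CoLA reduces the computing cost by $\bf 2\pmb{\times}$ and improves training throughput by $\bf 1.86\pmb{\times}$ while maintaining full-rank level performance.
CoLA-M further reduces memory cost without sacrificing throughput, offering a pre-training approach with collectively superior parameter, computing, and memory efficiency.
The resulting LLMs are also $\bf 2\pmb{\times}$ smaller, enabling faster inference with lower memory cost on resource-constrained platforms.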

Abstract (original)

The full-size MLPs and the projection layers in attention introduce tremendous model sizes of large language models (LLMs), imposing extremely demanding needs of computational resources in the pre-training stage. However, we empirically observe that the activations of pre-trained LLMs exhibit low-rank property. Motivated by such observations, we propose CoLA and its memory-efficient implementation, CoLA-M, to replace these full-size layers with compute-efficient auto-encoders that naturally enforce low-rank activations throughout training. This fundamental architectural change eliminates the activation redundancy and significantly boosts model capacity and training efficiency. Experiments on LLaMA models with 60 million to 7 billion parameters show that CoLA reduces the computing cost by $\bf 2\pmb{\times}$ and improves training throughput by $\bf 1.86\pmb{\times}$ while maintaining full-rank level performance. CoLA-M further squeezes memory cost without sacrificing throughput, offering a pre-training approach with collectively superior parameter, computing, and memory efficiency. The LLMs produced are also $\bf 2\pmb{\times}$ smaller, enabling faster inference with lower memory cost on resource-constrained platforms.
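The abstract does not spell out the form of the auto-encoder, so the following is only a minimal NumPy sketch under stated assumptions: a dense projection `x @ W.T` is swapped for a bottleneck `silu(x @ A.T) @ B.T` whose intermediate activation has `r` features. The SiLU nonlinearity, the layer sizes, and all function names are hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np

def full_rank_layer(x, W):
    """Dense projection y = x @ W.T, with W of shape (d_out, d_in)."""
    return x @ W.T

def low_rank_layer(x, A, B):
    """Bottleneck layer y = silu(x @ A.T) @ B.T.

    A has shape (r, d_in) and B has shape (d_out, r), so the
    intermediate activation z has only r features per token.
    """
    z = x @ A.T
    z = z / (1.0 + np.exp(-z))  # SiLU; this nonlinearity is an assumption
    return z @ B.T

def param_count_full(d_in, d_out):
    """Weights in one dense projection."""
    return d_in * d_out

def param_count_low_rank(d_in, d_out, r):
    """Weights in the two factors A and B."""
    return r * (d_in + d_out)

# Hypothetical sizes: a square layer with bottleneck width r = d / 4.
d_in, d_out, r = 512, 512, 128
rng = np.random.default_rng(0)
x = rng.standard_normal((256, d_in))            # 256 token vectors
W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
A = rng.standard_normal((r, d_in)) / np.sqrt(d_in)
B = rng.standard_normal((d_out, r)) / np.sqrt(r)

y_full = full_rank_layer(x, W)
y = low_rank_layer(x, A, B)

# The output is a product through an r-wide bottleneck, so its rank is at most r.
rank = np.linalg.matrix_rank(y)

# Weight (and matmul multiply-add) ratio between the dense and bottleneck layers.
ratio = param_count_full(d_in, d_out) / param_count_low_rank(d_in, d_out, r)
print(y.shape, rank, ratio)
```

With these illustrative sizes the bottleneck uses half the weights of the dense layer, and because each token costs one multiply-add per weight in both variants, the matrix-multiply work halves as well. This is a property of the toy configuration only; the $2\times$ and $1.86\times$ figures in the abstract come from the authors' own experiments.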

arXiv information

Authors	Ziyue Liu, Ruijie Zhang, Zhengyang Wang, Zi Yang, Paul Hovland, Bogdan Nicolae, Franck Cappello, Zheng Zhang
Published	2025-05-20 16:27:05+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
