PERP: Rethinking the Prune-Retrain Paradigm in the Era of LLMs

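Summary

Neural networks can be compressed effectively through pruning, which significantly reduces storage and compute demands while maintaining predictive performance.
Simple yet effective methods such as magnitude pruning remove less important parameters and typically require a costly retraining procedure to restore performance.
With the rise of LLMs, however, full retraining has become infeasible because of memory and compute constraints.
This study challenges the practice of retraining all parameters by showing that updating a small subset of highly expressive parameters can suffice to recover or even enhance performance after pruning.
Surprisingly, retraining just 0.01%-0.05% of the parameters in GPT architectures can match the performance of full retraining across various sparsity levels, significantly reducing compute and memory requirements and enabling models with up to 30 billion parameters to be retrained on a single GPU in minutes.
To bridge the gap to full retraining in the high-sparsity regime, the authors introduce two novel LoRA variants that, unlike standard LoRA, allow adapters to be merged back without compromising sparsity.
Going a step further, they show that these methods can be applied to memory-efficient layer-wise reconstruction, significantly enhancing state-of-the-art retraining-free methods such as Wanda (Sun et al., 2023) and SparseGPT (Frantar & Alistarh, 2023).
The findings present a promising alternative to avoiding retraining.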
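
To make the term "magnitude pruning" concrete, here is a minimal NumPy sketch of the general idea: the weights with the smallest absolute values are set to zero until a target sparsity is reached. This is an illustration of the generic technique only, not the authors' code; the function name `magnitude_prune`, the unstructured per-matrix criterion, and the toy matrix are assumptions made for the example.

```python
import numpy as np


def magnitude_prune(weights: np.ndarray, sparsity: float):
    """Zero the `sparsity` fraction of entries with the smallest magnitude.

    Hypothetical helper for illustration; returns (pruned_weights, keep_mask).
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(round(sparsity * weights.size))  # number of weights to remove
    flat_mask = np.ones(weights.size, dtype=bool)
    if k > 0:
        # Indices of the k smallest-magnitude entries of the flattened matrix.
        drop = np.argpartition(np.abs(weights).ravel(), k - 1)[:k]
        flat_mask[drop] = False
    mask = flat_mask.reshape(weights.shape)
    return weights * mask, mask


# Toy example: prune half of a random 4x8 weight matrix.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
W_sparse, mask = magnitude_prune(W, sparsity=0.5)
print("kept weights:", int(mask.sum()), "of", W.size)
```

The returned mask records which positions survive. Any later retraining step has to respect that mask if the sparsity is to be preserved.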

Abstract (original)

Neural Networks can be effectively compressed through pruning, significantly reducing storage and compute demands while maintaining predictive performance. Simple yet effective methods like magnitude pruning remove less important parameters and typically require a costly retraining procedure to restore performance. However, with the rise of LLMs, full retraining has become infeasible due to memory and compute constraints. This study challenges the practice of retraining all parameters by showing that updating a small subset of highly expressive parameters can suffice to recover or even enhance performance after pruning. Surprisingly, retraining just 0.01%-0.05% of the parameters in GPT-architectures can match the performance of full retraining across various sparsity levels, significantly reducing compute and memory requirements, and enabling retraining of models with up to 30 billion parameters on a single GPU in minutes. To bridge the gap to full retraining in the high sparsity regime, we introduce two novel LoRA variants that, unlike standard LoRA, allow merging adapters back without compromising sparsity. Going a step further, we show that these methods can be applied for memory-efficient layer-wise reconstruction, significantly enhancing state-of-the-art retraining-free methods like Wanda (Sun et al., 2023) and SparseGPT (Frantar & Alistarh, 2023). Our findings present a promising alternative to avoiding retraining.
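
The abstract notes that standard LoRA adapters cannot be merged back without compromising sparsity. The sketch below illustrates why, and shows one plausible remedy: applying the pruning mask to the low-rank update before merging. The abstract does not describe the two LoRA variants, so the masked merge here is an assumption for illustration and should not be read as the authors' method; the matrix sizes, the random mask, and the adapter values are likewise hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 6, 8, 2

# A pruned weight matrix; the mask is random here purely for illustration
# (True = weight kept, False = weight pruned).
W = rng.normal(size=(d_out, d_in))
mask = rng.random((d_out, d_in)) > 0.5
W_sparse = W * mask

# Low-rank adapter factors standing in for trained LoRA weights.
B = rng.normal(size=(d_out, rank))
A = rng.normal(size=(rank, d_in))


def sparsity(M: np.ndarray) -> float:
    """Fraction of entries that are exactly zero."""
    return float(np.mean(M == 0))


# Standard LoRA merge: B @ A is dense, so pruned positions are refilled.
W_dense_merge = W_sparse + B @ A

# Masked merge (assumed variant): only surviving positions are updated.
W_masked_merge = W_sparse + mask * (B @ A)

print("sparsity after pruning:      ", sparsity(W_sparse))
print("sparsity after dense merge:  ", sparsity(W_dense_merge))
print("sparsity after masked merge: ", sparsity(W_masked_merge))
```

For a masked merge like this to leave the model's outputs unchanged, the adapter would also have to be trained with the same mask applied in the forward pass. That is a further assumption of this sketch.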

arXiv information

Authors	Max Zimmer, Megi Andoni, Christoph Spiegel, Sebastian Pokutta
Published	2025-02-05 15:10:23+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
