Data-free Weight Compress and Denoise for Large Language Models

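Summary

Large language models (LLMs) are reshaping artificial intelligence research, unlocking strong capabilities across many domains as parameter counts scale up.
However, scaling model parameters is constrained by GPU memory and computational speed.
Weight compression methods such as pruning and quantization have emerged to address these constraints.
Because weight matrices in language models tend to be low-rank, reducing weights through matrix decomposition holds considerable promise.
This paper exploits the intrinsic structure of LLMs to propose Data-free Joint Rank-k Approximation, a new approach to compressing parameter matrices.
The method requires no additional corpus and remains orthogonal to pruning and quantization, so it can be combined with them.
Without any calibration data, it prunes 80% of the parameters while retaining 93.43% of the original performance.
The authors also investigate the fundamental properties of LLM weight matrices after Rank-k Approximation and run comprehensive experiments to examine their hypothesis.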
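To make the idea of a rank-k approximation concrete, the following is a minimal sketch using plain truncated SVD on a single weight matrix. It is not the paper's joint, data-free procedure, which the summary does not spell out. The function names `rank_k_approx` and `max_rank_for_budget`, the 64x64 matrix size, and the 20% keep ratio are all illustrative assumptions.

```python
import math

import numpy as np


def rank_k_approx(W, k):
    """Return factors (A, B) such that A @ B is the best rank-k
    approximation of W in the Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :k] * s[:k]  # shape (m, k): left vectors scaled by singular values
    B = Vt[:k, :]         # shape (k, n): top-k right singular vectors
    return A, B


def max_rank_for_budget(m, n, keep_ratio):
    """Largest k whose factors, with k * (m + n) entries in total,
    fit within keep_ratio * m * n parameters."""
    return math.floor(keep_ratio * m * n / (m + n))


# Hypothetical example: a random 64x64 matrix standing in for a weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))

k = max_rank_for_budget(*W.shape, keep_ratio=0.2)
A, B = rank_k_approx(W, k)

# Relative reconstruction error of the low-rank factors.
rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"rank={k}, stored parameters={A.size + B.size}, relative error={rel_err:.3f}")
```

Storing the two factors costs k(m + n) numbers instead of mn, which is where the compression comes from. A random Gaussian matrix is close to a worst case for this, so the error printed here says nothing about real LLM weights, which the summary describes as tending toward low rank.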

Abstract (original)

Large Language Models (LLMs) are reshaping the research landscape in artificial intelligence, particularly as model parameters scale up significantly, unlocking remarkable capabilities across various domains. Nevertheless, the scalability of model parameters faces constraints due to limitations in GPU memory and computational speed. To address these constraints, various weight compression methods have emerged, such as Pruning and Quantization. Given the low-rank nature of weight matrices in language models, the reduction of weights through matrix decomposition undoubtedly holds significant potential and promise. In this paper, drawing upon the intrinsic structure of LLMs, we propose a novel approach termed Data-free Joint Rank-k Approximation for compressing the parameter matrices. Significantly, our method is characterized by without necessitating additional involvement of any corpus, while simultaneously preserving orthogonality in conjunction with pruning and quantization methods. We achieve a model pruning of 80% parameters while retaining 93.43% of the original performance without any calibration data. Additionally, we explore the fundamental properties of the weight matrix of LLMs undergone Rank-k Approximation and conduct comprehensive experiments to elucidate our hypothesis.

arXiv information

Authors	Runyu Peng, Yunhua Zhou, Qipeng Guo, Yang Gao, Hang Yan, Xipeng Qiu, Dahua Lin
Published	2024-02-26 05:51:47+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
