A Sliding Layer Merging Method for Efficient Depth-Wise Pruning in LLMs

Summary

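Compared with width-wise pruning, depth-wise pruning can significantly accelerate inference in resource-constrained scenarios.
However, treating an entire Transformer layer as the minimum pruning unit may degrade model performance, because all of the layer's information is discarded indiscriminately.
This paper reveals a "patch-like" feature relationship between the layers of large language models by analyzing the correlation between the outputs of different layers in a reproducing kernel Hilbert space.
Building on this observation, we propose a sliding layer merging method that dynamically selects and fuses consecutive layers from top to bottom according to a predefined similarity threshold, simplifying the model structure while maintaining its performance.
Extensive experiments on LLMs with various architectures and parameter scales show that our method outperforms existing pruning techniques in both zero-shot inference performance and retraining recovery quality after pruning.
In particular, in an experiment with 35% pruning on the Vicuna-7B model, our method achieved a 1.654% improvement in average zero-shot task performance over the existing method.
Moreover, we show the potential of combining depth pruning with width pruning to enhance the pruning effect.
Our code is available at https://github.com/920927/SLM-a-sliding-layer-merging-method.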
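
The summary does not say which statistic is used to correlate layer outputs in a reproducing kernel Hilbert space. One common choice for this kind of analysis is centered kernel alignment (CKA), a normalized form of the Hilbert-Schmidt independence criterion. The sketch below assumes CKA with an RBF kernel. The function names, the median-distance bandwidth, and the input format (one samples-by-features array per layer) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def rbf_gram(x, sigma=None):
    """RBF Gram matrix of the rows of x (assumed kernel choice).

    If sigma is None, use the median off-diagonal pairwise distance.
    """
    sq = np.sum(x ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (x @ x.T), 0.0)
    if sigma is None:
        off_diag = ~np.eye(x.shape[0], dtype=bool)
        sigma = np.sqrt(np.median(d2[off_diag]))
    return np.exp(-d2 / (2.0 * sigma ** 2))


def center_gram(gram):
    """Double-center a Gram matrix: H K H with H = I - (1/n) 1 1^T."""
    n = gram.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    return h @ gram @ h


def cka(gram_x, gram_y):
    """Centered kernel alignment between two Gram matrices."""
    kx, ky = center_gram(gram_x), center_gram(gram_y)
    hsic_xy = np.sum(kx * ky)  # equals trace(kx @ ky) for symmetric matrices
    return hsic_xy / np.sqrt(np.sum(kx * kx) * np.sum(ky * ky))


def layer_similarity_matrix(layer_outputs, kernel=rbf_gram):
    """Pairwise CKA between per-layer outputs collected on the same samples."""
    grams = [kernel(h) for h in layer_outputs]
    n = len(grams)
    sim = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            sim[i, j] = sim[j, i] = cka(grams[i], grams[j])
    return sim
```

Under these assumptions, a heat map of the resulting matrix is where block-shaped groups of mutually similar consecutive layers would show up, if a model has them.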

Summary (original)

Compared to width-wise pruning, depth-wise pruning can significantly accelerate inference in resource-constrained scenarios. However, treating the entire Transformer layer as the minimum pruning unit may degrade model performance by indiscriminately discarding the entire information of the layer. This paper reveals the ‘Patch-like’ feature relationship between layers in large language models by analyzing the correlation of the outputs of different layers in the reproducing kernel Hilbert space. Building on this observation, we propose a sliding layer merging method that dynamically selects and fuses consecutive layers from top to bottom according to a pre-defined similarity threshold, thereby simplifying the model structure while maintaining its performance. Extensive experiments on LLMs with various architectures and different parameter scales show that our method outperforms existing pruning techniques in both zero-shot inference performance and retraining recovery quality after pruning. In particular, in the experiment with 35% pruning on the Vicuna-7B model, our method achieved a 1.654% improvement in average performance on zero-shot tasks compared to the existing method. Moreover, we further reveal the potential of combining depth pruning with width pruning to enhance the pruning effect. Our code is available at https://github.com/920927/SLM-a-sliding-layer-merging-method.
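
The top-to-bottom selection with a similarity threshold can be illustrated on a toy model. The following is a minimal sketch, not the authors' implementation: the abstract does not state the fusion rule or the similarity measure, so the sketch assumes weight averaging as the fusion rule, mean cosine similarity between the compressed and original model outputs on calibration inputs as the similarity, and a toy residual layer `x + tanh(x @ W)` in place of a Transformer block. "Top" is taken to mean the layer closest to the output. All names are hypothetical.

```python
import numpy as np


def forward(layers, x):
    """Toy residual stack (stand-in for Transformer blocks)."""
    for w in layers:
        x = x + np.tanh(x @ w)
    return x


def mean_cosine(a, b):
    """Mean row-wise cosine similarity between two output batches."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-12
    return float(np.mean(num / den))


def fuse_mean(weights):
    """Assumed fusion rule: element-wise average of the layer weights."""
    return np.mean(weights, axis=0)


def sliding_layer_merge(layers, calib, threshold, fuse=fuse_mean):
    """Greedily fuse consecutive layers, scanning from the top layer down.

    Returns the compressed layer list and, for each kept layer, the indices
    of the original layers it was fused from.
    """
    reference = forward(layers, calib)
    groups = [[i] for i in range(len(layers))]
    high = len(groups) - 1  # upper end of the current window
    while high > 0:
        low = high - 1
        # Tentatively absorb the next lower layer into the window.
        trial = groups[:low] + [groups[low] + groups[high]] + groups[high + 1:]
        trial_layers = [fuse([layers[i] for i in g]) for g in trial]
        if mean_cosine(forward(trial_layers, calib), reference) >= threshold:
            groups = trial  # accept: the window keeps growing downward
        # On rejection the window restarts at the lower layer instead.
        high = low
    merged = [fuse([layers[i] for i in g]) for g in groups]
    return merged, groups
```

A higher threshold keeps more layers, and a lower one merges more aggressively. With a real LLM, the calibration outputs would be hidden states from a small set of prompts rather than random vectors.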

arXiv information

Authors	Xuan Ding, Yao Zhu, Yunjian Zhang, Chuanlong Xie
Published	2025-02-26 14:15:24+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
