A Sliding Layer Merging Method for Efficient Depth-Wise Pruning in LLMs

Summary

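Compared with width-wise pruning, depth-wise pruning can significantly accelerate inference in resource-constrained scenarios.
However, treating an entire Transformer layer as the minimum pruning unit can degrade model performance, because all of the information in that layer is discarded indiscriminately.
This paper analyzes the correlation between the outputs of different layers in a reproducing kernel Hilbert space and, from that analysis, identifies a "patch-like" feature relationship between layers in large language models.
Building on this observation, the authors propose a sliding layer merging method that dynamically selects and fuses consecutive layers from top to bottom according to a pre-defined similarity threshold, simplifying the model structure while maintaining its performance.
Extensive experiments on LLMs with various architectures and parameter scales show that the method outperforms existing pruning techniques in both zero-shot inference performance after pruning and recovery quality after retraining.
In particular, with 35% pruning on the Vicuna-7B model, the method improves average zero-shot task performance by 1.654% over the existing method.
The paper also shows that combining depth pruning with width pruning has the potential to enhance the pruning effect further.
The code is available at https://github.com/920927/SLM-a-sliding-layer-merging-method.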

Abstract (original)

Compared to width-wise pruning, depth-wise pruning can significantly accelerate inference in resource-constrained scenarios. However, treating the entire Transformer layer as the minimum pruning unit may degrade model performance by indiscriminately discarding the entire information of the layer. This paper reveals the “Patch-like” feature relationship between layers in large language models by analyzing the correlation of the outputs of different layers in the reproducing kernel Hilbert space. Building on this observation, we propose a sliding layer merging method that dynamically selects and fuses consecutive layers from top to bottom according to a pre-defined similarity threshold, thereby simplifying the model structure while maintaining its performance. Extensive experiments on LLMs with various architectures and different parameter scales show that our method outperforms existing pruning techniques in both zero-shot inference performance and retraining recovery quality after pruning. In particular, in the experiment with 35% pruning on the Vicuna-7B model, our method achieved a 1.654% improvement in average performance on zero-shot tasks compared to the existing method. Moreover, we further reveal the potential of combining depth pruning with width pruning to enhance the pruning effect. Our codes are available at https://github.com/920927/SLM-a-sliding-layer-merging-method.
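The abstract describes the merging procedure only at a high level, so the following is a minimal sketch of the idea rather than the authors' implementation. Everything in it is an assumption made for illustration: linear CKA stands in for the kernel-based similarity, toy residual blocks `h + tanh(h @ W)` stand in for Transformer layers, summing weight matrices stands in for the fusion rule, and the names `linear_cka`, `forward`, and `sliding_merge` are hypothetical. The released repository is the reference for the actual method.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two activation matrices of shape (n_samples, dim)."""
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(x.T @ y, "fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, "fro")
    norm_y = np.linalg.norm(y.T @ y, "fro")
    return float(cross / (norm_x * norm_y))


def forward(weights, x):
    """Toy stand-in for a layer stack: each layer is a residual block."""
    h = x
    for w in weights:
        h = h + np.tanh(h @ w)
    return h


def sliding_merge(weights, x, threshold):
    """Merge consecutive layers from the top down while the output of the
    compressed stack stays similar to the output of the original stack."""
    reference = forward(weights, x)  # output of the unpruned toy model
    current = list(weights)
    high = len(current) - 1  # upper edge of the window (top layer first)
    while high > 0:
        low = high - 1
        # Assumed fusion rule for this toy: sum the two weight matrices.
        fused = current[low] + current[high]
        candidate = current[:low] + [fused] + current[high + 1:]
        if linear_cka(forward(candidate, x), reference) >= threshold:
            current = candidate  # accept: the fused layer now sits at `low`
        # Accepted or not, the window's upper edge moves down one layer.
        high = low
    return current


rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8))  # 64 calibration samples, hidden size 8
weights = [rng.normal(scale=1e-3, size=(8, 8)) for _ in range(6)]
merged = sliding_merge(weights, x, threshold=0.99)
print(len(weights), "layers before,", len(merged), "after")
```

In this sketch the threshold controls how aggressive the compression is: a lower value accepts more merges, and a value above 1 rejects all of them, because CKA cannot exceed 1.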

arXiv information

Authors	Xuan Ding, Rui Sun, Yunjian Zhang, Xiu Yan, Yueqi Zhou, Kaihao Huang, Suzhong Fu, Chuanlong Xie, Yao Zhu
Published	2025-05-12 12:43:10+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
