Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning

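Summary

The popularity of LLaMA (Touvron et al., 2023a;b) and other recently released moderate-sized large language models (LLMs) highlights the potential of building smaller yet powerful LLMs.
Even so, training such models from scratch on trillions of tokens remains costly.
In this work, we study structured pruning as an effective way to develop smaller LLMs from larger pre-trained models.
Our approach employs two key techniques: (1) targeted structured pruning, which prunes a larger model to a specified target shape by removing layers, heads, and intermediate and hidden dimensions in an end-to-end manner; and (2) dynamic batch loading, which dynamically updates the composition of the data sampled in each training batch based on the varying losses across domains.
We demonstrate the efficacy of the approach by presenting the Sheared-LLaMA series, obtained by pruning the LLaMA2-7B model down to 1.3B and 2.7B parameters.
Sheared-LLaMA models outperform state-of-the-art open-source models of equivalent size, such as Pythia, INCITE, and OpenLLaMA, on a wide range of downstream and instruction-tuning evaluations, while requiring only 3% of the compute needed to train such models from scratch.
This work provides compelling evidence that leveraging existing LLMs through structured pruning is a far more cost-effective approach to building smaller LLMs.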
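
The summary describes dynamic batch loading only at a high level, so the following is a minimal sketch of the idea rather than the authors' implementation. It assumes a multiplicative (exponentiated) weight update driven by how far each domain's current loss exceeds a per-domain reference loss; that update rule, the function names `update_domain_weights` and `batch_composition`, and the three example domains with their loss values are all hypothetical choices made for illustration.

```python
import math


def update_domain_weights(weights, current_losses, reference_losses):
    """Shift sampling weight toward domains whose loss lags its reference.

    Assumed rule (not stated in the abstract): multiply each weight by
    exp(excess loss), then renormalise so the weights sum to 1.
    """
    # Excess loss per domain, clipped at zero so that domains already at or
    # below their reference loss receive no extra boost.
    excess = [max(cur - ref, 0.0)
              for cur, ref in zip(current_losses, reference_losses)]
    scaled = [w * math.exp(d) for w, d in zip(weights, excess)]
    total = sum(scaled)
    return [s / total for s in scaled]


def batch_composition(weights, batch_size):
    """Split a batch across domains in proportion to the weights.

    Uses largest-remainder rounding so the counts add up to batch_size.
    """
    raw = [w * batch_size for w in weights]
    counts = [int(r) for r in raw]
    leftover = batch_size - sum(counts)
    # Give the leftover slots to the domains with the largest fractional parts.
    by_fraction = sorted(range(len(raw)),
                         key=lambda i: raw[i] - counts[i],
                         reverse=True)
    for i in by_fraction[:leftover]:
        counts[i] += 1
    return counts


# Hypothetical example with three domains; all numbers are made up.
weights = [0.5, 0.3, 0.2]
current_losses = [2.0, 2.5, 1.8]
reference_losses = [2.0, 2.0, 2.0]

new_weights = update_domain_weights(weights, current_losses, reference_losses)
counts = batch_composition(new_weights, batch_size=32)
print(new_weights, counts)
```

In this made-up example only the second domain is above its reference loss, so its share of the next batch grows while the other two shrink proportionally. Repeating the update every few hundred steps would keep the batch mix tracking whichever domains are recovering most slowly.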

Abstract (original)

The popularity of LLaMA (Touvron et al., 2023a;b) and other recently emerged moderate-sized large language models (LLMs) highlights the potential of building smaller yet powerful LLMs. Regardless, the cost of training such models from scratch on trillions of tokens remains high. In this work, we study structured pruning as an effective means to develop smaller LLMs from pre-trained, larger models. Our approach employs two key techniques: (1) targeted structured pruning, which prunes a larger model to a specified target shape by removing layers, heads, and intermediate and hidden dimensions in an end-to-end manner, and (2) dynamic batch loading, which dynamically updates the composition of sampled data in each training batch based on varying losses across different domains. We demonstrate the efficacy of our approach by presenting the Sheared-LLaMA series, pruning the LLaMA2-7B model down to 1.3B and 2.7B parameters. Sheared-LLaMA models outperform state-of-the-art open-source models of equivalent sizes, such as Pythia, INCITE, and OpenLLaMA models, on a wide range of downstream and instruction tuning evaluations, while requiring only 3% of compute compared to training such models from scratch. This work provides compelling evidence that leveraging existing LLMs with structured pruning is a far more cost-effective approach for building smaller LLMs.

arXiv information

Authors	Mengzhou Xia, Tianyu Gao, Zhiyuan Zeng, Danqi Chen
Published	2023-10-10 15:13:30+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
