Large Language Models Are Overparameterized Text Encoders

Summary

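When fine-tuned with supervised contrastive training, large language models (LLMs) perform strongly as text embedding models.
However, their large size increases inference time and memory requirements.
In this paper, we show that pruning the last $p\%$ of an LLM's layers before only 1000 steps of supervised training achieves a proportional reduction in memory and inference time.
Evaluating four different state-of-the-art LLMs on text embedding tasks, we find that this method can remove up to 30\% of layers with negligible impact on performance, and up to 80\% with only a modest drop.
With only three lines of code, our method can be easily implemented in any pipeline that converts LLMs into text encoders.
We also propose $\text{L}^3 \text{Prune}$, a novel layer-pruning strategy based on the model's initial loss that provides two optimal pruning configurations: a large variant with negligible performance loss, and a small variant for resource-constrained settings.
On average, the large variant prunes 21\% of the parameters with a $-0.3$ performance drop, while the small variant suffers only a $-5.1$ drop despite pruning 74\% of the model.
We consider these results strong evidence that LLMs are overparameterized for text embedding tasks and can be easily pruned.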

Summary (original)

Large language models (LLMs) demonstrate strong performance as text embedding models when finetuned with supervised contrastive training. However, their large size balloons inference time and memory requirements. In this paper, we show that by pruning the last $p\%$ layers of an LLM before supervised training for only 1000 steps, we can achieve a proportional reduction in memory and inference time. We evaluate four different state-of-the-art LLMs on text embedding tasks and find that our method can prune up to 30\% of layers with negligible impact on performance and up to 80\% with only a modest drop. With only three lines of code, our method is easily implemented in any pipeline for transforming LLMs to text encoders. We also propose $\text{L}^3 \text{Prune}$, a novel layer-pruning strategy based on the model’s initial loss that provides two optimal pruning configurations: a large variant with negligible performance loss and a small variant for resource-constrained settings. On average, the large variant prunes 21\% of the parameters with a $-0.3$ performance drop, and the small variant only suffers from a $-5.1$ decrease while pruning 74\% of the model. We consider these results strong evidence that LLMs are overparameterized for text embedding tasks, and can be easily pruned.
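The core pruning step described above, keeping only the first layers of the model and discarding the last $p\%$, amounts to slicing the model's layer stack. The sketch below illustrates that slice on a plain Python list. It is a minimal, hypothetical example, not the authors' code: the helper name `prune_last_layers`, rounding the number of removed layers down, and always keeping at least one layer are assumptions. In a PyTorch model the same prefix slice would presumably be applied to the decoder's list of transformer blocks before the contrastive fine-tuning step.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def prune_last_layers(layers: Sequence[T], p: float) -> List[T]:
    """Return the layers that remain after dropping the last p% of them.

    Hypothetical helper for illustration. Assumptions (not from the paper):
    the number of removed layers is rounded down, and at least one layer is kept.
    """
    if not 0 <= p < 100:
        raise ValueError("p must be a percentage in [0, 100)")
    n_remove = math.floor(len(layers) * p / 100)  # layers cut from the top of the stack
    n_keep = max(1, len(layers) - n_remove)       # never prune the whole model
    return list(layers[:n_keep])                  # keep the first (lowest) layers


if __name__ == "__main__":
    # Stand-in for a 32-layer decoder; a real model would hold transformer blocks here.
    layers = [f"layer_{i}" for i in range(32)]
    for p in (0, 30, 80):
        kept = prune_last_layers(layers, p)
        print(f"p={p}%: keep {len(kept)} of {len(layers)} layers, last kept = {kept[-1]}")
```

Because the kept layers form a prefix of the original stack, the memory and per-token compute spent in the transformer blocks shrink roughly in proportion to $p$; parameters outside the block stack, such as the token embeddings, are not affected by this slice.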

arXiv information

Authors	Thennal D K, Tim Fischer, Chris Biemann
Published	2024-10-18 16:26:45+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
