Knowledge Distillation of Large Language Models

Summary

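Knowledge distillation (KD) is a promising technique for reducing the high computational demand of large language models (LLMs).
However, previous KD methods have mainly been applied to white-box classification models, or to training small models that imitate black-box model APIs such as ChatGPT.
How to effectively distill the knowledge of white-box LLMs into smaller models remains under-explored. The question is becoming more important as open-source LLMs flourish.
In this work, we propose a KD approach that distills LLMs into smaller language models.
First, we replace the forward Kullback-Leibler divergence (KLD) objective used in standard KD approaches with the reverse KLD. Reverse KLD is better suited to KD on generative language models because it prevents the student model from overestimating the low-probability regions of the teacher distribution.
Next, we derive an effective optimization approach for learning this objective.
The resulting student models are named MiniLLM.
Extensive experiments in the instruction-following setting show that MiniLLM generates more precise responses than the baselines. These responses have higher overall quality, lower exposure bias, better calibration, and better long-text generation performance.
Our method scales across different model families with 120M to 13B parameters.
The code, data, and model checkpoints are available at https://github.com/microsoft/LMOps/tree/main/minillm.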
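The following small numerical sketch makes the forward vs. reverse KLD contrast concrete. Several parts are illustrative assumptions rather than the paper's setup:

* the bimodal toy "teacher" over 21 token positions,
* the single-Gaussian "student" family,
* the brute-force grid search.

Here the forward objective is KL(p||q) = sum_y p(y) log(p(y)/q(y)), and the reverse objective is KL(q||p) = sum_y q(y) log(q(y)/p(y)). The script fits the student under each objective. It then reports how much probability each fit places in the low-probability valley between the teacher's two modes.

```python
import numpy as np

# Hypothetical toy setup: a discrete "vocabulary" of 21 positions.
x = np.arange(21, dtype=float)

def log_normalize(logits, axis=-1):
    """Numerically stable log-softmax: turn unnormalized logits into log-probabilities."""
    m = logits.max(axis=axis, keepdims=True)
    return logits - m - np.log(np.exp(logits - m).sum(axis=axis, keepdims=True))

# Hypothetical teacher p: two modes (at 5 and 15) with a low-probability valley between them.
teacher_logits = np.logaddexp(-(x - 5.0) ** 2 / 2.0, -(x - 15.0) ** 2 / 2.0)
log_p = log_normalize(teacher_logits)
p = np.exp(log_p)

# Hypothetical student family q: one discretized Gaussian with mean mu and std sigma.
mus = np.linspace(0.0, 20.0, 81)       # step 0.25
sigmas = np.linspace(0.5, 12.0, 47)    # step 0.25
logits = -(x[None, None, :] - mus[:, None, None]) ** 2 / (2.0 * sigmas[None, :, None] ** 2)
log_q = log_normalize(logits)          # shape (len(mus), len(sigmas), len(x))
q = np.exp(log_q)

forward_kl = (p * (log_p - log_q)).sum(axis=-1)   # KL(p || q), the standard KD objective
reverse_kl = (q * (log_q - log_p)).sum(axis=-1)   # KL(q || p), the reverse KLD

def best_fit(objective):
    """Return (mu, sigma, q) of the grid point that minimizes the given objective."""
    i, j = np.unravel_index(np.argmin(objective), objective.shape)
    return mus[i], sigmas[j], q[i, j]

fwd_mu, fwd_sigma, q_fwd = best_fit(forward_kl)
rev_mu, rev_sigma, q_rev = best_fit(reverse_kl)

valley = (x >= 8) & (x <= 12)   # positions where the teacher assigns very little mass
print(f"teacher mass in valley: {p[valley].sum():.4f}")
print(f"forward-KL fit: mu={fwd_mu:.2f} sigma={fwd_sigma:.2f} valley mass={q_fwd[valley].sum():.4f}")
print(f"reverse-KL fit: mu={rev_mu:.2f} sigma={rev_sigma:.2f} valley mass={q_rev[valley].sum():.4f}")
```

Forward KL heavily penalizes a student that gives near-zero probability anywhere the teacher has mass. The forward fit therefore centers between the two modes with a wide sigma and puts a large share of its mass in the valley. The reverse fit instead locks onto one mode, and its valley mass stays close to the teacher's. This is the overestimation effect the summary describes, shown here in a deliberately tiny setting.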

Summary (original)

Knowledge Distillation (KD) is a promising technique for reducing the high computational demand of large language models (LLMs). However, previous KD methods are primarily applied to white-box classification models or training small models to imitate black-box model APIs like ChatGPT. How to effectively distill the knowledge of white-box LLMs into small models is still under-explored, which becomes more important with the prosperity of open-source LLMs. In this work, we propose a KD approach that distills LLMs into smaller language models. We first replace the forward Kullback-Leibler divergence (KLD) objective in the standard KD approaches with reverse KLD, which is more suitable for KD on generative language models, to prevent the student model from overestimating the low-probability regions of the teacher distribution. Then, we derive an effective optimization approach to learn this objective. The student models are named MiniLLM. Extensive experiments in the instruction-following setting show that MiniLLM generates more precise responses with higher overall quality, lower exposure bias, better calibration, and higher long-text generation performance than the baselines. Our method is scalable for different model families with 120M to 13B parameters. Our code, data, and model checkpoints can be found in https://github.com/microsoft/LMOps/tree/main/minillm.
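The abstract does not describe the optimization approach itself. One reason reverse KLD needs dedicated treatment is that its expectation is taken over the student's own outputs. Its gradient is therefore commonly written with the score-function (policy-gradient) identity:

grad_theta KL(q_theta||p) = E_{y~q_theta}[(log q_theta(y) - log p(y)) * grad_theta log q_theta(y)]

The sketch below checks that identity numerically on a hypothetical 5-token softmax student and teacher. It is a generic illustration under these assumptions, not MiniLLM's actual training algorithm.

```python
import numpy as np

def softmax(theta):
    """Numerically stable softmax over a 1-D logit vector."""
    z = np.exp(theta - theta.max())
    return z / z.sum()

def reverse_kl(theta, log_p):
    """KL(q_theta || p) for a categorical student q_theta = softmax(theta)."""
    q = softmax(theta)
    return float(np.sum(q * (np.log(q) - log_p)))

# Hypothetical toy teacher and student over a 5-token vocabulary.
log_p = np.log(np.array([0.1, 0.4, 0.2, 0.2, 0.1]))
theta = np.array([0.5, -0.2, 0.1, 0.0, -0.4])
q = softmax(theta)
r = np.log(q) - log_p   # per-token cost r(y) = log q(y) - log p(y)

# Closed form of E_{y~q}[r(y) * grad log q(y)], using grad_theta log q(y) = onehot(y) - q.
exact = q * (r - np.sum(q * r))

# Score-function (REINFORCE-style) Monte Carlo estimate from the student's own samples.
rng = np.random.default_rng(0)
ys = rng.choice(len(q), size=200_000, p=q)
one_hot = np.eye(len(q))[ys]
mc = np.mean(r[ys][:, None] * (one_hot - q), axis=0)

# Central finite differences of KL(q_theta || p) as an independent check.
eps = 1e-5
fd = np.array([
    (reverse_kl(theta + eps * e, log_p) - reverse_kl(theta - eps * e, log_p)) / (2 * eps)
    for e in np.eye(len(theta))
])

print("finite diff:", np.round(fd, 4))
print("closed form:", np.round(exact, 4))
print("monte carlo:", np.round(mc, 4))
```

The closed form q(k) * (r(k) - KL) agrees with finite differences. The Monte Carlo average over student samples approaches it as the number of samples grows. With few samples, such estimators can be noisy.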

arXiv information

Authors	Yuxian Gu, Li Dong, Furu Wei, Minlie Huang
Published	2024-03-12 16:15:19+00:00
arXiv page	arxiv_id(pdf)

Provider, services used

arxiv.jp, Google
