Honey, I Shrunk the Language Model: Impact of Knowledge Distillation Methods on Performance and Explainability

Summary

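Artificial intelligence (AI) has increasingly influenced modern society, most recently through significant advances in large language models (LLMs).
However, the high computational and storage demands of LLMs still limit their deployment in resource-constrained environments.
Knowledge distillation addresses this challenge by training a small student model from a larger teacher model.
Previous research has introduced several distillation methods, both for generating training data and for training the student model.
Despite their relevance, the effects of state-of-the-art distillation methods on model performance and explainability have not been thoroughly investigated or compared.
In this work, we expand the set of available methods by applying critique-revision prompting to distillation for data generation and by synthesizing existing methods for training.
For these methods, we provide a systematic comparison based on the widely used Commonsense Question-Answering (CQA) dataset.
We measure performance via student model accuracy and employ a human-grounded study to evaluate explainability.
We contribute new distillation methods and their comparison in terms of both performance and explainability.
By further advancing the distillation of small language models, this work contributes to broader applicability and faster diffusion of LLM technology.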
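The summary does not spell out how critique-revision prompting generates distillation data. As a rough illustration only, the sketch below assumes a teacher that drafts a rationale and answer for a multiple-choice question, critiques its own draft, and then revises it. Several pieces are hypothetical choices, not the authors' method: the `teacher` callable, the prompt wording, the `Answer: <choice>` output convention, the `Example`/`TrainingSample`/`parse_answer`/`critique_revise` names, and the rule of discarding samples whose answer disagrees with the gold label. A stub teacher stands in for a real LLM so the code runs offline.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical teacher interface: prompt text in, generated text out.
Teacher = Callable[[str], str]


@dataclass
class Example:
    question: str
    choices: list[str]
    answer: str  # gold label from the dataset


@dataclass
class TrainingSample:
    question: str
    rationale: str
    answer: str


def parse_answer(text: str) -> str:
    """Return the choice after the last 'Answer:' line (assumed output convention)."""
    for line in reversed(text.strip().splitlines()):
        if line.startswith("Answer:"):
            return line[len("Answer:"):].strip()
    return ""


def critique_revise(
    example: Example, teacher: Teacher, rounds: int = 1
) -> Optional[TrainingSample]:
    """Draft a rationale, then alternate critique and revision for `rounds` rounds."""
    draft = teacher(
        f"Question: {example.question}\n"
        f"Choices: {', '.join(example.choices)}\n"
        "Explain your reasoning step by step, then end with 'Answer: <choice>'."
    )
    for _ in range(rounds):
        critique = teacher(f"Critique this explanation:\n{draft}")
        draft = teacher(
            "Revise the explanation to address the critique, "
            "then end with 'Answer: <choice>'.\n"
            f"Explanation:\n{draft}\nCritique:\n{critique}"
        )
    answer = parse_answer(draft)
    if answer != example.answer:
        return None  # hypothetical filter: drop samples that disagree with the gold label
    rationale = draft.rsplit("Answer:", 1)[0].strip()
    return TrainingSample(example.question, rationale, answer)


if __name__ == "__main__":
    def stub_teacher(prompt: str) -> str:
        # Stand-in for a real LLM call so the sketch runs offline.
        if prompt.startswith("Critique"):
            return "Mention why a bank is a safe place for money."
        return "A bank is built to store money safely.\nAnswer: bank"

    ex = Example("Where do people keep money?", ["bank", "river", "shoe"], "bank")
    print(critique_revise(ex, stub_teacher))
```

Filtering on the gold label is one simple way to keep only teacher outputs that reach the correct answer. The summary does not say whether, or how, the paper filters generated samples.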

Summary (original)

Artificial Intelligence (AI) has increasingly influenced modern society, recently in particular through significant advancements in Large Language Models (LLMs). However, high computational and storage demands of LLMs still limit their deployment in resource-constrained environments. Knowledge distillation addresses this challenge by training a small student model from a larger teacher model. Previous research has introduced several distillation methods for both generating training data and for training the student model. Despite their relevance, the effects of state-of-the-art distillation methods on model performance and explainability have not been thoroughly investigated and compared. In this work, we enlarge the set of available methods by applying critique-revision prompting to distillation for data generation and by synthesizing existing methods for training. For these methods, we provide a systematic comparison based on the widely used Commonsense Question-Answering (CQA) dataset. While we measure performance via student model accuracy, we employ a human-grounded study to evaluate explainability. We contribute new distillation methods and their comparison in terms of both performance and explainability. This should further advance the distillation of small language models and, thus, contribute to broader applicability and faster diffusion of LLM technology.
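Student performance is reported as accuracy on CQA, which is the fraction of multiple-choice questions answered correctly. The sketch below shows that metric under two assumptions. First, a hypothetical `student` callable returns the text of its chosen answer for each question. Second, each item is a `(question, choices, gold)` tuple. The two toy questions are made up for illustration and are not taken from the CQA dataset.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical student interface: (question, choices) -> text of the chosen answer.
Student = Callable[[str, Sequence[str]], str]
# (question, answer choices, gold answer)
Item = Tuple[str, Sequence[str], str]


def accuracy(student: Student, items: Sequence[Item]) -> float:
    """Fraction of items where the student's prediction matches the gold answer."""
    if not items:
        raise ValueError("accuracy is undefined for an empty evaluation set")
    correct = sum(
        1
        for question, choices, gold in items
        if student(question, choices).strip().lower() == gold.strip().lower()
    )
    return correct / len(items)


if __name__ == "__main__":
    # Toy items for illustration only; not drawn from CQA.
    toy_items = [
        ("Where do people keep money?", ["bank", "river", "shoe"], "bank"),
        ("What do you use to cut paper?", ["spoon", "scissors", "pillow"], "scissors"),
    ]

    def first_choice_student(question: str, choices: Sequence[str]) -> str:
        return choices[0]  # trivial baseline: always pick the first option

    print(accuracy(first_choice_student, toy_items))  # 0.5
```

Comparing predictions case-insensitively is a convenience choice here. An evaluation that maps outputs to choice labels (for example A to E) would compare labels instead.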

arXiv information

Authors	Daniel Hendriks, Philipp Spitzer, Niklas Kühl, Gerhard Satzger
Published	2025-04-22 17:32:48+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
