HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers

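Summary

Knowledge distillation has been shown to be a powerful model compression approach that makes pre-trained language models easier to deploy in practice.
This paper focuses on task-agnostic distillation.
Task-agnostic distillation produces a compact pre-trained model that can be easily fine-tuned on a variety of tasks while keeping computational cost and memory footprint low.
Despite these practical benefits, task-agnostic distillation is challenging.
Because the teacher model has far greater capacity and representation power than the student model, it is very difficult for the student to produce predictions that match the teacher's over a massive amount of open-domain training data.
Such a large prediction discrepancy often diminishes the benefits of knowledge distillation.
To address this challenge, the authors propose Homotopic Distillation (HomoDistil), a novel task-agnostic distillation approach equipped with iterative pruning.
Specifically, the student model is initialized from the teacher model, and the student's neurons are pruned iteratively until the target width is reached.
This approach keeps the discrepancy between the teacher's and the student's predictions small throughout the distillation process, which ensures effective knowledge transfer.
Extensive experiments demonstrate that HomoDistil achieves significant improvements over existing baselines.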
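
The procedure described above (start the student as a copy of the teacher, then prune neurons step by step until the target width is reached) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the importance score or the pruning schedule, so the L2-norm score, the linear `width_schedule`, and the optional `distill_step` hook are all hypothetical choices made only for illustration, and a single weight matrix stands in for a full Transformer.

```python
import math


def width_schedule(step, total_steps, start_width, target_width):
    """Assumed linear schedule: how many neurons to keep at a given step."""
    frac = min(step / total_steps, 1.0)
    kept = round(start_width - frac * (start_width - target_width))
    return max(target_width, kept)


def neuron_importance(row):
    """Assumed importance score: L2 norm of one neuron's weight row."""
    return math.sqrt(sum(w * w for w in row))


def prune_to_width(weights, width):
    """Keep the `width` highest-scoring neurons, preserving their order."""
    ranked = sorted(
        range(len(weights)),
        key=lambda i: neuron_importance(weights[i]),
        reverse=True,
    )
    keep = sorted(ranked[:width])
    return [weights[i] for i in keep]


def homotopic_prune(teacher_weights, target_width, total_steps, distill_step=None):
    """Shrink a copy of the teacher gradually instead of in one shot."""
    # The student starts as an exact copy of the teacher, so the
    # prediction discrepancy is zero before any pruning happens.
    student = [list(row) for row in teacher_weights]
    start_width = len(student)
    widths = []
    for step in range(1, total_steps + 1):
        width = width_schedule(step, total_steps, start_width, target_width)
        student = prune_to_width(student, width)
        # Hypothetical hook: a distillation update would go here so the
        # student can recover from the small change made by this step.
        if distill_step is not None:
            student = distill_step(student, teacher_weights)
        widths.append(len(student))
    return student, widths


# Toy usage: four neurons with two weights each, pruned to width 2.
teacher = [[3.0, 4.0], [0.1, 0.0], [1.0, 1.0], [0.0, 0.2]]
student, widths = homotopic_prune(teacher, target_width=2, total_steps=2)
print(widths, student)
```

Because only a few neurons are removed between distillation updates, each step changes the student slightly, which is the intuition behind keeping the teacher-student discrepancy small throughout training.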

Abstract (original)

Knowledge distillation has been shown to be a powerful model compression approach to facilitate the deployment of pre-trained language models in practice. This paper focuses on task-agnostic distillation. It produces a compact pre-trained model that can be easily fine-tuned on various tasks with small computational costs and memory footprints. Despite the practical benefits, task-agnostic distillation is challenging. Since the teacher model has a significantly larger capacity and stronger representation power than the student model, it is very difficult for the student to produce predictions that match the teacher’s over a massive amount of open-domain training data. Such a large prediction discrepancy often diminishes the benefits of knowledge distillation. To address this challenge, we propose Homotopic Distillation (HomoDistil), a novel task-agnostic distillation approach equipped with iterative pruning. Specifically, we initialize the student model from the teacher model, and iteratively prune the student’s neurons until the target width is reached. Such an approach maintains a small discrepancy between the teacher’s and student’s predictions throughout the distillation process, which ensures the effectiveness of knowledge transfer. Extensive experiments demonstrate that HomoDistil achieves significant improvements on existing baselines.

arXiv information

Authors	Chen Liang, Haoming Jiang, Zheng Li, Xianfeng Tang, Bin Yin, Tuo Zhao
Published	2023-02-19 17:37:24+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
