MiniALBERT: Model Distillation via Parameter-Efficient Recursive Transformers

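Summary

[Title] MiniALBERT: Model Distillation via Parameter-Efficient Recursive Transformers

[Summary]

– Models with large parameter counts suffer from problems of computation time and complexity, and model distillation is one technique for addressing them.
– Model distillation transfers the knowledge of a large pre-trained model into a smaller, more parameter-efficient model.
– This work proposes a method that combines model distillation with cross-layer parameter sharing, and uses it to build MiniALBERT, which distills models such as BERT into a more efficient recursive student.
– It also investigates bottleneck adapters for layer-wise adaptation of the recursive student, and the efficacy of adapter tuning for fine-tuning compact models.
– MiniALBERT is evaluated on a number of general and biomedical NLP tasks to demonstrate its viability, and is compared with the state of the art and other existing compact models.
– All code used in the experiments is available at https://github.com/nlpie-research/MiniALBERT, and the pre-trained compact models are available at https://huggingface.co/nlpie.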
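
To make the effect of cross-layer parameter sharing concrete, the following is a rough parameter-count sketch, not the authors' code. It compares an encoder with independent layers against a recursive encoder that reuses one shared layer and adds a small bottleneck adapter per recursion. The layer layout (four attention projections, a two-layer feed-forward block, two layer norms), the function names, and the sizes in the usage note are all illustrative assumptions rather than values taken from the paper.

```python
def transformer_layer_params(hidden, ffn):
    """Parameter count of one assumed BERT-style encoder layer."""
    # Self-attention: query, key, value, and output projections with biases.
    attention = 4 * (hidden * hidden + hidden)
    # Feed-forward block: hidden -> ffn -> hidden, with biases.
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    # Two layer norms, each with a scale and a shift vector.
    layer_norms = 2 * (2 * hidden)
    return attention + feed_forward + layer_norms


def bottleneck_adapter_params(hidden, bottleneck):
    """Parameter count of one adapter: down-projection plus up-projection."""
    down = hidden * bottleneck + bottleneck
    up = bottleneck * hidden + hidden
    return down + up


def encoder_params(hidden, ffn, depth, shared, bottleneck=None):
    """Total encoder-layer parameters (embeddings are not counted).

    shared=True models a recursive encoder: one layer's weights are
    reused `depth` times. Adapters, if any, are NOT shared: each
    recursion gets its own, which is what allows layer-wise adaptation.
    """
    layer = transformer_layer_params(hidden, ffn)
    total = layer if shared else layer * depth
    if bottleneck is not None:
        total += depth * bottleneck_adapter_params(hidden, bottleneck)
    return total
```

Under these assumptions, `encoder_params(768, 3072, depth=12, shared=False)` can be compared with `encoder_params(768, 3072, depth=6, shared=True, bottleneck=64)` to see how small the per-recursion adapter cost is relative to a full layer. The hidden size, depth, and bottleneck width here are hypothetical choices for illustration.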

Summary (original)

Pre-trained Language Models (LMs) have become an integral part of Natural Language Processing (NLP) in recent years, due to their superior performance in downstream applications. In spite of this resounding success, the usability of LMs is constrained by computational and time complexity, along with their increasing size; an issue that has been referred to as ‘overparameterisation’. Different strategies have been proposed in the literature to alleviate these problems, with the aim to create effective compact models that nearly match the performance of their bloated counterparts with negligible performance losses. One of the most popular techniques in this area of research is model distillation. Another potent but underutilised technique is cross-layer parameter sharing. In this work, we combine these two strategies and present MiniALBERT, a technique for converting the knowledge of fully parameterised LMs (such as BERT) into a compact recursive student. In addition, we investigate the application of bottleneck adapters for layer-wise adaptation of our recursive student, and also explore the efficacy of adapter tuning for fine-tuning of compact models. We test our proposed models on a number of general and biomedical NLP tasks to demonstrate their viability and compare them with the state-of-the-art and other existing compact models. All the codes used in the experiments are available at https://github.com/nlpie-research/MiniALBERT. Our pre-trained compact models can be accessed from https://huggingface.co/nlpie.
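
The abstract does not state which distillation objective is used, so the sketch below shows only the generic idea of model distillation with a standard temperature-scaled logit-matching loss (KL divergence between softened teacher and student distributions). This is a swapped-in textbook formulation and should not be read as MiniALBERT's training loss. The function names and the default temperature are assumptions.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_z for x in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions.

    A generic logit-distillation loss (assumed here for illustration).
    The T^2 factor is the usual rescaling that keeps gradient
    magnitudes comparable across temperatures.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher
    log_q = log_softmax(student_logits, temperature)  # student
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's logits and grows as the two distributions diverge, which is the signal a compact student would be trained to minimise.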

arXiv information

Authors	Mohammadmahdi Nouriborji, Omid Rohanian, Samaneh Kouchaki, David A. Clifton
Published	2023-04-30 13:00:24+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, OpenAI
