Summary
Title: oBERTa: Improving Sparse Transfer Learning via Improved Initialization, Distillation, and Pruning Regimes
Abstract (original)
In this paper, we introduce the oBERTa range of language models, an easy-to-use set of language models that allows Natural Language Processing (NLP) practitioners to obtain models between 3.8 and 24.3 times faster without expertise in model compression. Specifically, oBERTa extends existing work on pruning, knowledge distillation, and quantization, and leverages frozen embeddings to improve knowledge distillation and improved model initialization to deliver higher accuracy on a broad range of transfer tasks. In generating oBERTa, we explore how the highly optimized RoBERTa differs from BERT with respect to pruning during pre-training and fine-tuning, and find it less amenable to compression during fine-tuning. We evaluate oBERTa on seven representative NLP tasks and find that the improved compression techniques allow a pruned oBERTa model to match the performance of BERT-base and exceed the performance of Prune OFA Large on the SQuAD v1.1 question answering dataset, despite being 8x and 2x faster in inference, respectively. We release our code, training regimes, and associated models to encourage usage and experimentation.
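The abstract mentions freezing embeddings to improve knowledge distillation. The following is a minimal sketch, not the authors' released code, of what that could look like with Hugging Face Transformers and PyTorch: the student's embedding parameters are frozen and a standard temperature-scaled distillation loss is used. The model names ("roberta-base", "distilroberta-base") and the temperature value are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the oBERTa training code):
# freeze the student's embeddings during knowledge distillation.
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification

teacher = AutoModelForSequenceClassification.from_pretrained("roberta-base")
student = AutoModelForSequenceClassification.from_pretrained("distilroberta-base")

# Freeze the student's embedding parameters so only the transformer layers
# and the classification head are updated during distillation.
for param in student.base_model.embeddings.parameters():
    param.requires_grad = False

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    return F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
```

In practice this loss would be combined with the task loss on labeled data; the sketch only illustrates the frozen-embedding idea described in the abstract.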
arXiv information
Authors | Daniel Campos, Alexandre Marques, Mark Kurtz, ChengXiang Zhai |
Published | 2023-03-30 01:37:19+00:00 |
arXiv page | arxiv_id (pdf) |
Source, Services used
arxiv.jp, OpenAI