Sparse*BERT: Sparse Models Generalize To New Tasks and Domains

Summary

Title: Sparse*BERT: Sparse models generalize to new tasks and domains

Summary:

– Many modern natural language processing systems are built with large language models at their core.
– Because of their high computational overhead, running inference with these models is difficult and expensive.
– Recent work has explored structured and unstructured pruning, quantization, and distillation to improve inference speed and reduce model size.
– This paper studies how models pruned with Gradual Unstructured Magnitude Pruning transfer between domains and tasks; a rough sketch of gradual magnitude pruning follows this list.
– Models pruned during pretraining with general-domain masked language modeling transfer to novel domains and tasks without extensive hyperparameter exploration or specialized approaches.
– The general sparse model Sparse*BERT becomes SparseBioBERT simply by pretraining the compressed architecture on unstructured biomedical text, and SparseBioBERT matches the quality of BioBERT with only 10% of the parameters.
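
The sketch below is a minimal illustration of gradual unstructured magnitude pruning, not the authors' implementation: it assumes a cubic sparsity schedule commonly used for gradual pruning, a toy PyTorch feed-forward block, and a placeholder for the pretraining updates that would normally happen between pruning steps.

```python
# Minimal sketch of gradual unstructured magnitude pruning (not the paper's code).
# Assumptions: a cubic sparsity schedule and a toy feed-forward block; in practice
# the pruning steps are interleaved with masked-language-model pretraining updates.
import torch
import torch.nn as nn

def target_sparsity(step, total_steps, s_init=0.0, s_final=0.9):
    """Cubic schedule: sparsity ramps smoothly from s_init to s_final."""
    frac = min(step / total_steps, 1.0)
    return s_final + (s_init - s_final) * (1.0 - frac) ** 3

def magnitude_prune(model, sparsity):
    """Zero out the lowest-magnitude fraction of weights in every Linear layer."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            weight = module.weight.data
            k = int(sparsity * weight.numel())
            if k < 1:
                continue
            threshold = weight.abs().flatten().kthvalue(k).values
            weight.mul_((weight.abs() > threshold).float())

# Toy example: prune a BERT-sized feed-forward block to 90% sparsity over 100 steps.
model = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768))
for step in range(1, 101):
    # ...one or more training steps on the pretraining objective would go here...
    magnitude_prune(model, target_sparsity(step, total_steps=100))
```

In gradual pruning the mask is typically re-applied after every optimizer update as well, so weights that were pruned stay at zero while the surviving weights continue to train.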

Summary (Original)

Large Language Models have become the core architecture upon which most modern natural language processing (NLP) systems build. These models can consistently deliver impressive accuracy and robustness across tasks and domains, but their high computational overhead can make inference difficult and expensive. To make using these models less costly, recent work has explored leveraging structured and unstructured pruning, quantization, and distillation to improve inference speed and decrease size. This paper studies how models pruned using Gradual Unstructured Magnitude Pruning can transfer between domains and tasks. Our experimentation shows that models that are pruned during pretraining using general domain masked language models can transfer to novel domains and tasks without extensive hyperparameter exploration or specialized approaches. We demonstrate that our general sparse model Sparse*BERT can become SparseBioBERT simply by pretraining the compressed architecture on unstructured biomedical text. Moreover, we show that SparseBioBERT can match the quality of BioBERT with only 10% of the parameters.
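
To make the transfer claim concrete, here is a minimal sketch, not the authors' pipeline, of continuing masked-language-model pretraining of an already-pruned checkpoint on biomedical text while holding the zeroed weights at zero. The checkpoint name, corpus sentences, and hyperparameters are placeholders.

```python
# Sketch of domain-adaptive pretraining for an already-pruned masked language model.
# Assumptions: "bert-base-uncased" stands in for a Sparse*BERT-style pruned checkpoint,
# and two sentences stand in for an unstructured biomedical corpus.
import torch
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling)

checkpoint = "bert-base-uncased"  # placeholder for a pruned checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Record the sparsity pattern so pruned weights remain zero during domain pretraining.
masks = {name: (p != 0).float() for name, p in model.named_parameters() if "weight" in name}

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

corpus = [  # placeholder biomedical sentences
    "The patient was administered a beta blocker after the infarction.",
    "EGFR mutations are associated with response to tyrosine kinase inhibitors.",
]
batch = collator([tokenizer(text, truncation=True, max_length=128) for text in corpus])

model.train()
loss = model(**batch).loss  # standard masked-language-model loss on the new domain
loss.backward()
optimizer.step()
optimizer.zero_grad()

# Re-apply the recorded masks so the model stays sparse after the update.
with torch.no_grad():
    for name, param in model.named_parameters():
        if name in masks:
            param.mul_(masks[name])
```

Run on a full biomedical corpus for many steps, this is the kind of continued pretraining of the compressed architecture that the abstract describes for turning Sparse*BERT into SparseBioBERT.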

arXiv Information

Authors: Daniel Campos, Alexandre Marques, Tuan Nguyen, Mark Kurtz, ChengXiang Zhai
Published: 2023-04-05 19:54:59+00:00
arXiv site: arxiv_id(pdf)

Source, Service Used

arxiv.jp, OpenAI

Categories: cs.AI, cs.CL