Typhoon: Towards an Effective Task-Specific Masking Strategy for Pre-trained Language Models

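Summary

By exploiting the high degree of parallelism offered by graphics processing units, Transformer architectures have driven major advances in the field of natural language processing.
In a conventional masked language model, special MASK tokens prompt the model to gather contextual information from the surrounding words and restore the originally hidden information.
In this paper, we explore a task-specific masking framework for pre-trained large language models that enables superior performance on particular downstream tasks on the datasets in the GLUE benchmark.
We develop our own masking algorithm, Typhoon, based on token input gradients, and compare it with other standard baselines.
We find that Typhoon performs competitively with whole-word masking on the MRPC dataset.
Our implementation is available in a public GitHub repository.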
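
The summary says only that Typhoon selects masks "based on token input gradients"; it does not give the selection rule. As a rough illustration of what gradient-guided masking could look like, the sketch below scores each token by the L2 norm of a gradient vector and masks the highest-scoring positions. The function names, the top-k rule, the `mask_ratio` parameter, and the gradient values are all assumptions made for this example and should not be read as the authors' algorithm. In a real setup the gradients would come from backpropagating a downstream-task loss to the input embeddings.

```python
import math

MASK = "[MASK]"


def gradient_scores(token_grads):
    """Return the L2 norm of each token's (hypothetical) input gradient."""
    return [math.sqrt(sum(g * g for g in grad)) for grad in token_grads]


def mask_by_gradient(tokens, token_grads, mask_ratio=0.15):
    """Mask the tokens with the largest gradient norms.

    Assumption: a simple top-k rule stands in for whatever selection
    rule Typhoon actually uses, which the summary does not specify.
    """
    if len(tokens) != len(token_grads):
        raise ValueError("need exactly one gradient vector per token")
    scores = gradient_scores(token_grads)
    # Always mask at least one token when the sequence is non-empty.
    k = max(1, round(mask_ratio * len(tokens)))
    # Sort by descending score; ties are broken by earlier position.
    ranked = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))
    chosen = set(ranked[:k])
    return [MASK if i in chosen else tok for i, tok in enumerate(tokens)]


# Made-up gradients for a five-token sentence (illustrative values only).
tokens = ["the", "film", "was", "surprisingly", "good"]
grads = [[0.1, 0.0], [0.3, 0.4], [0.0, 0.1], [1.2, 0.5], [0.6, 0.8]]
masked = mask_by_gradient(tokens, grads, mask_ratio=0.4)
print(masked)
```

With these made-up values the two tokens with the largest norms, "surprisingly" and "good", are replaced by the mask token. A uniform-random or whole-word baseline would differ only in how the `chosen` set is built.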

Summary (original)

Through exploiting a high level of parallelism enabled by graphics processing units, transformer architectures have enabled tremendous strides forward in the field of natural language processing. In a traditional masked language model, special MASK tokens are used to prompt our model to gather contextual information from surrounding words to restore originally hidden information. In this paper, we explore a task-specific masking framework for pre-trained large language models that enables superior performance on particular downstream tasks on the datasets in the GLUE benchmark. We develop our own masking algorithm, Typhoon, based on token input gradients, and compare this with other standard baselines. We find that Typhoon offers performance competitive with whole-word masking on the MRPC dataset. Our implementation can be found in a public Github Repository.

arXiv information

Authors	Muhammed Shahir Abdurrahman,Hashem Elezabi,Bruce Changlong Xu
Published	2023-03-27 22:27:23+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
