Layer-wise Regularized Dropout for Neural Language Models

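Summary

In the pre-trained neural language models popular today, dropout is already an indispensable regularization technique.
To resolve the mismatch between training and inference caused by the randomness of dropout, some studies use consistency training to regularize dropout at the output layer.
In this paper, we propose a novel Layer-wise Regularized Dropout (LR-Drop) designed specifically for Transformer-based language models.
Specifically, LR-Drop regularizes each Transformer layer, layer by layer, using a consistency training strategy.
Each training sample passes through two siamese sub-models sampled by dropout, and LR-Drop forces the hidden states, multi-head attention matrices, and output distributions of the two siamese sub-models to be consistent.
The proposed LR-Drop can be viewed as a "self-distillation" framework in which each sub-model generated by dropout serves as both the "teacher" and the "student" of the other.
Through extensive experiments on 8 natural language understanding datasets, 6 neural machine translation datasets, and 1 abstractive summarization dataset (15 datasets in total), we show that LR-Drop achieves superior performance, including state-of-the-art results.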

Summary (original)

Among the various pre-trained neural language models that are popular today, dropout is already an indispensable regularization technique. To solve the inconsistency between training and inference caused by the randomness of dropout, some studies use consistency training to regularize dropout at the output layer. In this paper, we propose a novel Layer-wise Regularized Dropout (LR-Drop), which is specially designed for Transformer-based language models. Specifically, LR-Drop layer-wise regularizes each Transformer layer using the consistency training strategy. Each training sample passes through the two siamese sub-models sampled by dropout, and then LR-Drop forces the hidden states, multi-head attention matrices, and output distribution of the two siamese sub-models to be consistent. The proposed LR-Drop can be regarded as a ‘self-distillation’ framework, in which each sub-model generated by dropout is the other’s ‘teacher’ model and ‘student’ model. Through extensive experiments on 8 natural language understanding datasets, 6 neural machine translation datasets, and 1 abstractive summarization dataset (a total of 15 datasets), we show that LR-Drop achieves superior performance, including state-of-the-art results.
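The abstract states the objective only at a high level: run each sample twice through dropout-sampled sub-models and penalize disagreement in hidden states, attention matrices, and output distributions. The following is a minimal pure-Python sketch of that idea under assumptions that are ours, not the paper's: mean-squared error for the hidden-state and attention terms, a bidirectional (symmetric) KL divergence for the output term, unit weights `alpha`, `beta`, `gamma`, and a hypothetical toy model `toy_forward` whose "attention" is just a softmax over each layer's hidden state. It is meant to show the shape of the loss, not to reproduce the authors' implementation.

```python
import math
import random

def dropout(xs, p, rng):
    # Inverted dropout: zero each unit with probability p, rescale survivors by 1/(1-p).
    return [0.0 if rng.random() < p else x / (1.0 - p) for x in xs]

def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def kl(p, q, eps=1e-12):
    # KL(p || q) with a small epsilon to avoid log(0).
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def toy_forward(x, layers, p, rng):
    """Hypothetical toy stack (not a real Transformer): each layer is
    h <- dropout(tanh(W h)). A fresh dropout mask per call samples a new sub-model."""
    hidden, attn = [], []
    h = x
    for w in layers:
        h = dropout([math.tanh(sum(wij * hj for wij, hj in zip(row, h))) for row in w], p, rng)
        hidden.append(h)
        attn.append(softmax(h))  # stand-in for one row of an attention matrix
    return {"hidden": hidden, "attn": attn, "probs": softmax(h)}

def consistency_loss(a, b, alpha=1.0, beta=1.0, gamma=1.0):
    # Layer-wise terms over every layer (assumed MSE for hidden states and attention).
    l_hid = sum(mse(h1, h2) for h1, h2 in zip(a["hidden"], b["hidden"]))
    l_att = sum(mse(m1, m2) for m1, m2 in zip(a["attn"], b["attn"]))
    # Output term: bidirectional KL, so each sub-model acts as the other's teacher.
    l_out = 0.5 * (kl(a["probs"], b["probs"]) + kl(b["probs"], a["probs"]))
    return alpha * l_hid + beta * l_att + gamma * l_out

def lr_drop_loss(a, b, label, **loss_weights):
    # Task loss: average cross-entropy of both sub-models on the gold label.
    ce = -0.5 * (math.log(a["probs"][label]) + math.log(b["probs"][label]))
    return ce + consistency_loss(a, b, **loss_weights)

rng = random.Random(0)
layers = [[[0.5, -1.0, 2.0], [1.5, 0.3, -0.7], [0.2, 0.8, -1.1]],
          [[1.0, 0.4, -0.6], [-0.9, 1.2, 0.3], [0.7, -0.5, 0.9]]]
x = [1.0, 0.5, -0.2]
# The same sample passes through two dropout-sampled siamese sub-models.
run_a = toy_forward(x, layers, p=0.3, rng=rng)
run_b = toy_forward(x, layers, p=0.3, rng=rng)
print(f"LR-Drop-style loss: {lr_drop_loss(run_a, run_b, label=2):.4f}")
```

Compared with an output-only consistency loss, the layer-wise variant adds the `l_hid` and `l_att` terms, so disagreement between the two sub-models is penalized at every layer rather than only at the final distribution. In a real Transformer the hidden states and attention maps would come from the model's intermediate outputs, and the loss weights would be tuned hyperparameters.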

arXiv information

Authors	Shiwen Ni, Min Yang, Ruifeng Xu, Chengming Li, Xiping Hu
Published	2024-02-26 07:31:35+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google