Structural Self-Supervised Objectives for Transformers

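Summary

This thesis focuses on improving the pre-training of natural language models with unsupervised raw data, making the models more efficient and better aligned with downstream applications.
The first part introduces three pre-training objectives as alternatives to BERT's Masked Language Modeling (MLM): Random Token Substitution (RTS), Cluster-based Random Token Substitution (C-RTS), and Swapped Language Modeling (SLM).
These objectives replace tokens with other tokens instead of masking them; RTS and C-RTS predict whether each token is original, while SLM predicts the value of the original token.
The results show that RTS and C-RTS need less pre-training time while keeping performance comparable to MLM.
Surprisingly, SLM outperforms MLM on certain tasks even though it uses the same computational budget.
The second part proposes self-supervised pre-training tasks that are structurally aligned with downstream applications and reduce the need for labeled data.
Using large corpora such as Wikipedia and CC-News, the models are trained to recognize, in several ways, whether text spans originate from the same paragraph or document.
Continued pre-training that starts from existing models such as RoBERTa, ELECTRA, DeBERTa, BART, and T5 yields significant performance improvements on tasks such as Fact Verification, Answer Sentence Selection, and Summarization.
These improvements are especially pronounced when only limited annotated data is available.
The proposed objectives achieve state-of-the-art results on several benchmark datasets, including FEVER (dev set), ASNQ, WikiQA, and TREC-QA, and also improve the quality of generated summaries.
Importantly, these techniques can be combined easily with other methods without altering the internal structure of Transformer models, which makes them versatile across NLP applications.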
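
The token-substitution objectives described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `substitute_tokens`, the 15% substitution rate, and uniform sampling of replacements are all assumptions made for illustration, and the cluster-based sampling of C-RTS is not shown. The sketch also assumes that SLM is scored only at replaced positions.

```python
import random


def substitute_tokens(tokens, vocab, rate=0.15, seed=0):
    """Replace a random subset of tokens and build RTS and SLM targets.

    Hypothetical helper: `rate` and uniform sampling are illustrative
    assumptions, not values taken from the thesis.
    `vocab` must contain at least two distinct tokens.
    """
    rng = random.Random(seed)
    corrupted, rts_labels, slm_targets = [], [], []
    for tok in tokens:
        if rng.random() < rate:
            # Sample a replacement that differs from the original token.
            candidates = [v for v in vocab if v != tok]
            corrupted.append(rng.choice(candidates))
            rts_labels.append(1)      # RTS target: token was substituted
            slm_targets.append(tok)   # SLM target: recover the original token
        else:
            corrupted.append(tok)
            rts_labels.append(0)      # RTS target: token is original
            slm_targets.append(None)  # assumed: no SLM loss at this position
    return corrupted, rts_labels, slm_targets
```

Under these assumptions, an RTS-style model would be trained as a binary classifier over `rts_labels`, while an SLM-style model would predict the vocabulary entry in `slm_targets` wherever it is not `None`.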

Abstract (original)

This thesis focuses on improving the pre-training of natural language models using unsupervised raw data to make them more efficient and aligned with downstream applications. In the first part, we introduce three alternative pre-training objectives to BERT’s Masked Language Modeling (MLM), namely Random Token Substitution (RTS), Cluster-based Random Token Substitution (C-RTS), and Swapped Language Modeling (SLM). These objectives involve token swapping instead of masking, with RTS and C-RTS aiming to predict token originality and SLM predicting the original token values. Results show that RTS and C-RTS require less pre-training time while maintaining performance comparable to MLM. Surprisingly, SLM outperforms MLM on certain tasks despite using the same computational budget. In the second part, we propose self-supervised pre-training tasks that align structurally with downstream applications, reducing the need for labeled data. We use large corpora like Wikipedia and CC-News to train models to recognize if text spans originate from the same paragraph or document in several ways. By doing continuous pre-training, starting from existing models like RoBERTa, ELECTRA, DeBERTa, BART, and T5, we demonstrate significant performance improvements in tasks like Fact Verification, Answer Sentence Selection, and Summarization. These improvements are especially pronounced when limited annotation data is available. The proposed objectives also achieve state-of-the-art results on various benchmark datasets, including FEVER (dev set), ASNQ, WikiQA, and TREC-QA, and enhance the quality of summaries. Importantly, these techniques can be easily integrated with other methods without altering the internal structure of Transformer models, making them versatile for various NLP applications.
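
The span-origin objectives of the second part can be sketched in the same spirit. The abstract does not specify how spans are paired or labeled, so everything below is an assumption: the helper name `build_span_pairs`, the use of sentences as spans, and the three-way labeling scheme are hypothetical and only show how such training pairs could be derived from raw documents without annotation.

```python
from itertools import combinations


def build_span_pairs(documents):
    """Pair up sentences and label each pair by where the two came from.

    `documents` is a list of documents, each a list of paragraphs,
    each a list of sentence strings. The labeling scheme is an
    illustrative assumption, not the one defined in the thesis.
    """
    # Flatten to (document index, paragraph index, sentence) triples.
    located = [
        (d, p, sentence)
        for d, doc in enumerate(documents)
        for p, paragraph in enumerate(doc)
        for sentence in paragraph
    ]
    pairs = []
    for (d1, p1, s1), (d2, p2, s2) in combinations(located, 2):
        if d1 != d2:
            label = "different_document"
        elif p1 != p2:
            label = "same_document"   # same document, different paragraph
        else:
            label = "same_paragraph"
        pairs.append((s1, s2, label))
    return pairs
```

Because the labels come from document structure alone, a corpus such as Wikipedia or CC-News could supply pairs of this kind at scale; a practical pipeline would presumably sample pairs instead of enumerating all of them as this sketch does.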

arXiv information

Author	Luca Di Liello
Published	2023-09-15 09:30:45+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
