Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization

Summary

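This paper presents Z-Code++, a new pre-trained language model optimized for abstractive text summarization.
The model extends a state-of-the-art encoder-decoder model with three techniques.
First, a two-phase pre-training process improves the model's performance on low-resource summarization tasks.
The model is first pre-trained on text corpora for language understanding, and then continually pre-trained on summarization corpora for grounded text generation.
Second, the self-attention layers in the encoder are replaced with disentangled attention layers, in which each word is represented by two vectors that encode its content and its position, respectively.
Third, the model uses fusion-in-encoder, a simple yet effective method for encoding long sequences hierarchically.
Z-Code++ sets a new state of the art on 9 of 13 text summarization tasks across 5 languages.
The model is parameter-efficient: it outperforms the 600x larger PaLM-540B on XSum and the fine-tuned, 200x larger GPT3-175B on SAMSum.
In zero-shot and few-shot settings, it substantially outperforms competing models.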
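
The second technique in the summary, disentangled attention, can be illustrated with a minimal sketch. The summary only states that each word carries a content vector and a position vector; the score function below (content-to-content plus content-to-position plus position-to-content, scaled by sqrt(3d), with clipped relative-position buckets) follows the formulation commonly associated with DeBERTa and is an assumption here, not something the summary confirms. Learned projection matrices are omitted for brevity, and the names `rel_bucket` and `disentangled_attention` are hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def rel_bucket(i, j, k):
    """Map the relative distance i - j to a bucket in [0, 2k).

    Distances beyond the window k are clipped (assumed DeBERTa-style).
    """
    d = i - j
    if d <= -k:
        return 0
    if d >= k:
        return 2 * k - 1
    return d + k


def disentangled_attention(content, rel_pos, k):
    """Attention over separate content and relative-position vectors.

    content: n vectors of dimension d (one per token).
    rel_pos: 2k vectors of dimension d (one per relative-distance bucket).
    No learned projections are applied (a simplification for illustration).
    """
    n, d = len(content), len(content[0])
    scale = math.sqrt(3 * d)  # three score terms, hence 3d (assumption)
    weights = []
    for i in range(n):
        scores = []
        for j in range(n):
            c2c = dot(content[i], content[j])                     # content-to-content
            c2p = dot(content[i], rel_pos[rel_bucket(i, j, k)])   # content-to-position
            p2c = dot(content[j], rel_pos[rel_bucket(j, i, k)])   # position-to-content
            scores.append((c2c + c2p + p2c) / scale)
        weights.append(softmax(scores))
    # Each output vector is the attention-weighted sum of content vectors.
    out = [[sum(w[j] * content[j][t] for j in range(n)) for t in range(d)]
           for w in weights]
    return weights, out


# Toy input: 3 tokens, dimension 2, window k = 2 (so 4 position buckets).
content = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rel_pos = [[0.1, 0.0], [0.0, 0.1], [0.2, 0.0], [0.0, 0.2]]
weights, out = disentangled_attention(content, rel_pos, k=2)
```

Keeping position in its own vector lets the score depend on how far apart two tokens are independently of what the tokens are, which is the property the summary attributes to the disentangled layers.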

Summary (original)

This paper presents Z-Code++, a new pre-trained language model optimized for abstractive text summarization. The model extends the state of the art encoder-decoder model using three techniques. First, we use a two-phase pre-training process to improve model’s performance on low-resource summarization tasks. The model is first pre-trained using text corpora for language understanding, and then is continually pre-trained on summarization corpora for grounded text generation. Second, we replace self-attention layers in the encoder with disentangled attention layers, where each word is represented using two vectors that encode its content and position, respectively. Third, we use fusion-in-encoder, a simple yet effective method of encoding long sequences in a hierarchical manner. Z-Code++ creates new state of the art on 9 out of 13 text summarization tasks across 5 languages. Our model is parameter-efficient in that it outperforms the 600x larger PaLM-540B on XSum, and the finetuned 200x larger GPT3-175B on SAMSum. In zero-shot and few-shot settings, our model substantially outperforms the competing models.
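
The abstract describes fusion-in-encoder only as encoding long sequences "in a hierarchical manner". One plausible reading, assumed here rather than confirmed by the abstract, is that lower layers attend within fixed-size chunks and upper layers attend over the concatenated chunk outputs. The sketch below illustrates that reading with plain dot-product self-attention and no learned weights; `self_attention`, `hierarchical_encode`, and their parameters are hypothetical names.

```python
import math


def self_attention(vectors):
    """Plain dot-product self-attention without learned projections."""
    d = len(vectors[0])
    out = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, key)) / math.sqrt(d)
                  for key in vectors]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        out.append([sum(e / total * v[t] for e, v in zip(exps, vectors))
                    for t in range(d)])
    return out


def hierarchical_encode(vectors, chunk_size, local_layers=1, global_layers=1):
    """Encode a long sequence in two stages (assumed reading of the abstract).

    1. Local stage: split into chunks and attend only within each chunk.
    2. Global stage: concatenate the chunks and attend across the full sequence.
    """
    chunks = [vectors[s:s + chunk_size]
              for s in range(0, len(vectors), chunk_size)]
    for _ in range(local_layers):
        chunks = [self_attention(chunk) for chunk in chunks]
    fused = [v for chunk in chunks for v in chunk]  # concatenate chunk outputs
    for _ in range(global_layers):
        fused = self_attention(fused)
    return fused


# Toy input: 5 tokens of dimension 2, encoded in chunks of 2.
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [2.0, 0.0]]
encoded = hierarchical_encode(sequence, chunk_size=2)
```

Under this reading, the local stage costs grow with the chunk size rather than the full sequence length, and only the global stage pays for attention over the whole input.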

arXiv information

Authors	Pengcheng He, Baolin Peng, Liyang Lu, Song Wang, Jie Mei, Yang Liu, Ruochen Xu, Hany Hassan Awadalla, Yu Shi, Chenguang Zhu, Wayne Xiong, Michael Zeng, Jianfeng Gao, Xuedong Huang
Published	2023-06-07 17:13:29+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
