Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling

Abstract

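Modeling discourse, the linguistic phenomena that go beyond individual sentences, is a fundamental yet challenging aspect of natural language processing (NLP).
However, existing evaluation benchmarks focus primarily on intra-sentence properties and overlook critical discourse phenomena that cross sentences.
To bridge this gap, we propose Disco-Bench, a benchmark that evaluates inter-sentence discourse properties across a diverse set of NLP tasks covering understanding, translation, and generation.
Disco-Bench consists of 9 document-level test sets in the literature domain, which contain rich discourse phenomena (e.g., cohesion and coherence) in Chinese and/or English.
For linguistic analysis, we also design a diagnostic test suite that examines whether target models learn discourse knowledge.
In total, we evaluate 20 general, in-domain, and commercial models based on Transformer, advanced pretraining architectures, and large language models (LLMs).
Our results show (1) the challenge and necessity of our evaluation benchmark, and
(2) that fine-grained pretraining on literary document-level training data consistently improves the modeling of discourse information.
We will release the datasets, pretrained models, and leaderboard, which we hope will significantly facilitate research in this field: https://github.com/longyuewangdcu/Disco-Bench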

Abstract (original)

Modeling discourse — the linguistic phenomena that go beyond individual sentences, is a fundamental yet challenging aspect of natural language processing (NLP). However, existing evaluation benchmarks primarily focus on the evaluation of inter-sentence properties and overlook critical discourse phenomena that cross sentences. To bridge the gap, we propose Disco-Bench, a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks, covering understanding, translation, and generation. Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena (e.g. cohesion and coherence) in Chinese and/or English. For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge. We totally evaluate 20 general-, in-domain and commercial models based on Transformer, advanced pretraining architectures and large language models (LLMs). Our results show (1) the challenge and necessity of our evaluation benchmark; (2) fine-grained pretraining based on literary document-level training data consistently improves the modeling of discourse information. We will release the datasets, pretrained models, and leaderboard, which we hope can significantly facilitate research in this field: https://github.com/longyuewangdcu/Disco-Bench.

arXiv information

Authors	Longyue Wang, Zefeng Du, Donghuai Liu, Cai Deng, Dian Yu, Haiyun Jiang, Yan Wang, Leyang Cui, Shuming Shi, Zhaopeng Tu
Published	2023-07-16 15:18:25+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
