Orca: A Few-shot Benchmark for Chinese Conversational Machine Reading Comprehension

Abstract

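The conversational machine reading comprehension (CMRC) task aims to answer questions posed within a conversation. It has drawn research attention in recent years because of its wide range of applications.
However, existing CMRC benchmarks assign a single static passage to each conversation, which is inconsistent with real scenarios.
As a result, it is difficult to evaluate reasonably how well a model comprehends real scenarios.
To address this, we propose Orca, the first Chinese CMRC benchmark, and further provide zero-shot/few-shot settings to evaluate a model's generalization ability across diverse domains.
We collect 831 hot-topic driven conversations with 4,742 turns in total.
Each turn of a conversation is assigned a response-related passage, with the aim of evaluating a model's comprehension ability more reasonably.
The conversation topics are collected from social media and cover 33 domains, in an effort to stay consistent with real scenarios.
Importantly, all answers in Orca are well-annotated natural responses rather than the specific spans or short phrases used in previous datasets.
In addition, we implement three strong baselines to tackle the challenge posed by Orca.
The results show that our CMRC benchmark is highly challenging.
Our dataset and checkpoints are available at https://github.com/nuochenpku/Orca.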

Abstract (original)

The conversational machine reading comprehension (CMRC) task aims to answer questions in conversations, which has been a hot research topic in recent years because of its wide applications. However, existing CMRC benchmarks in which each conversation is assigned a static passage are inconsistent with real scenarios. Thus, model’s comprehension ability towards real scenarios are hard to evaluate reasonably. To this end, we propose the first Chinese CMRC benchmark Orca and further provide zero-shot/few-shot settings to evaluate model’s generalization ability towards diverse domains. We collect 831 hot-topic driven conversations with 4,742 turns in total. Each turn of a conversation is assigned with a response-related passage, aiming to evaluate model’s comprehension ability more reasonably. The topics of conversations are collected from social media platform and cover 33 domains, trying to be consistent with real scenarios. Importantly, answers in Orca are all well-annotated natural responses rather than the specific spans or short phrase in previous datasets. Besides, we implement three strong baselines to tackle the challenge in Orca. The results indicate the great challenge of our CMRC benchmark. Our dataset and checkpoints are available at https://github.com/nuochenpku/Orca.

arXiv information

Authors	Nuo Chen, Hongguang Li, Yinan Bao, Junqing He, Xinshi Lin, Qi Yang, Jianfeng Liu, Ruyi Gan, Jiaxing Zhang, Baoyuan Wang, Jia Li
Published	2023-02-27 09:40:41+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
