SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue in Multiple Domains

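Summary

Task-oriented dialogue (TOD) models have made great progress in the past few years.
However, this work has focused mainly on datasets written by annotators, which leaves a gap between academic research and more realistic spoken conversation scenarios.
A few small-scale spoken TOD datasets have been proposed to address robustness issues such as ASR errors, but they fail to identify the challenges unique to spoken conversation.
To address this limitation, we introduce SpokenWOZ, a large-scale speech-text dataset for spoken TOD consisting of 8 domains, 203k turns, 5.7k dialogues, and 249 hours of audio from human-to-human spoken conversations.
SpokenWOZ incorporates common characteristics of spoken language, such as word-by-word processing and commonsense reasoning.
We also present cross-turn slot detection and reasoning slot detection as new challenges based on spoken linguistic phenomena.
We conduct comprehensive experiments on a range of models, including text-modal baselines, newly proposed dual-modal baselines, and LLMs.
The results show that current models, including fine-tuned models and LLMs (namely ChatGPT), still have substantial room for improvement in spoken conversation.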

Abstract (original)

Task-oriented dialogue (TOD) models have great progress in the past few years. However, these studies primarily focus on datasets written by annotators, which has resulted in a gap between academic research and more realistic spoken conversation scenarios. While a few small-scale spoken TOD datasets are proposed to address robustness issues, e.g., ASR errors, they fail to identify the unique challenges in spoken conversation. To tackle the limitations, we introduce SpokenWOZ, a large-scale speech-text dataset for spoken TOD, which consists of 8 domains, 203k turns, 5.7k dialogues and 249 hours of audios from human-to-human spoken conversations. SpokenWOZ incorporates common spoken characteristics such as word-by-word processing and commonsense reasoning. We also present cross-turn slot and reasoning slot detection as new challenges based on the spoken linguistic phenomena. We conduct comprehensive experiments on various models, including text-modal baselines, newly proposed dual-modal baselines and LLMs. The results show the current models still has substantial areas for improvement in spoken conversation, including fine-tuned models and LLMs, i.e., ChatGPT.

arXiv information

Authors	Shuzheng Si, Wentao Ma, Yuchuan Wu, Yinpei Dai, Haoyu Gao, Ting-En Lin, Hangyu Li, Rui Yan, Fei Huang, Yongbin Li
Published	2023-05-22 13:47:51+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
