OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios

Abstract

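The rapid development of large language models has enabled researchers to build advanced spoken dialogue systems that converse naturally with humans. However, these systems struggle to handle the full complexity of real-world conversations, including audio events, musical contexts, and emotional expressions. In this paper, we propose leveraging synthetic data to strengthen dialogue models across diverse scenarios. We introduce ShareChatX, the first comprehensive, large-scale spoken dialogue dataset spanning diverse scenarios. Building on this dataset, we introduce OmniChat, a multi-turn dialogue system with a heterogeneous feature fusion module designed to optimize feature selection across different dialogue contexts. We also explore key aspects of training dialogue systems with synthetic data. Through comprehensive experiments, we determine the ideal balance between synthetic and real data and achieve state-of-the-art results on the real-world dialogue dataset DailyTalk. We further highlight that synthetic data is crucial for handling diverse, complex dialogue scenarios, especially those involving audio and music. For more details, please see the demo page.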

Abstract (original)

With the rapid development of large language models, researchers have created increasingly advanced spoken dialogue systems that can naturally converse with humans. However, these systems still struggle to handle the full complexity of real-world conversations, including audio events, musical contexts, and emotional expressions, mainly because current dialogue datasets are constrained in both scale and scenario diversity. In this paper, we propose leveraging synthetic data to enhance the dialogue models across diverse scenarios. We introduce ShareChatX, the first comprehensive, large-scale dataset for spoken dialogue that spans diverse scenarios. Based on this dataset, we introduce OmniChat, a multi-turn dialogue system with a heterogeneous feature fusion module, designed to optimize feature selection in different dialogue contexts. In addition, we explored critical aspects of training dialogue systems using synthetic data. Through comprehensive experimentation, we determined the ideal balance between synthetic and real data, achieving state-of-the-art results on the real-world dialogue dataset DailyTalk. We also highlight the crucial importance of synthetic data in tackling diverse, complex dialogue scenarios, especially those involving audio and music. For more details, please visit our demo page at https://sharechatx.github.io/.

arXiv information

Authors	Xize Cheng, Dongjie Fu, Xiaoda Yang, Minghui Fang, Ruofan Hu, Jingyu Lu, Bai Jionghao, Zehan Wang, Shengpeng Ji, Rongjie Huang, Linjun Li, Yu Chen, Tao Jin, Zhou Zhao
Published	2025-01-02 17:58:23+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
