Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond

Abstract

We present Speech-MASSIVE, a multilingual Spoken Language Understanding (SLU) dataset comprising the speech counterpart for a portion of the MASSIVE textual corpus. Speech-MASSIVE covers 12 languages from different families and inherits from MASSIVE the annotations for the intent prediction and slot-filling tasks. Our extension is prompted by the scarcity of massively multilingual SLU datasets and the growing need for versatile speech datasets to assess foundation models (LLMs, speech encoders) across languages and tasks. We provide a multimodal, multitask, multilingual dataset and report SLU baselines using both cascaded and end-to-end architectures in various training scenarios (zero-shot, few-shot, and full fine-tune). Furthermore, we demonstrate the suitability of Speech-MASSIVE for benchmarking other tasks such as speech transcription, language identification, and speech translation. The dataset, models, and code are publicly available at: https://github.com/hlt-mt/Speech-MASSIVE

arXiv information

Authors	Beomseok Lee, Ioan Calapodescu, Marco Gaido, Matteo Negri, Laurent Besacier
Published	2024-08-07 16:55:28+00:00
