Towards Robust Speech Representation Learning for Thousands of Languages

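Summary

Self-supervised learning (SSL) has helped extend speech technologies to more languages by reducing the need for labeled data.
However, models are still far from supporting the world's more than 7,000 languages.
The authors propose XEUS, a Cross-lingual Encoder for Universal Speech, trained on over 1 million hours of data across 4,057 languages, which extends the language coverage of SSL models 4-fold.
They combine 1 million hours of speech from existing publicly accessible corpora with a newly created corpus of more than 7,400 hours from 4,057 languages, which will be publicly released.
To handle the diverse conditions of multilingual speech data, they augment the typical SSL masked prediction approach with a novel dereverberation objective, which increases robustness.
XEUS is evaluated on several benchmarks, where it consistently outperforms or achieves results comparable to state-of-the-art (SOTA) SSL models across a variety of tasks.
XEUS sets a new SOTA on the ML-SUPERB benchmark: it outperforms MMS 1B and w2v-BERT 2.0 v2 by 0.8% and 4.4% respectively, despite having fewer parameters or less pre-training data.
Checkpoints, code, and data are available at https://www.wavlab.org/activities/2024/xeus/.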
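
The abstract does not say how the dereverberation objective is implemented. As a rough illustration of the general idea only (an encoder receives reverberant audio while its targets come from the clean signal), the sketch below builds such an input/reference pair with NumPy. The helper names `synthetic_rir` and `make_training_pair`, the exponentially decaying noise model, and the default `rt60` value are assumptions made for this example, not details taken from XEUS.

```python
import numpy as np


def synthetic_rir(sample_rate=16000, rt60=0.4, rng=None):
    """Toy room impulse response: white noise under an exponential decay.

    Hypothetical helper for illustration; real pipelines typically use
    measured or simulated room impulse responses instead.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = int(round(sample_rate * rt60))
    t = np.arange(n) / sample_rate
    # Amplitude envelope that has dropped by 60 dB at t = rt60.
    envelope = 10.0 ** (-3.0 * t / rt60)
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0  # direct path
    return rir / np.max(np.abs(rir))


def make_training_pair(clean, rir):
    """Return (reverberant_input, clean_reference) with equal lengths."""
    # Convolve with the impulse response and trim the tail so the
    # reverberant signal stays aligned with the clean reference.
    reverberant = np.convolve(clean, rir)[: len(clean)]
    peak = np.max(np.abs(reverberant))
    if peak > 0:
        # Rescale so input and reference share the same peak level.
        reverberant = reverberant / peak * np.max(np.abs(clean))
    return reverberant, clean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)  # stand-in for 1 s of speech
    noisy_input, reference = make_training_pair(clean, synthetic_rir(rng=rng))
    print(noisy_input.shape, reference.shape)
```

In a masked prediction setup, the reverberant signal would be fed to the encoder while the prediction targets are derived from the clean reference, so the model has to learn representations that are stable under reverberation.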

Abstract (original)

Self-supervised learning (SSL) has helped extend speech technologies to more languages by reducing the need for labeled data. However, models are still far from supporting the world’s 7000+ languages. We propose XEUS, a Cross-lingual Encoder for Universal Speech, trained on over 1 million hours of data across 4057 languages, extending the language coverage of SSL models 4-fold. We combine 1 million hours of speech from existing publicly accessible corpora with a newly created corpus of 7400+ hours from 4057 languages, which will be publicly released. To handle the diverse conditions of multilingual speech data, we augment the typical SSL masked prediction approach with a novel dereverberation objective, increasing robustness. We evaluate XEUS on several benchmarks, and show that it consistently outperforms or achieves comparable results to state-of-the-art (SOTA) SSL models across a variety of tasks. XEUS sets a new SOTA on the ML-SUPERB benchmark: it outperforms MMS 1B and w2v-BERT 2.0 v2 by 0.8% and 4.4% respectively, despite having less parameters or pre-training data. Checkpoints, code, and data are found in https://www.wavlab.org/activities/2024/xeus/.

arXiv information

Authors	William Chen, Wangyou Zhang, Yifan Peng, Xinjian Li, Jinchuan Tian, Jiatong Shi, Xuankai Chang, Soumi Maiti, Karen Livescu, Shinji Watanabe
Published	2024-07-02 17:23:44+00:00

Source, services used

arxiv.jp, Google
