OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models

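Abstract

Neural scaling laws offer valuable insights for designing robust sequence processing architectures.
While these laws have been extensively characterized in other modalities, their behavior in speech remains comparatively underexplored.
In this work, we introduce OWLS, an open-access, reproducible suite of multilingual speech recognition and translation models spanning 0.25B to 18B parameters.
OWLS leverages up to 360K hours of public speech data across 150 languages, enabling a systematic investigation of how data, model, and compute scaling each influence performance in multilingual speech tasks.
We use OWLS to derive neural scaling laws, showing how final performance can be reliably predicted when scaling.
One of our key findings is that scaling enhances performance on low-resource languages/dialects, helping to mitigate bias and improve the accessibility of speech technologies.
Finally, we show how OWLS can be used to power new research directions by discovering emergent abilities in large-scale speech models.
Model checkpoints will be released at https://huggingface.co/collections/espnet/owls-scaling-laws-for-speech-recognition-and-translation-67ab7f991c194065f057ce8d for future studies.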

Abstract (original)

Neural scaling laws offer valuable insights for designing robust sequence processing architectures. While these laws have been extensively characterized in other modalities, their behavior in speech remains comparatively underexplored. In this work, we introduce OWLS, an open-access, reproducible suite of multilingual speech recognition and translation models spanning 0.25B to 18B parameters, with the 18B version being the largest speech model, to the best of our knowledge. OWLS leverages up to 360K hours of public speech data across 150 languages, enabling a systematic investigation into how data, model, and compute scaling each influence performance in multilingual speech tasks. We use OWLS to derive neural scaling laws, showing how final performance can be reliably predicted when scaling. One of our key findings is that scaling enhances performance on low-resource languages/dialects, helping to mitigate bias and improve the accessibility of speech technologies. Finally, we show how OWLS can be used to power new research directions by discovering emergent abilities in large-scale speech models. Model checkpoints will be released on https://huggingface.co/collections/espnet/owls-scaling-laws-for-speech-recognition-and-translation-67ab7f991c194065f057ce8d for future studies.
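
As a rough illustration of what predicting final performance when scaling can look like, the sketch below fits a power law to results from smaller models and extrapolates to a larger one. It assumes a simple form, error ≈ a · size^(−b), which the abstract does not specify; the function names, the model sizes, and the error values are all hypothetical, and the data is synthetic (generated from made-up constants), not taken from the OWLS results.

```python
import numpy as np


def fit_power_law(sizes, errors):
    """Fit error ≈ a * size**(-b) by least squares in log-log space.

    Taking logs gives log(error) = log(a) - b * log(size), a straight line,
    so an ordinary degree-1 polynomial fit recovers both constants.
    """
    slope, intercept = np.polyfit(np.log(sizes), np.log(errors), deg=1)
    return float(np.exp(intercept)), float(-slope)


def predict_error(a, b, size):
    """Evaluate the fitted power law at a given model size."""
    return a * size ** (-b)


# Synthetic illustration only: these constants and sizes are made up.
TRUE_A, TRUE_B = 30.0, 0.2
small_sizes = np.array([0.25, 0.5, 1.0, 2.0, 4.0])  # billions of parameters
synthetic_errors = TRUE_A * small_sizes ** (-TRUE_B)

# Fit on the smaller models, then extrapolate to an 18B-parameter model.
a, b = fit_power_law(small_sizes, synthetic_errors)
predicted_18b = predict_error(a, b, 18.0)
print(f"a={a:.3f}, b={b:.3f}, predicted error at 18B: {predicted_18b:.3f}")
```

Because the synthetic data is noise-free, the fit recovers the generating constants exactly; with real measurements the fitted curve would only approximate the observed trend, and the extrapolation would carry corresponding uncertainty.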

arXiv information

Authors	William Chen, Jinchuan Tian, Yifan Peng, Brian Yan, Chao-Han Huck Yang, Shinji Watanabe
Published	2025-02-14 18:51:40+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
