LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

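Abstract

Self-supervised learning (SSL) has driven unprecedented improvements in many fields, including computer vision and natural language processing.
Speech processing has benefited greatly from SSL, since most current tasks in the domain are now approached with pre-trained models.
This work introduces LeBenchmark 2.0, an open-source framework for assessing and building SSL-equipped French speech technologies.
It includes documented, large-scale, heterogeneous corpora with up to 14,000 hours of speech; ten pre-trained SSL wav2vec 2.0 models, ranging from 26 million to one billion learnable parameters and shared with the community; and an evaluation protocol of six downstream tasks that complements existing benchmarks.
LeBenchmark 2.0 also offers unique perspectives on pre-trained SSL models for speech by investigating frozen versus fine-tuned downstream models and task-agnostic versus task-specific pre-trained models, and by discussing the carbon footprint of large-scale model training.
Overall, the newly introduced models trained on 14,000 hours of French speech outperform multilingual and previous LeBenchmark SSL models across the benchmark, but they also required up to four times more energy for pre-training.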

Abstract (original)

Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains including computer vision and natural language processing. Speech processing drastically benefitted from SSL as most of the current domain-related tasks are now being approached with pre-trained models. This work introduces LeBenchmark 2.0 an open-source framework for assessing and building SSL-equipped French speech technologies. It includes documented, large-scale and heterogeneous corpora with up to 14,000 hours of heterogeneous speech, ten pre-trained SSL wav2vec 2.0 models containing from 26 million to one billion learnable parameters shared with the community, and an evaluation protocol made of six downstream tasks to complement existing benchmarks. LeBenchmark 2.0 also presents unique perspectives on pre-trained SSL models for speech with the investigation of frozen versus fine-tuned downstream models, task-agnostic versus task-specific pre-trained models as well as a discussion on the carbon footprint of large-scale model training. Overall, the newly introduced models trained on 14,000 hours of French speech outperform multilingual and previous LeBenchmark SSL models across the benchmark but also required up to four times more energy for pre-training.

arXiv information

Authors	Titouan Parcollet, Ha Nguyen, Solene Evain, Marcely Zanon Boito, Adrien Pupier, Salima Mdhaffar, Hang Le, Sina Alisamir, Natalia Tomashenko, Marco Dinarelli, Shucong Zhang, Alexandre Allauzen, Maximin Coavoux, Yannick Esteve, Mickael Rouvier, Jerome Goulian, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier
Published	2024-03-18 10:54:15+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
