Language Bias in Self-Supervised Learning For Automatic Speech Recognition

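Summary

Self-supervised learning (SSL) is used in deep learning to train on large datasets without requiring expensive data labelling.
Recently, large automatic speech recognition (ASR) models such as XLS-R have used SSL to train on more than 100 languages simultaneously.
However, closer investigation shows that most of the XLS-R training data comes from a small number of languages.
Biases learned through SSL have been shown to exist in multiple domains, but language bias in multilingual SSL ASR has not been thoroughly examined.
This paper uses the Lottery Ticket Hypothesis (LTH) to identify language-specific subnetworks within XLS-R and tests the performance of these subnetworks on a variety of languages.
The authors show that, when fine-tuned, XLS-R bypasses traditional linguistic knowledge and builds only on weights learned from the languages that contribute the most data to the pretraining set.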

Abstract (original)

Self-supervised learning (SSL) is used in deep learning to train on large datasets without the need for expensive labelling of the data. Recently, large Automatic Speech Recognition (ASR) models such as XLS-R have utilised SSL to train on over one hundred different languages simultaneously. However, deeper investigation shows that the bulk of the training data for XLS-R comes from a small number of languages. Biases learned through SSL have been shown to exist in multiple domains, but language bias in multilingual SSL ASR has not been thoroughly examined. In this paper, we utilise the Lottery Ticket Hypothesis (LTH) to identify language-specific subnetworks within XLS-R and test the performance of these subnetworks on a variety of different languages. We are able to show that when fine-tuning, XLS-R bypasses traditional linguistic knowledge and builds only on weights learned from the languages with the largest data contribution to the pretraining data.
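
The abstract does not describe how the language-specific subnetworks are extracted, so the following is only a minimal sketch of one common LTH-style step: magnitude pruning to obtain a binary mask, plus an overlap score between two masks. The function names `magnitude_mask` and `mask_overlap`, the 50% sparsity, and the random stand-in weights are all assumptions made for illustration, not the authors' procedure or real XLS-R weights.

```python
import numpy as np


def magnitude_mask(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Boolean mask that keeps the largest-magnitude weights.

    `sparsity` is the fraction of weights to prune (0.0 keeps everything).
    Hypothetical helper: a generic magnitude-pruning step, not the paper's code.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    flat = np.abs(weights).ravel()
    n_prune = int(sparsity * flat.size)  # floor, so at least one weight survives
    # argpartition places the n_prune smallest magnitudes first;
    # everything from position n_prune onward is kept.
    keep_idx = np.argpartition(flat, n_prune)[n_prune:]
    mask = np.zeros(flat.size, dtype=bool)
    mask[keep_idx] = True
    return mask.reshape(weights.shape)


def mask_overlap(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks of the same shape."""
    union = np.logical_or(mask_a, mask_b).sum()
    if union == 0:
        return 1.0  # two empty masks are treated as identical
    return float(np.logical_and(mask_a, mask_b).sum() / union)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for one layer's weights after fine-tuning on two languages.
    pretrained = rng.normal(size=(64, 64))
    lang_a = pretrained + 0.05 * rng.normal(size=pretrained.shape)
    lang_b = pretrained + 0.05 * rng.normal(size=pretrained.shape)

    mask_a = magnitude_mask(lang_a, sparsity=0.5)
    mask_b = magnitude_mask(lang_b, sparsity=0.5)
    print(f"mask overlap (IoU): {mask_overlap(mask_a, mask_b):.3f}")
```

In this toy setup, a high overlap between masks obtained from different languages would mean the surviving weights are largely shared rather than language-specific, which is the kind of comparison the abstract alludes to when it tests subnetworks across languages.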

arXiv information

Authors	Edward Storey, Naomi Harte, Peter Bell
Published	2025-01-31 17:16:45+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
