Federated Representation Learning for Automatic Speech Recognition

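Summary

Federated Learning (FL) is a privacy-preserving paradigm that lets edge devices learn collaboratively without sharing their data.
Edge devices such as Alexa and Siri are prospective sources of unlabeled audio data that can be tapped to learn robust audio representations.
This work combines self-supervised learning (SSL) with FL to learn representations for automatic speech recognition (ASR) while respecting data privacy constraints.
Speaker and chapter information from the unlabeled speech dataset Libri-Light is used to simulate non-IID, speaker-siloed data distributions, and an LSTM encoder is pre-trained with the Contrastive Predictive Coding framework using FedSGD.
The ASR encoder pre-trained in FL performs as well as a centrally pre-trained model and yields a 12-15% (WER) improvement over no pre-training.
In addition, adapting the federated pre-trained models to a new language, French, yields a 20% (WER) improvement over no pre-training.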

Summary (original)

Federated Learning (FL) is a privacy-preserving paradigm, allowing edge devices to learn collaboratively without sharing data. Edge devices like Alexa and Siri are prospective sources of unlabeled audio data that can be tapped to learn robust audio representations. In this work, we bring Self-supervised Learning (SSL) and FL together to learn representations for Automatic Speech Recognition respecting data privacy constraints. We use the speaker and chapter information in the unlabeled speech dataset, Libri-Light, to simulate non-IID speaker-siloed data distributions and pre-train an LSTM encoder with the Contrastive Predictive Coding framework with FedSGD. We show that the pre-trained ASR encoder in FL performs as well as a centrally pre-trained model and produces an improvement of 12-15% (WER) compared to no pre-training. We further adapt the federated pre-trained models to a new language, French, and show a 20% (WER) improvement over no pre-training.
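
The abstract names two mechanics that a small example may make concrete: splitting data into speaker silos and aggregating gradients with FedSGD. The following is a minimal sketch under stated assumptions, not the authors' implementation: a toy scalar model `y_hat = w * x` with squared error stands in for the LSTM encoder and the Contrastive Predictive Coding loss, and the names `silo_by_speaker`, `local_gradient`, and `fedsgd_round`, along with the sample data, are hypothetical.

```python
from collections import defaultdict


def silo_by_speaker(utterances):
    """Group (speaker_id, x, y) records so each client holds one speaker's data."""
    clients = defaultdict(list)
    for speaker_id, x, y in utterances:
        clients[speaker_id].append((x, y))
    return dict(clients)


def local_gradient(w, data):
    """Gradient of the mean squared error for the toy model y_hat = w * x.

    Assumption: this stands in for the gradient of the real pre-training loss.
    """
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def fedsgd_round(w, clients, lr):
    """One FedSGD round: every client computes a single gradient on its own
    data, and the server averages the gradients weighted by sample count
    before taking one step. Raw data never leaves a client."""
    total = sum(len(data) for data in clients.values())
    avg_grad = sum(
        len(data) * local_gradient(w, data) for data in clients.values()
    ) / total
    return w - lr * avg_grad


# Hypothetical toy data: two speakers, targets generated by y = 2 * x.
utterances = [("spk_a", 1.0, 2.0), ("spk_a", 2.0, 4.0), ("spk_b", 3.0, 6.0)]
clients = silo_by_speaker(utterances)

w = 0.0
for _ in range(200):
    w = fedsgd_round(w, clients, lr=0.05)
```

With sample-count weighting, the averaged gradient in this toy setting equals the gradient computed on the pooled data, so each round matches a centralized full-batch gradient step even though the clients are non-IID.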

arXiv information

Authors	Guruprasad V Ramesh, Gopinath Chennupati, Milind Rao, Anit Kumar Sahu, Ariya Rastrow, Jasha Droppo
Published	2023-08-07 21:34:44+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
