Stable Distillation: Regularizing Continued Pre-training for Low-Resource Automatic Speech Recognition

Summary

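Continued self-supervised (SSL) pre-training, which adapts an existing SSL model to a target domain, has been shown to be highly effective for low-resource automatic speech recognition (ASR).
This paper proposes Stable Distillation, a simple and novel approach to SSL-based continued pre-training that improves ASR performance in target domains where both labeled and unlabeled data are limited.
Stable Distillation uses self-distillation as a regularizer for continued pre-training, mitigating over-fitting, a common problem when the source and target domains differ.
Specifically, vanilla continued pre-training is first performed on the target-domain ASR dataset, starting from an initial SSL pre-trained model; the resulting model is called the teacher.
Next, the same initial pre-trained model is taken as the student and undergoes continued pre-training while its hidden representations are forced, via an MSE loss, to stay close to those of the teacher.
This student is then used for downstream ASR fine-tuning on the target dataset.
In practice, Stable Distillation outperforms all baselines by 0.8 to 7 WER across various experimental settings.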
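
The student's training signal described above can be illustrated with a toy calculation. This is a minimal sketch, not the authors' implementation: the abstract only states that an MSE loss pulls the student's hidden representations toward the teacher's, so the weighted-sum combination with the SSL loss, the `distill_weight` parameter, and all function names here are assumptions made for illustration.

```python
# Toy sketch of a student objective: SSL loss plus an MSE pull toward the teacher.
# Hypothetical names throughout; the weighted sum is an assumed way to combine terms.

def mse(student_hidden, teacher_hidden):
    """Mean squared error between two equally shaped [frames][dims] lists."""
    total, count = 0.0, 0
    for s_frame, t_frame in zip(student_hidden, teacher_hidden):
        for s, t in zip(s_frame, t_frame):
            total += (s - t) ** 2
            count += 1
    return total / count


def student_objective(ssl_loss, student_hidden, teacher_hidden, distill_weight=1.0):
    """Assumed combination: SSL loss + distill_weight * MSE(student, teacher)."""
    return ssl_loss + distill_weight * mse(student_hidden, teacher_hidden)


# Two frames of 2-dimensional hidden states (made-up numbers).
teacher_hidden = [[1.0, 0.0], [0.0, 1.0]]  # from the frozen, already adapted teacher
student_hidden = [[0.0, 0.0], [0.0, 3.0]]  # from the student being trained

distill = mse(student_hidden, teacher_hidden)
total = student_objective(2.0, student_hidden, teacher_hidden, distill_weight=0.5)
print(distill, total)
```

In a real setup the hidden states would come from the two speech encoders on the same audio batch, and only the student's parameters would be updated.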

Summary (original)

Continued self-supervised (SSL) pre-training for adapting existing SSL models to the target domain has shown to be extremely effective for low-resource Automatic Speech Recognition (ASR). This paper proposes Stable Distillation, a simple and novel approach for SSL-based continued pre-training that boosts ASR performance in the target domain where both labeled and unlabeled data are limited. Stable Distillation employs self-distillation as regularization for continued pre-training, alleviating the over-fitting issue, a common problem continued pre-training faces when the source and target domains differ. Specifically, first, we perform vanilla continued pre-training on an initial SSL pre-trained model on the target domain ASR dataset and call it the teacher. Next, we take the same initial pre-trained model as a student to perform continued pre-training while enforcing its hidden representations to be close to that of the teacher (via MSE loss). This student is then used for downstream ASR fine-tuning on the target dataset. In practice, Stable Distillation outperforms all our baselines by 0.8 – 7 WER when evaluated in various experimental settings.
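
For readers unfamiliar with the metric behind the "0.8 – 7 WER" figure, word error rate is conventionally the word-level edit distance between the reference and the recognized transcript, divided by the number of reference words. The sketch below shows that standard definition as a percentage; the function name and example sentences are hypothetical, and reading the reported gains as absolute percentage points is an assumption, since the abstract does not say.

```python
def word_error_rate(reference, hypothesis):
    """WER in percent: word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One deleted word ("the") and one substituted word ("light" -> "lights") out of five.
wer = word_error_rate("turn on the kitchen light", "turn on kitchen lights")
print(wer)  # 40.0
```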

arXiv information

Authors	Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha
Published	2023-12-20 06:02:12+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
