FusDom: Combining In-Domain and Out-of-Domain Knowledge for Continuous Self-Supervised Learning

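Summary

Continued pre-training (CP) offers several advantages, including adaptation to a target domain and the ability to exploit the continuous stream of unlabeled data available online.
However, continuing pre-training on an out-of-domain distribution often causes catastrophic forgetting of previously acquired knowledge, which results in sub-optimal ASR performance.
This paper introduces FusDom, a simple and novel methodology for SSL-based continued pre-training.
FusDom learns speech representations that are robust and adaptive, yet do not forget concepts seen in the past.
Rather than solving the SSL pretext task on the output representations of a single model, FusDom uses two identical pre-trained SSL models, a teacher and a student, together with a modified pre-training head to solve the CP SSL pretext task.
This head applies a cross-attention mechanism between the representations of the two models; only the student receives gradient updates, while the teacher does not.
Finally, the student is fine-tuned for ASR.
In practice, FusDom significantly outperforms all of the authors' baselines across settings, improving WER in the target domain by 0.2 to 7.3 WER while retaining performance on the earlier domain.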

Abstract (original)

Continued pre-training (CP) offers multiple advantages, like target domain adaptation and the potential to exploit the continuous stream of unlabeled data available online. However, continued pre-training on out-of-domain distributions often leads to catastrophic forgetting of previously acquired knowledge, leading to sub-optimal ASR performance. This paper presents FusDom, a simple and novel methodology for SSL-based continued pre-training. FusDom learns speech representations that are robust and adaptive yet not forgetful of concepts seen in the past. Instead of solving the SSL pre-text task on the output representations of a single model, FusDom leverages two identical pre-trained SSL models, a teacher and a student, with a modified pre-training head to solve the CP SSL pre-text task. This head employs a cross-attention mechanism between the representations of both models while only the student receives gradient updates and the teacher does not. Finally, the student is fine-tuned for ASR. In practice, FusDom outperforms all our baselines across settings significantly, with WER improvements in the range of 0.2 WER – 7.3 WER in the target domain while retaining the performance in the earlier domain.
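
The cross-attention idea behind the modified pre-training head can be illustrated with a small standard-library sketch. This is not the authors' implementation: it assumes a single attention head with no learned projections, and it assumes that queries come from the student while keys and values come from the teacher (the abstract does not say which model supplies which). The names `cross_attention`, `student`, `teacher`, and the toy sizes `T` and `D` are hypothetical. Training is not shown; in an autograd framework the teacher's outputs would be excluded from gradient computation so that only the student is updated.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(student_frames, teacher_frames):
    """Single-head scaled dot-product cross-attention, no learned projections.

    Assumption for this sketch: queries are student frames, keys and values
    are teacher frames. Each output row is a weighted average of teacher frames.
    """
    dim = len(student_frames[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for q in student_frames:
        # Similarity of this student frame to every teacher frame.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k))
                  for k in teacher_frames]
        weights = softmax(scores)
        # Weighted average of teacher frames, one feature dimension at a time.
        fused.append([sum(w * v[d] for w, v in zip(weights, teacher_frames))
                      for d in range(dim)])
    return fused


random.seed(0)
T, D = 4, 3  # hypothetical toy sizes: 4 frames, 3 features per frame
student = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
teacher = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
fused = cross_attention(student, teacher)
```

Because the attention weights for each student frame are non-negative and sum to one, every fused frame lies within the range spanned by the teacher frames; a real head would add learned projections and feed the fused output to the SSL pretext loss.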

arXiv information

Authors	Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha
Published	2023-12-20 13:50:05+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
