Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator

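Summary

We propose an end-to-end ASR system that can be trained on transcribed speech data, text data, or a mixture of both.
For text-only training, the extended ASR model uses an integrated auxiliary TTS block that creates mel spectrograms from text.
This block contains a conventional non-autoregressive text-to-mel-spectrogram generator, augmented with a GAN enhancer to improve spectrogram quality.
By using text-only data, the proposed system can improve the ASR model's accuracy on a new domain, and by using large text corpora it can significantly outperform conventional audio-text training.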

Summary (original)

We propose an end-to-end ASR system that can be trained on transcribed speech data, text data, or a mixture of both. For text-only training, our extended ASR model uses an integrated auxiliary TTS block that creates mel spectrograms from the text. This block contains a conventional non-autoregressive text-to-mel-spectrogram generator augmented with a GAN enhancer to improve the spectrogram quality. The proposed system can improve the accuracy of the ASR model on a new domain by using text-only data, and allows it to significantly surpass conventional audio-text training by using large text corpora.
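The abstract describes a single ASR model fed from two kinds of samples: transcribed speech, which already has a spectrogram, and text-only samples, which get one from the integrated TTS block. The sketch below illustrates only that routing step, under stated assumptions. `Sample`, `make_asr_inputs`, `toy_text_to_mel`, `toy_enhancer` and `N_MELS` are hypothetical names; the generator and enhancer are trivial stand-ins (not a real non-autoregressive TTS model or a trained GAN), and the ASR model, its loss, and whether the TTS block is frozen during ASR training are left out because the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# A mel spectrogram as a list of frames, each frame a list of mel-bin values.
Mel = List[List[float]]

N_MELS = 4  # hypothetical, tiny for illustration (real systems use many more bins)


@dataclass
class Sample:
    text: str
    mel: Optional[Mel] = None  # None marks a text-only sample


def toy_text_to_mel(text: str) -> Mel:
    """Hypothetical stand-in for the non-autoregressive text-to-mel generator:
    every frame depends only on its own character, so all frames could be
    produced in parallel (no frame conditions on a previously generated one)."""
    return [[(ord(ch) % (b + 2)) / (b + 2) for b in range(N_MELS)] for ch in text]


def toy_enhancer(mel: Mel) -> Mel:
    """Hypothetical stand-in for the GAN enhancer: adds a small correction
    that pulls each bin toward 0.5, then clips the result to [0, 1]."""
    return [[min(1.0, max(0.0, v + 0.1 * (0.5 - v))) for v in frame] for frame in mel]


def make_asr_inputs(
    batch: Sequence[Sample],
    text_to_mel: Callable[[str], Mel],
    enhancer: Callable[[Mel], Mel],
) -> List[Tuple[Mel, str]]:
    """Pair every transcript with a spectrogram the ASR encoder can consume.

    Speech samples keep their own spectrogram; text-only samples get one
    synthesized by the generator and then refined by the enhancer.
    """
    pairs = []
    for sample in batch:
        if sample.mel is not None:
            mel = sample.mel  # transcribed speech: use the real spectrogram
        else:
            mel = enhancer(text_to_mel(sample.text))  # text-only: synthesize
        pairs.append((mel, sample.text))
    return pairs


if __name__ == "__main__":
    real_mel = [[0.2] * N_MELS for _ in range(3)]
    batch = [Sample("hi", mel=real_mel), Sample("new domain text")]
    for mel, text in make_asr_inputs(batch, toy_text_to_mel, toy_enhancer):
        print(f"{text!r}: {len(mel)} frames x {len(mel[0])} bins")
```

Because both branches return the same `(spectrogram, transcript)` shape, one batch can freely mix speech and text-only samples, which is what lets in-domain text alone adapt the ASR model in this setup.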

arXiv information

Authors: Vladimir Bataev, Roman Korostik, Evgeny Shabalin, Vitaly Lavrukhin, Boris Ginsburg
Published: 2023-02-27 18:47:55+00:00
arXiv page: arxiv_id(pdf)

Source and services used

arxiv.jp, Google
