Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network

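Summary

Contextual information plays an important role in speech recognition, and incorporating it into end-to-end speech recognition models has recently attracted considerable interest.
However, previous deep biasing methods lacked explicit supervision for the biasing task.
This work introduces a contextual phrase prediction network for an attention-based deep biasing method.
The network uses contextual embeddings to predict the context phrases that occur in an utterance and computes a bias loss that assists the training of the contextualized model.
The method achieves a significant word error rate (WER) reduction across various end-to-end speech recognition models.
Experiments on the LibriSpeech corpus show that the proposed model obtains a 12.1% relative WER improvement over the baseline model and a 40.5% relative WER reduction on the context phrases.
Moreover, applying a context phrase filtering strategy effectively eliminates the WER degradation that occurs when a larger biasing list is used.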
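
The reported gains are relative, not absolute, WER changes. As a minimal sketch of how such a figure is usually computed, the snippet below uses the standard definition (baseline WER minus new WER, divided by baseline WER). The function name and the input values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction, in percent, of new_wer versus baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WER values (in percent), NOT taken from the paper.
baseline = 10.0
contextualized = 8.0
print(f"relative reduction: {relative_wer_reduction(baseline, contextualized):.1f}%")
```

Under this definition, a drop from a 10.0% to an 8.0% WER is a 2.0-point absolute reduction but a 20% relative one, which is why relative figures such as those above can look large even when the absolute change is small.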

Abstract (original)

Contextual information plays a crucial role in speech recognition technologies and incorporating it into the end-to-end speech recognition models has drawn immense interest recently. However, previous deep bias methods lacked explicit supervision for bias tasks. In this study, we introduce a contextual phrase prediction network for an attention-based deep bias method. This network predicts context phrases in utterances using contextual embeddings and calculates bias loss to assist in the training of the contextualized model. Our method achieved a significant word error rate (WER) reduction across various end-to-end speech recognition models. Experiments on the LibriSpeech corpus show that our proposed model obtains a 12.1% relative WER improvement over the baseline model, and the WER of the context phrases decreases relatively by 40.5%. Moreover, by applying a context phrase filtering strategy, we also effectively eliminate the WER degradation when using a larger biasing list.

arXiv information

Authors	Kaixun Huang, Ao Zhang, Zhanheng Yang, Pengcheng Guo, Bingshen Mu, Tianyi Xu, Lei Xie
Published	2023-06-26 12:28:10+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
