Dynamic Alignment Mask CTC: Improved Mask-CTC with Aligned Cross Entropy

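Summary

Non-autoregressive models predict all target tokens in parallel, which makes speech recognition decoding substantially more efficient than with conventional autoregressive models.
This work presents Dynamic Alignment Mask CTC and introduces two methods: (1) Aligned Cross Entropy (AXE), which uses dynamic programming to find the monotonic alignment that minimizes the cross-entropy loss, and (2) Dynamic Rectification, which creates new training samples by replacing some masks with tokens predicted by the model.
AXE ignores the absolute positional alignment between the prediction and the ground-truth sentence and focuses on tokens that match in relative order.
Dynamic Rectification lets the model simulate tokens that are not masked but may still be wrong, even when they have high confidence.
Experiments on the WSJ dataset show that both the AXE loss and the rectification method improve the WER of Mask CTC.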
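
The idea of finding a monotonic alignment by dynamic programming can be illustrated with a minimal sketch. It is a simplified stand-in, not the paper's implementation: it assumes each target token is aligned to exactly one prediction position in increasing order, and that every unaligned position pays the cost of a blank token. The function name `aligned_cross_entropy`, the `blank` index, and the toy probabilities are all hypothetical.

```python
import math


def aligned_cross_entropy(log_probs, target, blank=0):
    """Minimum cross-entropy over monotonic alignments (simplified sketch).

    log_probs: list of per-position lists of log-probabilities over the vocabulary.
    target:    list of target token ids.
    blank:     assumed id of the token emitted at unaligned positions.
    """
    m, n = len(log_probs), len(target)
    if n > m:
        raise ValueError("this sketch needs at least as many positions as targets")

    inf = float("inf")
    # cost[i][j]: best cost of aligning the first i targets to the first j positions.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for j in range(1, m + 1):
        # No targets consumed yet: every position so far is a blank.
        cost[0][j] = cost[0][j - 1] - log_probs[j - 1][blank]

    for i in range(1, n + 1):
        for j in range(i, m + 1):
            # Align target i with position j.
            align = cost[i - 1][j - 1] - log_probs[j - 1][target[i - 1]]
            # Leave position j unaligned and charge it the blank cost.
            skip = cost[i][j - 1] - log_probs[j - 1][blank]
            cost[i][j] = min(align, skip)
    return cost[n][m]


# Toy example: 3 positions, vocabulary {0: blank, 1: "a", 2: "b"}.
probs = [[0.1, 0.8, 0.1], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
log_probs = [[math.log(p) for p in row] for row in probs]
target = [1, 2]

aligned = aligned_cross_entropy(log_probs, target)
# Position-wise cross-entropy that forces target i onto position i.
fixed = -sum(log_probs[i][t] for i, t in enumerate(target))
print(aligned, fixed)
```

In the toy example the second target token is predicted one position late, so the position-wise loss penalizes it heavily while the aligned loss does not, which is the behaviour the summary describes as focusing on relative order.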

Abstract (original)

Because of predicting all the target tokens in parallel, the non-autoregressive models greatly improve the decoding efficiency of speech recognition compared with traditional autoregressive models. In this work, we present dynamic alignment Mask CTC, introducing two methods: (1) Aligned Cross Entropy (AXE), finding the monotonic alignment that minimizes the cross-entropy loss through dynamic programming, (2) Dynamic Rectification, creating new training samples by replacing some masks with model predicted tokens. The AXE ignores the absolute position alignment between prediction and ground truth sentence and focuses on tokens matching in relative order. The dynamic rectification method makes the model capable of simulating the non-mask but possible wrong tokens, even if they have high confidence. Our experiments on WSJ dataset demonstrated that not only AXE loss but also the rectification method could improve the WER performance of Mask CTC.
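
The rectification step, replacing some masks with model-predicted tokens to build a new training sample, could look roughly like the sketch below. The abstract does not say how the masks are chosen or how many are replaced, so the random selection, the `ratio` parameter, the `MASK` string, and the function name `rectify` are assumptions made for illustration only.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol


def rectify(masked_tokens, predicted_tokens, ratio, rng):
    """Replace a fraction of the masks with the model's predicted tokens.

    masked_tokens:    training input containing MASK at some positions.
    predicted_tokens: the model's prediction for every position.
    ratio:            assumed fraction of masks to replace (0.0 to 1.0).
    rng:              random.Random instance, passed in for reproducibility.
    """
    mask_positions = [i for i, tok in enumerate(masked_tokens) if tok == MASK]
    k = int(len(mask_positions) * ratio)
    chosen = rng.sample(mask_positions, k)  # assumption: uniform random choice

    rectified = list(masked_tokens)
    for i in chosen:
        # The predicted token may be wrong; it now appears as an ordinary
        # unmasked input that the model must learn not to trust blindly.
        rectified[i] = predicted_tokens[i]
    return rectified


masked = ["the", MASK, MASK, "on", MASK, MASK]
predicted = ["the", "cat", "sad", "on", "a", "mat"]  # "sad" stands in for an error
sample = rectify(masked, predicted, ratio=0.5, rng=random.Random(0))
print(sample)
```

The training target stays the ground-truth sentence; only the input changes, so the model sees unmasked tokens that may be incorrect.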

arXiv information

Authors	Xulong Zhang, Haobin Tang, Jianzong Wang, Ning Cheng, Jian Luo, Jing Xiao
Published	2023-03-14 08:01:21+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
