Lattice-Free Sequence Discriminative Training for Phoneme-Based Neural Transducers

Summary

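RNN-Transducers have recently achieved remarkable results on a range of automatic speech recognition tasks.
However, lattice-free sequence discriminative training methods, which perform very well in hybrid models, have rarely been studied for RNN-Transducers.
This work proposes three lattice-free training objectives, applied to the final posterior output of a phoneme-based neural transducer with limited context dependency: lattice-free maximum mutual information, lattice-free segment-level minimum Bayes risk, and lattice-free minimum Bayes risk.
Compared with criteria that use N-best lists, lattice-free methods need no decoding step to generate hypotheses during training, which makes training more efficient.
Experiments show that lattice-free methods improve word error rate by up to 6.5% relative over a model trained with sequence-level cross-entropy.
Compared with N-best-list-based minimum Bayes risk objectives, lattice-free methods speed up training by 40% to 70% relative, with a small degradation in performance.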
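
To make the N-best-list baseline concrete, the following is a minimal sketch of an expected-risk computation over an N-best list. It is not the authors' implementation. It assumes that the risk is the Levenshtein distance between token sequences and that hypothesis posteriors are obtained by renormalizing log scores over the list. The function names, example sentences, and scores are all hypothetical.

```python
import math


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def nbest_expected_risk(log_scores, hypotheses, reference):
    """Expected edit distance, with posteriors renormalized over the N-best list."""
    max_score = max(log_scores)  # subtract the max for numerical stability
    weights = [math.exp(s - max_score) for s in log_scores]
    total = sum(weights)
    return sum(w / total * edit_distance(reference, h)
               for w, h in zip(weights, hypotheses))


# Hypothetical data: in N-best training these hypotheses come from a decoding pass.
reference = "the cat sat".split()
hypotheses = ["the cat sat".split(), "the cat sad".split(), "a cat sat down".split()]
log_scores = [-1.0, -2.0, -3.0]
risk = nbest_expected_risk(log_scores, hypotheses, reference)
```

In this sketch the hypothesis list is given as input. Producing it during training requires a decoding pass, which is the step the summary says lattice-free methods eliminate.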

Abstract (original)

Recently, RNN-Transducers have achieved remarkable results on various automatic speech recognition tasks. However, lattice-free sequence discriminative training methods, which obtain superior performance in hybrid models, are rarely investigated in RNN-Transducers. In this work, we propose three lattice-free training objectives, namely lattice-free maximum mutual information, lattice-free segment-level minimum Bayes risk, and lattice-free minimum Bayes risk, which are used for the final posterior output of the phoneme-based neural transducer with a limited context dependency. Compared to criteria using N-best lists, lattice-free methods eliminate the decoding step for hypotheses generation during training, which leads to more efficient training. Experimental results show that lattice-free methods gain up to 6.5% relative improvement in word error rate compared to a sequence-level cross-entropy trained model. Compared to the N-best-list based minimum Bayes risk objectives, lattice-free methods gain 40% – 70% relative training time speedup with a small degradation in performance.
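
As a reading aid for the "relative improvement" figure, here is a small sketch of the arithmetic. The two word error rates are hypothetical and are not taken from the paper; they only show that a drop from 10.0% to 9.35% WER corresponds to a 6.5% relative improvement.

```python
def relative_improvement(baseline_wer, new_wer):
    """Relative WER reduction with respect to the baseline."""
    return (baseline_wer - new_wer) / baseline_wer


baseline_wer = 10.0  # hypothetical baseline WER in percent
new_wer = 9.35       # hypothetical WER after sequence discriminative training
rel = relative_improvement(baseline_wer, new_wer)
print(f"{rel:.1%} relative improvement")
```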

arXiv information

Authors	Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney
Published	2023-05-25 15:54:35+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
