Three-in-One: Fast and Accurate Transducer for Hybrid-Autoregressive ASR

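Summary

We present Hybrid-Autoregressive INference TrANsducers (HAINAN), a new architecture for speech recognition that extends the Token-and-Duration Transducer (TDT) model.
Trained with randomly masked predictor network outputs, HAINAN supports both autoregressive inference with all network components and non-autoregressive inference without the predictor.
In addition, we propose a new semi-autoregressive inference paradigm that first generates an initial hypothesis with non-autoregressive inference, followed by refinement steps in which each token prediction is regenerated using parallelized autoregression on the initial hypothesis.
Experiments on multiple datasets in different languages show that HAINAN achieves efficiency parity with CTC in non-autoregressive mode and with TDT in autoregressive mode.
In terms of accuracy, autoregressive HAINAN outperforms TDT and RNN-T, while non-autoregressive HAINAN significantly outperforms CTC.
Semi-autoregressive inference further improves the model's accuracy with minimal computational overhead, and in some cases even outperforms TDT.
These results highlight HAINAN's flexibility in balancing accuracy and speed, positioning it as a strong candidate for real-world speech recognition applications.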

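The abstract does not say how the predictor outputs are masked. As a minimal sketch under stated assumptions, suppose "masking" means replacing the predictor output with zeros with some probability `mask_prob` during training. The joint network then regularly sees an all-zeros predictor input, so the same trained joint can be queried either with the real predictor output (autoregressive mode) or with zeros and no predictor at all (non-autoregressive mode). The linear joint, the zero-masking, and every function name below are hypothetical illustrations, not the paper's code.

```python
import random
from typing import List, Sequence


def joint(enc_vec: Sequence[float], pred_vec: Sequence[float],
          weights: Sequence[Sequence[float]]) -> List[float]:
    """Toy joint network: one linear layer over the sum of encoder and predictor features.

    Hypothetical stand-in; a real joint network would typically use learned
    projections and a nonlinearity, and for TDT it would also score durations.
    """
    hidden = [e + p for e, p in zip(enc_vec, pred_vec)]
    return [sum(w * h for w, h in zip(row, hidden)) for row in weights]


def masked_predictor_output(pred_vec: Sequence[float], mask_prob: float,
                            rng: random.Random) -> List[float]:
    """Assumed masking scheme: zero the predictor output with probability mask_prob."""
    if rng.random() < mask_prob:
        return [0.0] * len(pred_vec)
    return list(pred_vec)


def training_step_scores(enc_vec, pred_vec, weights, mask_prob, rng):
    """Scores the joint network would be trained on for one step."""
    return joint(enc_vec, masked_predictor_output(pred_vec, mask_prob, rng), weights)


def autoregressive_scores(enc_vec, pred_vec, weights):
    """Inference with all components: the real predictor output is used."""
    return joint(enc_vec, pred_vec, weights)


def non_autoregressive_scores(enc_vec, weights):
    """Inference without the predictor: feed the zeros the joint saw under masking."""
    return joint(enc_vec, [0.0] * len(enc_vec), weights)


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    print(autoregressive_scores([1.0, 2.0], [0.5, -1.0], W))  # [1.5, 1.0]
    print(non_autoregressive_scores([1.0, 2.0], W))           # [1.0, 2.0]
```

Whether masking is applied per utterance, per frame, or per token is not stated in the abstract; this sketch simply masks one predictor vector per call.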
Summary (original)

We present Hybrid-Autoregressive INference TrANsducers (HAINAN), a novel architecture for speech recognition that extends the Token-and-Duration Transducer (TDT) model. Trained with randomly masked predictor network outputs, HAINAN supports both autoregressive inference with all network components and non-autoregressive inference without the predictor. Additionally, we propose a novel semi-autoregressive inference paradigm that first generates an initial hypothesis using non-autoregressive inference, followed by refinement steps where each token prediction is regenerated using parallelized autoregression on the initial hypothesis. Experiments on multiple datasets across different languages demonstrate that HAINAN achieves efficiency parity with CTC in non-autoregressive mode and with TDT in autoregressive mode. In terms of accuracy, autoregressive HAINAN outperforms TDT and RNN-T, while non-autoregressive HAINAN significantly outperforms CTC. Semi-autoregressive inference further enhances the model’s accuracy with minimal computational overhead, and even outperforms TDT results in some cases. These results highlight HAINAN’s flexibility in balancing accuracy and speed, positioning it as a strong candidate for real-world speech recognition applications.
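In non-autoregressive mode the predictor is not used, so the joint output at each frame depends only on the encoder output, and every frame can be scored in one parallel pass. The sketch below shows the cheap greedy walk that remains, assuming TDT-style outputs that score both a token and a duration (number of frames to advance) at each frame. `BLANK`, the function names, and the rule of always advancing at least one frame are our assumptions, not details taken from the paper.

```python
from typing import List, Sequence, Tuple

BLANK = 0  # hypothetical index of the blank symbol


def argmax(scores: Sequence[float]) -> int:
    """Index of the largest score (first one on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def nar_greedy_decode(token_scores: Sequence[Sequence[float]],
                      duration_scores: Sequence[Sequence[float]],
                      durations: Sequence[int]) -> List[Tuple[int, int]]:
    """Greedy TDT-style decoding with no predictor network.

    token_scores[t]: scores over the vocabulary at frame t (index BLANK = blank).
    duration_scores[t]: scores over `durations` at frame t.
    Without a predictor these depend only on the encoder, so all frames can be
    scored before this loop; the loop just follows the predicted durations.
    Returns (token, frame) pairs.
    """
    hyp: List[Tuple[int, int]] = []
    t = 0
    while t < len(token_scores):
        token = argmax(token_scores[t])
        step = durations[argmax(duration_scores[t])]
        if token != BLANK:
            hyp.append((token, t))
        # Assumption: always advance at least one frame. With no predictor,
        # staying on the same frame would reproduce the same prediction forever.
        t += max(step, 1)
    return hyp


if __name__ == "__main__":
    tok = [[0.1, 0.9, 0.0], [0.0, 0.0, 1.0], [0.9, 0.05, 0.05], [0.0, 0.1, 0.9]]
    dur = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1], [0.6, 0.4]]
    print(nar_greedy_decode(tok, dur, durations=[1, 2]))  # [(1, 0), (2, 3)]
```

With `durations=[1, 2]`, frame 0 predicts a duration of 2, so frame 1 is never visited; frame skipping of this kind is what duration prediction in TDT is designed to enable.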

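The semi-autoregressive refinement can be sketched in the same style. Under our reading of the abstract, the initial non-autoregressive hypothesis is fed through the predictor once (teacher forcing), giving a predictor output for every prefix; each token is then re-predicted at its original frame from the encoder output plus the predictor output for the preceding initial tokens. Because every position conditions on the fixed initial hypothesis rather than on newly refined tokens, the positions are independent and can be computed in parallel. The toy predictor, the additive joint, keeping frame positions fixed, and dropping positions that become blank are all assumptions for illustration.

```python
from typing import Callable, List, Sequence, Tuple

BLANK = 0  # hypothetical index of the blank symbol
Vector = List[float]


def argmax(scores: Sequence[float]) -> int:
    """Index of the largest score (first one on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def refine_once(hyp: Sequence[Tuple[int, int]],
                enc_scores: Sequence[Vector],
                predictor: Callable[[Sequence[int]], Vector],
                joint: Callable[[Vector, Vector], Vector]) -> List[Tuple[int, int]]:
    """One semi-autoregressive refinement pass over an initial hypothesis.

    hyp: initial (token, frame) pairs, e.g. from non-autoregressive decoding.
    enc_scores: encoder-side vectors indexed by frame.
    predictor: maps a token prefix to a predictor output vector.
    joint: combines an encoder vector and a predictor vector into vocab scores.
    """
    tokens = [tok for tok, _ in hyp]
    # Predictor outputs for every prefix of the *initial* hypothesis. None of
    # them depends on a refined token, so a real model could compute them in
    # one batched pass; this comprehension is sequential only for clarity.
    preds = [predictor(tokens[:i]) for i in range(len(tokens))]
    refined: List[Tuple[int, int]] = []
    for (_, frame), pred in zip(hyp, preds):
        new_tok = argmax(joint(enc_scores[frame], pred))
        if new_tok != BLANK:  # assumption: drop positions that turn into blank
            refined.append((new_tok, frame))
    return refined


def toy_predictor(prefix: Sequence[int]) -> Vector:
    """Hypothetical bigram-style predictor over a 4-symbol vocabulary:
    after token k it adds 0.5 to the score of token k + 1."""
    bias = [0.0] * 4
    if prefix and prefix[-1] + 1 < len(bias):
        bias[prefix[-1] + 1] = 0.5
    return bias


def toy_joint(enc_vec: Vector, pred_vec: Vector) -> Vector:
    """Hypothetical additive joint: element-wise sum of the two score vectors."""
    return [e + p for e, p in zip(enc_vec, pred_vec)]


if __name__ == "__main__":
    enc = [[0.0, 1.0, 0.9, 0.0],  # frame 0
           [0.0, 0.0, 0.0, 0.0],  # frame 1 (unused)
           [0.0, 0.0, 0.0, 0.0],  # frame 2 (unused)
           [0.0, 0.0, 0.5, 0.6]]  # frame 3: token 3 wins without context
    initial = [(1, 0), (3, 3)]
    print(refine_once(initial, enc, toy_predictor, toy_joint))  # [(1, 0), (2, 3)]
```

In this toy run the context from the initial token 1 is enough to flip the frame-3 prediction from 3 to 2, which is the kind of correction the refinement step is meant to provide at a cost of one extra parallel pass.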
arXiv information

Authors	Hainan Xu, Travis M. Bartley, Vladimir Bataev, Boris Ginsburg
Published	2025-02-24 18:15:09+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google