Memory-augmented conformer for improved end-to-end long-form ASR

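Summary

Conformers were recently proposed as a promising modelling approach for automatic speech recognition (ASR), outperforming both recurrent neural network-based approaches and transformers.
Nevertheless, the performance of these end-to-end models, attention-based ones in particular, generally degrades on long utterances.
To address this limitation, we propose adding a fully differentiable memory-augmented neural network between the encoder and decoder of a conformer.
This external memory lets the system store and retrieve more information recurrently, which can improve generalization to longer utterances.
Specifically, we explore the neural Turing machine (NTM), which yields our proposed Conformer-NTM model architecture for ASR.
Experimental results with the Librispeech train-clean-100 and train-960 sets show that the proposed system outperforms the baseline conformer without memory on long utterances.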

Summary (original)

Conformers have recently been proposed as a promising modelling approach for automatic speech recognition (ASR), outperforming recurrent neural network-based approaches and transformers. Nevertheless, in general, the performance of these end-to-end models, especially attention-based models, is particularly degraded in the case of long utterances. To address this limitation, we propose adding a fully-differentiable memory-augmented neural network between the encoder and decoder of a conformer. This external memory can enrich the generalization for longer utterances since it allows the system to store and retrieve more information recurrently. Notably, we explore the neural Turing machine (NTM) that results in our proposed Conformer-NTM model architecture for ASR. Experimental results using Librispeech train-clean-100 and train-960 sets show that the proposed system outperforms the baseline conformer without memory for long utterances.
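
The abstract does not give implementation details, so the following is only a minimal sketch of how an NTM-style external memory could sit between encoder and decoder. It shows content-based addressing, a weighted read, and an erase/add write in plain Python. Everything here is an assumption for illustration: the function names, the fixed `beta` and erase values, the use of each encoder frame as both key and add vector, and the residual sum passed to the decoder. A real NTM produces these quantities with a learned controller and also uses location-based addressing, which is omitted.

```python
import math

def cosine(u, v):
    # Cosine similarity with a small epsilon so all-zero rows do not divide by zero.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-8)

def content_weights(memory, key, beta):
    # Content-based addressing: softmax over beta-scaled cosine similarities.
    scores = [beta * cosine(row, key) for row in memory]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def read(memory, weights):
    # Read vector: weighted sum of the memory rows.
    width = len(memory[0])
    return [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(width)]

def write(memory, weights, erase, add):
    # Erase-then-add update: M_i <- M_i * (1 - w_i * e) + w_i * a.
    return [
        [m * (1.0 - w * e) + w * a for m, e, a in zip(row, erase, add)]
        for w, row in zip(weights, memory)
    ]

def augment(encoder_frames, memory, beta=5.0):
    # Hypothetical glue code: for each encoder frame, read from memory,
    # add the read vector to the frame, then write the frame back.
    # In a trained model the key, beta, erase and add vectors would come
    # from a learned controller, not from these fixed choices.
    outputs = []
    for frame in encoder_frames:
        w = content_weights(memory, frame, beta)
        r = read(memory, w)
        outputs.append([f + x for f, x in zip(frame, r)])
        memory = write(memory, w, erase=[0.5] * len(frame), add=frame)
    return outputs, memory
```

Calling `augment(frames, memory)` with a list of encoder output vectors and an initial memory matrix returns the memory-enriched frames that a decoder would consume, together with the updated memory.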

arXiv information

Authors	Carlos Carvalho, Alberto Abad
Published	2023-09-22 17:44:58+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
