BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder

Summary

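BERT-CTC-Transducer (BECTRA) is a novel end-to-end automatic speech recognition (E2E-ASR) model formulated as a transducer with a BERT-enhanced encoder.
Integrating large-scale pre-trained language models (LMs) into E2E-ASR is an active research area, with the aim of using their versatile linguistic knowledge to generate accurate text.
One key factor that makes this integration difficult is vocabulary mismatch.
The vocabulary built for a pre-trained LM is generally too large for E2E-ASR training and may not match the target ASR domain.
To overcome this problem, we propose BECTRA, an extended version of the earlier BERT-CTC that realizes BERT-based E2E-ASR using a vocabulary of interest.
BECTRA is a transducer-based model that adopts BERT-CTC as its encoder and trains an ASR-specific decoder using a vocabulary suited to the target task.
Building on the combination of the transducer and BERT-CTC, we also propose a novel inference algorithm that takes advantage of both autoregressive and non-autoregressive decoding.
Experimental results on several ASR tasks that differ in amount of data, speaking style, and language show that BECTRA outperforms BERT-CTC by handling the vocabulary mismatch effectively while exploiting BERT's knowledge.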
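
The vocabulary mismatch mentioned above can be made concrete with a toy example. The sketch below is only an illustration under stated assumptions: `greedy_tokenize`, `lm_vocab`, and `asr_vocab` are hypothetical names and data invented here, and the longest-match segmentation is a simplification, not BERT's tokenizer and not the method of the paper. It shows only that the same text maps to different token sequences under a large, word-like vocabulary and under a small, task-specific one.

```python
def greedy_tokenize(text, vocab):
    """Longest-match-first segmentation; unknown characters are emitted as-is."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate substring starting at position i first.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary entry matched: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical vocabularies (assumptions for illustration only).
lm_vocab = {"speech", "recognition", " "}                            # large, word-like units
asr_vocab = {"sp", "ee", "ch", "re", "co", "g", "ni", "tion", " "}  # small subword units

text = "speech recognition"
lm_tokens = greedy_tokenize(text, lm_vocab)
asr_tokens = greedy_tokenize(text, asr_vocab)

print(lm_tokens)
print(asr_tokens)
```

The two sequences differ in length and in units, so a model component trained against one vocabulary cannot directly share output targets with a component that uses the other.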

Summary (original)

We present BERT-CTC-Transducer (BECTRA), a novel end-to-end automatic speech recognition (E2E-ASR) model formulated by the transducer with a BERT-enhanced encoder. Integrating a large-scale pre-trained language model (LM) into E2E-ASR has been actively studied, aiming to utilize versatile linguistic knowledge for generating accurate text. One crucial factor that makes this integration challenging lies in the vocabulary mismatch; the vocabulary constructed for a pre-trained LM is generally too large for E2E-ASR training and is likely to have a mismatch against a target ASR domain. To overcome such an issue, we propose BECTRA, an extended version of our previous BERT-CTC, that realizes BERT-based E2E-ASR using a vocabulary of interest. BECTRA is a transducer-based model, which adopts BERT-CTC for its encoder and trains an ASR-specific decoder using a vocabulary suitable for a target task. With the combination of the transducer and BERT-CTC, we also propose a novel inference algorithm for taking advantage of both autoregressive and non-autoregressive decoding. Experimental results on several ASR tasks, varying in amounts of data, speaking styles, and languages, demonstrate that BECTRA outperforms BERT-CTC by effectively dealing with the vocabulary mismatch while exploiting BERT knowledge.

arXiv information

Authors	Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe
Published	2023-03-17 01:52:50+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
