Streaming Bilingual End-to-End ASR model using Attention over Multiple Softmax

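Summary

Despite several advances in multilingual modeling, recognizing multiple languages with a single neural model without knowing the input language remains challenging, and most multilingual models assume the input language is known.
This work proposes a novel bilingual end-to-end (E2E) modeling approach in which a single neural model recognizes both languages and supports switching between them, without any language input from the user.
The proposed model has shared encoder and prediction networks, with language-specific joint networks that are combined via a self-attention mechanism.
Combining the language-specific posteriors yields a single posterior probability over all output symbols, which enables a single beam search decoding pass and allows dynamic switching between the languages.
The proposed approach outperforms the conventional bilingual baseline, with relative word error rate reductions of 13.3%, 8.23%, and 1.3% on the Hindi, English, and code-mixed test sets, respectively.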
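
The idea of combining language-specific posteriors into one distribution can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the two languages have disjoint symbol sets that are concatenated into one output vocabulary, and it takes the attention scores as plain inputs rather than computing them with a self-attention layer. The function names `softmax` and `combine_posteriors` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_posteriors(lang_logits, attn_scores):
    """Combine per-language softmax outputs into one posterior.

    lang_logits: one list of logits per language-specific joint network
                 (assumption: each list covers that language's own symbols).
    attn_scores: one unnormalised attention score per language
                 (assumption: supplied by the caller in this sketch).
    """
    # Normalise the attention scores so the language weights sum to 1.
    weights = softmax(attn_scores)
    combined = []
    for weight, logits in zip(weights, lang_logits):
        # Scale each language's posterior by its attention weight and
        # append it to the shared output vector.
        combined.extend(weight * p for p in softmax(logits))
    return combined


# Hypothetical frame: 3 symbols for one language, 2 for the other.
posterior = combine_posteriors([[1.0, 2.0, 0.5], [0.1, 0.2]], [0.3, -0.4])
print(len(posterior))
```

Because the attention weights sum to 1 and each language-specific softmax sums to 1, the concatenated vector is itself a valid probability distribution, so under these assumptions a single beam search can run over it directly.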

Summary (original)

Even with several advancements in multilingual modeling, it is challenging to recognize multiple languages using a single neural model, without knowing the input language and most multilingual models assume the availability of the input language. In this work, we propose a novel bilingual end-to-end (E2E) modeling approach, where a single neural model can recognize both languages and also support switching between the languages, without any language input from the user. The proposed model has shared encoder and prediction networks, with language-specific joint networks that are combined via a self-attention mechanism. As the language-specific posteriors are combined, it produces a single posterior probability over all the output symbols, enabling a single beam search decoding and also allowing dynamic switching between the languages. The proposed approach outperforms the conventional bilingual baseline with 13.3%, 8.23% and 1.3% word error rate relative reduction on Hindi, English and code-mixed test sets, respectively.
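
For readers unfamiliar with the metric, a relative word error rate (WER) reduction is the drop in WER expressed as a percentage of the baseline WER. The sketch below shows the arithmetic; the helper name `relative_wer_reduction` is hypothetical, and the numbers in the usage line are made up for illustration and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Return the relative WER reduction in percent.

    Both arguments are absolute WER values on the same scale
    (for example, both in percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Illustrative values only: a baseline WER of 10.0 and a new WER of 9.0.
print(relative_wer_reduction(10.0, 9.0))  # → 10.0
```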

arXiv information

Authors	Aditya Patil, Vikas Joshi, Purvi Agrawal, Rupesh Mehta
Published	2024-01-22 01:44:42+00:00
arXiv site	arxiv_id (pdf)

Source and services used

arxiv.jp, Google
