Multilingual Contextual Adapters To Improve Custom Word Recognition In Low-resource Languages

Abstract

Connectionist Temporal Classification (CTC) models are popular for their balance between speed and performance for Automatic Speech Recognition (ASR). However, these CTC models still struggle in other areas, such as personalization towards custom words. A recent approach explores Contextual Adapters, wherein an attention-based biasing model for CTC is used to improve the recognition of custom entities. While this approach works well with enough data, we showcase that it isn’t an effective strategy for low-resource languages. In this work, we propose a supervision loss for smoother training of the Contextual Adapters. Further, we explore a multilingual strategy to improve performance with limited training data. Our method achieves 48% F1 improvement in retrieving unseen custom entities for a low-resource language. Interestingly, as a by-product of training the Contextual Adapters, we see a 5-11% Word Error Rate (WER) reduction in the performance of the base CTC model as well.
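
The by-product gain above is reported as Word Error Rate (WER). As a reminder of how that metric is commonly defined, here is a minimal sketch: the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from the paper, and real scoring tools usually normalize text (casing, punctuation) first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Toy example (made up): a custom name misrecognized as two common words.
wer = word_error_rate("call devang now", "call the van now")
```

In this toy case one substitution and one insertion against a three-word reference give a WER of 2/3, which illustrates how a single misrecognized custom entity can dominate the error rate on a short utterance.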

arXiv information

Authors	Devang Kulshreshtha, Saket Dingliwal, Brady Houston, Sravan Bodapati
Published	2023-07-03 05:29:38+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
