Adapting the adapters for code-switching in multilingual ASR

Abstract

Recently, large pre-trained multilingual speech models have shown potential in scaling Automatic Speech Recognition (ASR) to many low-resource languages. Some of these models employ language adapters in their formulation, which helps to improve monolingual performance and avoids some of the drawbacks of multilingual modeling on resource-rich languages. However, this formulation restricts the usability of these models on code-switched speech, where two languages are mixed together in the same utterance. In this work, we propose ways to effectively fine-tune such models on code-switched speech, by assimilating information from both language adapters at each language adaptation point in the network. We also model code-switching as a sequence of latent binary sequences that can be used to guide the flow of information from each language adapter at the frame level. The proposed approaches are evaluated on three code-switched datasets encompassing Arabic, Mandarin, and Hindi languages paired with English, showing consistent improvements in code-switching performance with at least 10% absolute reduction in CER across all test sets.
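The abstract combines two ideas: drawing on both language adapters at every adaptation point, and treating code-switching as a frame-level latent binary sequence that steers how much each adapter contributes. The pure-Python sketch below shows one way these could fit together. It is not the authors' implementation. The bottleneck adapter shape, the sigmoid gate used as a soft relaxation of the binary switch, the random weights, and the names `BottleneckAdapter`, `fuse_adapters`, `gate_w` and `gate_b` are all hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: one row per output dimension


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class BottleneckAdapter:
    """Hypothetical bottleneck adapter: down-project, ReLU, up-project.

    Returns only the adapter's delta; the caller adds the residual.
    """

    def __init__(self, dim: int, bottleneck: int, rng: random.Random):
        self.down = random_matrix(bottleneck, dim, rng)  # dim -> bottleneck
        self.up = random_matrix(dim, bottleneck, rng)    # bottleneck -> dim

    def delta(self, h: Vector) -> Vector:
        z = [max(0.0, x) for x in matvec(self.down, h)]
        return matvec(self.up, z)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fuse_adapters(frames, adapter_a, adapter_b, gate_w, gate_b=0.0, hard=False):
    """Mix two language adapters frame by frame (hypothetical scheme).

    g_t = sigmoid(gate_w . h_t + gate_b) is a soft stand-in for a latent
    binary switch z_t (1 -> language A, 0 -> language B). With hard=True the
    gate is rounded, so each frame uses exactly one adapter.
    Returns (fused frames, per-frame gate values).
    """
    fused, gates = [], []
    for h in frames:
        g = sigmoid(sum(w * x for w, x in zip(gate_w, h)) + gate_b)
        if hard:
            g = 1.0 if g >= 0.5 else 0.0
        da, db = adapter_a.delta(h), adapter_b.delta(h)
        # Residual connection plus a gated mix of both adapter deltas.
        fused.append([x + g * a + (1.0 - g) * b for x, a, b in zip(h, da, db)])
        gates.append(g)
    return fused, gates


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 8
    eng = BottleneckAdapter(dim, 4, rng)  # e.g. an English adapter
    ara = BottleneckAdapter(dim, 4, rng)  # e.g. an Arabic adapter
    frames = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(5)]
    gate_w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    _, gates = fuse_adapters(frames, eng, ara, gate_w, hard=True)
    print([int(g) for g in gates])  # per-frame binary language sequence
```

Setting every entry of `gate_w` to zero with `gate_b = 0` gives g_t = 0.5 on every frame, a plain average of the two adapters and the simplest way to use both at once. A learned gate lets the mix vary from frame to frame, and `hard=True` reads it out as a binary language sequence.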

arXiv information

Authors: Atharva Kulkarni, Ajinkya Kulkarni, Miguel Couceiro, Hanan Aldarmaki
Published: 2023-10-11 12:15:24+00:00

Source and services used

arxiv.jp, Google