Pseudo-Labeling for Domain-Agnostic Bangla Automatic Speech Recognition

Abstract

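One of the major challenges in developing automatic speech recognition (ASR) for low-resource languages is limited access to labeled data that covers domain-specific variation.
In this study, we propose a pseudo-labeling approach for building a large-scale, domain-agnostic ASR dataset.
Using the proposed methodology, we developed a labeled Bangla speech dataset of more than 20,000 hours covering diverse topics, speaking styles, dialects, noisy environments, and conversational scenarios.
We then used the developed corpus to design a conformer-based ASR system.
We benchmarked the trained ASR on publicly available datasets and compared it with other available models.
To investigate its efficacy, we designed and developed a human-annotated, domain-agnostic test set composed of news, telephony, and conversational data, among others.
Our results demonstrate the efficacy of the model trained on pseudo-labeled data, both on the designed test set and on publicly available Bangla datasets.
The experimental resources will be made publicly available. (https://github.com/hishab-nlp/Pseudo-Labeling-for-Domain-Agnostic-Bangla-ASR)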

Abstract (original)

One of the major challenges for developing automatic speech recognition (ASR) for low-resource languages is the limited access to labeled data with domain-specific variations. In this study, we propose a pseudo-labeling approach to develop a large-scale domain-agnostic ASR dataset. With the proposed methodology, we developed a 20k+ hour labeled Bangla speech dataset covering diverse topics, speaking styles, dialects, noisy environments, and conversational scenarios. We then exploited the developed corpus to design a conformer-based ASR system. We benchmarked the trained ASR with publicly available datasets and compared it with other available models. To investigate the efficacy, we designed and developed a human-annotated domain-agnostic test set composed of news, telephony, and conversational data, among others. Our results demonstrate the efficacy of the model trained on pseudo-labeled data for the designed test set along with publicly available Bangla datasets. The experimental resources will be publicly available. (https://github.com/hishab-nlp/Pseudo-Labeling-for-Domain-Agnostic-Bangla-ASR)
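The abstract does not spell out how pseudo-labels are selected. As a rough illustration of one common pseudo-labeling filter (not necessarily the authors' method), the sketch below assumes each unlabeled audio segment has already been transcribed by two independent ASR models, and keeps a transcript only when the two hypotheses agree closely, measured by word error rate (WER). The function names `wer` and `select_pseudo_labels`, the two-model setup, and the 0.1 threshold are all hypothetical choices for this example.

```python
from typing import Dict, List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    # prev[j] holds the distance between the current ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # delete r
                          curr[j - 1] + 1,     # insert h
                          prev[j - 1] + cost)  # match or substitute
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate of hyp against ref, using whitespace tokenization."""
    ref_words, hyp_words = ref.split(), hyp.split()
    if not ref_words:
        # Convention chosen for this sketch: empty vs empty is 0, otherwise 1.
        return 0.0 if not hyp_words else 1.0
    return word_edit_distance(ref_words, hyp_words) / len(ref_words)


def select_pseudo_labels(
    hypotheses: Dict[str, Tuple[str, str]],
    max_disagreement: float = 0.1,  # hypothetical threshold
) -> Dict[str, str]:
    """Keep a segment's transcript only if two models' hypotheses agree.

    Assumption: hypotheses maps segment id -> (model A output, model B output).
    Model A's output is treated as the reference when computing WER and is
    the transcript kept as the pseudo-label.
    """
    selected = {}
    for seg_id, (hyp_a, hyp_b) in hypotheses.items():
        if wer(hyp_a, hyp_b) <= max_disagreement:
            selected[seg_id] = hyp_a
    return selected


if __name__ == "__main__":
    segments = {
        "seg1": ("আমি বাংলায় কথা বলি", "আমি বাংলায় কথা বলি"),  # full agreement
        "seg2": ("আজ বৃষ্টি হবে", "কাল বৃষ্টি হবে"),          # 1 of 3 words differs
    }
    print(list(select_pseudo_labels(segments)))  # ['seg1']
```

Here "seg2" is rejected because its two hypotheses disagree on one of three words (WER of about 0.33, above the 0.1 threshold). Agreement filtering trades recall for label quality: a stricter threshold yields fewer but cleaner pseudo-labeled hours.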

arXiv information

Authors	Rabindra Nath Nandi,Mehadi Hasan Menon,Tareq Al Muntasir,Sagor Sarker,Quazi Sarwar Muhtaseem,Md. Tariqul Islam,Shammur Absar Chowdhury,Firoj Alam
Published	2023-11-06 15:37:14+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
