Deep Transfer Learning for Automatic Speech Recognition: Towards Better Generalization

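Summary

Automatic speech recognition (ASR) has recently become an important challenge for deep learning (DL).
It requires large-scale training datasets and substantial computational and storage resources.
Moreover, DL techniques, and machine learning (ML) approaches in general, assume that training and test data come from the same domain, with the same input feature space and data distribution characteristics.
However, this assumption does not hold in some real-world artificial intelligence (AI) applications.
In addition, there are situations where the data requirements of DL models cannot be met because real data is difficult or expensive to collect, or the events of interest occur only rarely.
Deep transfer learning (DTL) was introduced to overcome these problems. It helps develop high-performing models using real datasets that are small, or that differ slightly from the training data but are related to it.
This paper presents a comprehensive survey of DTL-based ASR frameworks, sheds light on the latest developments, and helps academics and practitioners understand current challenges.
Specifically, after presenting the DTL background, it adopts a well-designed taxonomy to convey the state of the art.
Next, a critical analysis is conducted to identify the limitations and advantages of each framework.
Finally, a comparative study is introduced to highlight current challenges before deriving opportunities for future research.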

Abstract (original)

Automatic speech recognition (ASR) has recently become an important challenge when using deep learning (DL). It requires large-scale training datasets and high computational and storage resources. Moreover, DL techniques, and machine learning (ML) approaches in general, hypothesize that training and testing data come from the same domain, with the same input feature space and data distribution characteristics. This assumption, however, is not applicable in some real-world artificial intelligence (AI) applications. Moreover, there are situations where gathering real data is challenging or expensive, or where the events of interest rarely occur, so the data requirements of DL models cannot be met. Deep transfer learning (DTL) has been introduced to overcome these issues; it helps develop high-performing models using real datasets that are small or slightly different but related to the training data. This paper presents a comprehensive survey of DTL-based ASR frameworks to shed light on the latest developments and help academics and professionals understand current challenges. Specifically, after presenting the DTL background, a well-designed taxonomy is adopted to inform the state of the art. A critical analysis is then conducted to identify the limitations and advantages of each framework. Moving on, a comparative study is introduced to highlight the current challenges before deriving opportunities for future research.

arXiv information

Authors	Hamza Kheddar, Yassine Himeur, Somaya Al-Maadeed, Abbes Amira, Faycal Bensaali
Published	2023-07-31 11:58:18+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
