Audio Enhancement for Computer Audition — An Iterative Training Paradigm Using Sample Importance

Summary

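Neural network models for audio tasks such as automatic speech recognition (ASR) and acoustic scene classification (ASC) are susceptible to noise contamination in real-world applications.
To improve audio quality, an enhancement module, which can be developed independently, is explicitly placed at the front end of the target audio application.
This paper presents an end-to-end learning solution for jointly optimising the audio enhancement (AE) model and the model of the subsequent application.
To guide the optimisation of the AE module towards the target application, and especially to overcome difficult samples, a sample-wise performance measure is used as an indicator of sample importance.
The experiments consider four representative applications for evaluating the training paradigm: ASR, speech command recognition (SCR), speech emotion recognition (SER), and ASC.
These applications cover speech and non-speech tasks involving semantic and non-semantic features as well as transient and global information. The experimental results show that the proposed approach can considerably improve the noise robustness of the models, especially at low signal-to-noise ratios (SNRs), across a wide range of computer audition tasks in noisy everyday environments.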

Summary (original)

Neural network models for audio tasks, such as automatic speech recognition (ASR) and acoustic scene classification (ASC), are susceptible to noise contamination for real-life applications. To improve audio quality, an enhancement module, which can be developed independently, is explicitly used at the front-end of the target audio applications. In this paper, we present an end-to-end learning solution to jointly optimise the models for audio enhancement (AE) and the subsequent applications. To guide the optimisation of the AE module towards a target application, and especially to overcome difficult samples, we make use of the sample-wise performance measure as an indication of sample importance. In experiments, we consider four representative applications to evaluate our training paradigm, i.e., ASR, speech command recognition (SCR), speech emotion recognition (SER), and ASC. These applications are associated with speech and non-speech tasks concerning semantic and non-semantic features, transient and global information, and the experimental results indicate that our proposed approach can considerably boost the noise robustness of the models, especially at low signal-to-noise ratios (SNRs), for a wide range of computer audition tasks in everyday-life noisy environments.
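The abstract does not say how the sample-wise performance measure is turned into sample importance. Purely as an illustration, the Python sketch below assumes the measure is a per-sample downstream loss (higher means harder) and converts it into softmax weights that scale the AE module's per-sample loss. The function names, the softmax weighting and the `temperature` parameter are hypothetical choices, not the authors' method.

```python
# Hypothetical sketch of importance-weighted enhancement training;
# not the authors' implementation.
import math
from typing import List, Sequence


def importance_weights(per_sample_loss: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Turn per-sample downstream losses into importance weights.

    Hypothetical scheme: a softmax over loss / temperature, rescaled so the
    weights average to 1. Harder samples (higher loss) get larger weights.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not per_sample_loss:
        return []
    scaled = [loss / temperature for loss in per_sample_loss]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    n = len(exps)
    return [n * e / total for e in exps]


def weighted_enhancement_loss(per_sample_ae_loss: Sequence[float], weights: Sequence[float]) -> float:
    """Average the AE module's per-sample losses, scaled by importance weights."""
    if len(per_sample_ae_loss) != len(weights) or not weights:
        raise ValueError("need equally sized, non-empty sequences")
    return sum(w_i * loss for w_i, loss in zip(weights, per_sample_ae_loss)) / len(weights)


# Toy batch (made-up numbers): the downstream model finds the third sample hardest.
downstream_loss = [0.2, 0.3, 2.5]
ae_loss = [0.10, 0.12, 0.40]
w = importance_weights(downstream_loss)
print([round(x, 3) for x in w])
print(round(weighted_enhancement_loss(ae_loss, w), 4))
```

If training alternates between the two models, as "iterative" in the title suggests, these weights would be recomputed from the downstream model's losses on the enhanced audio before each AE update; that schedule is an assumption, not a description of the paper. A larger `temperature` flattens the weights towards uniform.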

arXiv information

Authors	Manuel Milling, Shuo Liu, Andreas Triantafyllopoulos, Ilhan Aslan, Björn W. Schuller
Published	2024-08-12 16:23:58+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
