Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition

Summary

Title: Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition

Summary:

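– Speech enhancement (SE) has proven effective at removing noise for automatic speech recognition (ASR), and a multi-task learning strategy is used to optimize the two tasks jointly.
– However, speech enhanced under the SE objective does not always yield better ASR results.
– From an optimization perspective, the gradients of the SE and ASR tasks can interfere with each other, which hinders multi-task learning and ultimately leads to sub-optimal ASR performance.
– This paper proposes a simple yet effective approach called gradient remedy (GR) that resolves interference between task gradients in noise-robust speech recognition from the perspectives of both angle and magnitude.
– Specifically, the SE task's gradient is first projected onto a dynamic surface that is at an acute angle to the ASR gradient, which removes the conflict between them and assists ASR optimization.
– Furthermore, the magnitudes of the two gradients are adaptively rescaled to prevent the dominant ASR task from being misled by the SE gradient.
– Experimental results show that the proposed approach resolves the gradient interference and achieves relative word error rate (WER) reductions of 9.3% and 11.1% over the multi-task learning baseline on the RATS and CHiME-4 datasets, respectively. The code is available on GitHub.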
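The angle step can be illustrated with a small sketch. The abstract does not give the exact projection formula, so the Python below assumes a simplified rule: when the SE gradient forms an obtuse angle with the ASR gradient, its component along the ASR gradient is removed (a PCGrad-style projection) and replaced with a positive component so the result sits at a fixed acute angle, controlled by a hypothetical `target_cos` parameter. The names `project_se_gradient` and `target_cos` and the fixed-angle rule are illustrative assumptions, not the authors' GR implementation, whose surface is dynamic.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Vector) -> float:
    return math.sqrt(dot(a, a))


def project_se_gradient(g_se: Vector, g_asr: Vector, target_cos: float = 0.1) -> Vector:
    """Hypothetical sketch of the angle step (not the authors' exact GR rule).

    If g_se conflicts with g_asr (negative dot product), drop its component
    along g_asr and add back a positive one so that the returned vector has
    cosine similarity `target_cos` with g_asr, i.e. a fixed acute angle.
    """
    if not 0.0 < target_cos < 1.0:
        raise ValueError("target_cos must lie strictly between 0 and 1")
    n_asr = norm(g_asr)
    if n_asr == 0.0:
        return list(g_se)  # no ASR direction to align with
    u = [x / n_asr for x in g_asr]  # unit vector along the ASR gradient
    along = dot(g_se, u)  # signed length of g_se along g_asr
    if along >= 0.0:
        return list(g_se)  # angle already <= 90 degrees: keep g_se unchanged
    # PCGrad-style projection onto the plane orthogonal to g_asr
    perp = [x - along * ux for x, ux in zip(g_se, u)]
    n_perp = norm(perp)
    # Pick parallel length p with p / sqrt(p^2 + n_perp^2) == target_cos.
    # (If g_se is exactly anti-parallel, n_perp == 0 and a zero vector results.)
    p = target_cos * n_perp / math.sqrt(1.0 - target_cos ** 2)
    return [x + p * ux for x, ux in zip(perp, u)]
```

For example, with `g_asr = [1.0, 0.0]` and `g_se = [-1.0, 1.0]`, the conflicting component is removed and the returned vector has a cosine of 0.1 with `g_asr`; a non-conflicting `g_se` is returned unchanged.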

Summary (original)

Speech enhancement (SE) has been proven effective in reducing noise from noisy speech signals for downstream automatic speech recognition (ASR), where a multi-task learning strategy is employed to jointly optimize these two tasks. However, the enhanced speech learned by SE objective may not always yield good ASR results. From the optimization view, there sometimes exists interference between the gradients of SE and ASR tasks, which could hinder the multi-task learning and finally lead to sub-optimal ASR performance. In this paper, we propose a simple yet effective approach called gradient remedy (GR) to solve interference between task gradients in noise-robust speech recognition, from perspectives of both angle and magnitude. Specifically, we first project the SE task’s gradient onto a dynamic surface that is at acute angle to ASR gradient, in order to remove the conflict between them and assist in ASR optimization. Furthermore, we adaptively rescale the magnitude of two gradients to prevent the dominant ASR task from being misled by the SE gradient. Experimental results show that the proposed approach well resolves the gradient interference and achieves relative word error rate (WER) reductions of 9.3% and 11.1% over the multi-task learning baseline, on RATS and CHiME-4 datasets, respectively. Our code is available on GitHub.
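The magnitude step can likewise be sketched. As an assumption rather than the paper's actual rule, the Python below caps the SE gradient's norm at a hypothetical `max_ratio` times the ASR gradient's norm before the two are summed, so the auxiliary SE task cannot outweigh the dominant ASR task. `rescale_se_gradient`, `combine_gradients`, and `max_ratio` are illustrative names, not part of the released code.

```python
import math
from typing import List

Vector = List[float]


def norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def rescale_se_gradient(g_se: Vector, g_asr: Vector, max_ratio: float = 1.0) -> Vector:
    """Hypothetical sketch: cap ||g_se|| at max_ratio * ||g_asr||.

    This keeps the SE gradient from dominating the shared update. If the ASR
    gradient is zero, the cap is zero and the SE gradient is scaled to zero.
    """
    if max_ratio < 0.0:
        raise ValueError("max_ratio must be non-negative")
    n_se, n_asr = norm(g_se), norm(g_asr)
    limit = max_ratio * n_asr
    if n_se == 0.0 or n_se <= limit:
        return list(g_se)  # already within the cap: leave unchanged
    scale = limit / n_se  # shrink factor in [0, 1)
    return [scale * x for x in g_se]


def combine_gradients(g_asr: Vector, g_se: Vector, max_ratio: float = 1.0) -> Vector:
    """Sum the ASR gradient with the (possibly shrunk) SE gradient."""
    g_se = rescale_se_gradient(g_se, g_asr, max_ratio)
    return [a + s for a, s in zip(g_asr, g_se)]
```

With `g_asr = [3.0, 4.0]` (norm 5), `g_se = [0.0, 10.0]` (norm 10), and `max_ratio=0.5`, the SE gradient is scaled to `[0.0, 2.5]` and the combined update is `[3.0, 6.5]`. A one-sided cap is used here only because it is the simplest way to keep the ASR task dominant; the paper's adaptive scheme may differ.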

arXiv information

Authors	Yuchen Hu, Chen Chen, Ruizhe Li, Qiushi Zhu, Eng Siong Chng
Published	2023-05-03 05:06:51+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI
