CheapNET: Improving Light-weight speech enhancement network by projected loss function

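Summary

Noise suppression and echo cancellation are central to speech enhancement and essential for smart devices and real-time communication.
Because these algorithms are deployed in voice processing front-ends and on edge devices, they must deliver efficient real-time inference at low computational cost.
Conventional edge-based noise suppression often relies on MSE-based amplitude spectrum mask training, but this approach has limitations.
To improve noise suppression, the authors introduce a new projection loss function that departs from MSE.
The method uses projection techniques to separate the key audio components from noise, significantly improving model performance.
For echo cancellation, the function makes it possible to predict directly on LAEC pre-processed outputs, substantially improving performance.
The noise suppression model achieves near state-of-the-art results with only 3.1M parameters and a computational load of 0.4 GFLOPs/s.
In addition, the echo cancellation model outperforms replicated industry-leading models, bringing a new perspective to speech enhancement.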

Summary (original)

Noise suppression and echo cancellation are critical in speech enhancement and essential for smart devices and real-time communication. Deployed in voice processing front-ends and edge devices, these algorithms must ensure efficient real-time inference with low computational demands. Traditional edge-based noise suppression often uses MSE-based amplitude spectrum mask training, but this approach has limitations. We introduce a novel projection loss function, diverging from MSE, to enhance noise suppression. This method uses projection techniques to isolate key audio components from noise, significantly improving model performance. For echo cancellation, the function enables direct predictions on LAEC pre-processed outputs, substantially enhancing performance. Our noise suppression model achieves near state-of-the-art results with only 3.1M parameters and 0.4GFlops/s computational load. Moreover, our echo cancellation model outperforms replicated industry-leading models, introducing a new perspective in speech enhancement.
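
For readers unfamiliar with the baseline the abstract contrasts against, the following is a minimal sketch of an MSE-based amplitude spectrum mask loss. It is not the authors' projection loss, which the abstract does not specify. The function names `mse_mask_loss` and `ideal_ratio_mask` are hypothetical, and the toy data assumes magnitudes add (phase is ignored), which is a simplification.

```python
import numpy as np


def mse_mask_loss(mask, noisy_mag, clean_mag):
    """Mean squared error between the masked noisy magnitude and the clean magnitude.

    All arrays share the same shape, e.g. (frames, frequency_bins).
    """
    enhanced_mag = mask * noisy_mag  # apply the predicted mask bin by bin
    return float(np.mean((enhanced_mag - clean_mag) ** 2))


def ideal_ratio_mask(noisy_mag, clean_mag, eps=1e-8):
    """Oracle mask (assumption: a magnitude-ratio target clipped to [0, 1])."""
    return np.clip(clean_mag / (noisy_mag + eps), 0.0, 1.0)


# Toy magnitudes standing in for STFT amplitude spectra (hypothetical data).
rng = np.random.default_rng(0)
clean = np.abs(rng.normal(size=(10, 129)))
noise = np.abs(rng.normal(size=(10, 129)))
noisy = clean + noise  # simplification: magnitudes add, phase ignored

loss_no_suppression = mse_mask_loss(np.ones_like(noisy), noisy, clean)
loss_oracle = mse_mask_loss(ideal_ratio_mask(noisy, clean), noisy, clean)
```

In training, `mask` would be the network output rather than an oracle; the oracle is used here only to show that the loss falls toward zero as the mask approaches the ratio target.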

arXiv information

Authors	Kaijun Tan, Benzhe Dai, Jiakui Li, Wenyu Mao
Published	2023-11-27 16:03:42+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
