LA-VocE: Low-SNR Audio-visual Speech Enhancement using Neural Vocoders

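Summary

Audio-visual speech enhancement aims to extract clean speech from a noisy environment by using not only the audio itself but also the target speaker's lip movements.
This approach has been shown to improve on audio-only speech enhancement, particularly when removing interfering speech.
Despite recent advances in speech synthesis, most audio-visual approaches still reproduce the clean audio through spectral mapping or masking, and often amount to a visual backbone added to an existing speech enhancement architecture.
This work proposes LA-VocE, a new two-stage approach that predicts mel-spectrograms from noisy audio-visual speech with a transformer-based architecture and then converts them into waveform audio with a neural vocoder (HiFi-GAN).
The framework is trained and evaluated on thousands of speakers and more than 11 languages, and the model's ability to adapt to different levels of background noise and speech interference is studied.
The experiments show that LA-VocE outperforms existing methods according to multiple metrics, particularly in very noisy scenarios.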

Abstract (original)

Audio-visual speech enhancement aims to extract clean speech from a noisy environment by leveraging not only the audio itself but also the target speaker’s lip movements. This approach has been shown to yield improvements over audio-only speech enhancement, particularly for the removal of interfering speech. Despite recent advances in speech synthesis, most audio-visual approaches continue to use spectral mapping/masking to reproduce the clean audio, often resulting in visual backbones added to existing speech enhancement architectures. In this work, we propose LA-VocE, a new two-stage approach that predicts mel-spectrograms from noisy audio-visual speech via a transformer-based architecture, and then converts them into waveform audio using a neural vocoder (HiFi-GAN). We train and evaluate our framework on thousands of speakers and 11+ different languages, and study our model’s ability to adapt to different levels of background noise and speech interference. Our experiments show that LA-VocE outperforms existing methods according to multiple metrics, particularly under very noisy scenarios.
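The abstract refers to "different levels of background noise" and "very noisy scenarios", which are usually expressed as a signal-to-noise ratio (SNR) in decibels. As a rough illustration of what a low-SNR input means, the sketch below scales a noise signal so that the speech-to-noise power ratio matches a target value and then adds it to the speech. This is not the authors' data pipeline: the function names, the 16 kHz sample rate, the sine wave standing in for speech, and the -5 dB target are all assumptions chosen for the example.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio is `snr_db`, then add it."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_power = power(noise)
    if noise_power == 0:
        raise ValueError("noise must not be silent")
    # SNR_dB = 10 * log10(P_speech / P_noise), solved for the noise power.
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixture):
    """SNR of `mixture` relative to the known clean `speech`."""
    residual = [m - s for s, m in zip(speech, mixture)]
    return 10 * math.log10(power(speech) / power(residual))


# Hypothetical stand-ins: one second of a 220 Hz tone as "speech" (assumed
# 16 kHz sample rate) and Gaussian noise as the background.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

# A negative SNR means the noise carries more power than the speech.
noisy = mix_at_snr(speech, noise, snr_db=-5.0)
```

At negative SNR values the noise is stronger than the speech itself, which is the regime the title's "Low-SNR" points to.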

arXiv information

Authors	Rodrigo Mira, Buye Xu, Jacob Donley, Anurag Kumar, Stavros Petridis, Vamsi Krishna Ithapu, Maja Pantic
Published	2023-03-13 16:51:03+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
