Learning Source Disentanglement in Neural Audio Codec

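Summary

Neural audio codecs have substantially advanced audio compression by efficiently converting continuous audio signals into discrete tokens.
These codecs preserve high sound quality and, through generative models trained on the tokens, make sophisticated sound generation possible.
However, existing neural codec models are usually trained on large, undifferentiated audio datasets, which ignores the inherent differences between sound domains such as speech, music, and environmental sound effects.
This oversight complicates data modeling and makes sound generation harder to control.
To address these problems, we introduce the Source-Disentangled Neural Audio Codec (SD-Codec), a new approach that combines audio coding with source separation.
By jointly learning audio resynthesis and separation, SD-Codec explicitly assigns audio signals from different domains to distinct codebooks (sets of discrete representations).
Experimental results show that SD-Codec maintains competitive resynthesis quality and, as the separation results confirm, successfully disentangles the different sources in the latent space, which improves the interpretability of the audio codec and potentially allows finer control over the audio generation process.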

Summary (original)

Neural audio codecs have significantly advanced audio compression by efficiently converting continuous audio signals into discrete tokens. These codecs preserve high-quality sound and enable sophisticated sound generation through generative models trained on these tokens. However, existing neural codec models are typically trained on large, undifferentiated audio datasets, neglecting the essential discrepancies between sound domains like speech, music, and environmental sound effects. This oversight complicates data modeling and poses additional challenges to the controllability of sound generation. To tackle these issues, we introduce the Source-Disentangled Neural Audio Codec (SD-Codec), a novel approach that combines audio coding and source separation. By jointly learning audio resynthesis and separation, SD-Codec explicitly assigns audio signals from different domains to distinct codebooks, sets of discrete representations. Experimental results indicate that SD-Codec not only maintains competitive resynthesis quality but also, supported by the separation results, demonstrates successful disentanglement of different sources in the latent space, thereby enhancing interpretability in audio codec and providing potential finer control over the audio generation process.
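
The idea of assigning each sound domain to its own codebook can be illustrated with a small sketch. This is not the authors' implementation: the domain names, the latent and codebook sizes, the random data, the single-stage nearest-neighbour quantizer, and the choice to form the mixture latent by summing the quantized source latents are all assumptions made only for illustration.

```python
import numpy as np


def quantize(latents, codebook):
    """Map each latent frame to its nearest codebook entry.

    latents:  array of shape (T, D), one latent vector per frame
    codebook: array of shape (K, D), K discrete code vectors
    Returns (indices of shape (T,), quantized latents of shape (T, D)).
    """
    # Squared Euclidean distance between every frame and every code vector.
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]


rng = np.random.default_rng(0)

# Hypothetical setup: three sound domains, each with its own codebook.
DOMAINS = ("speech", "music", "sfx")
T, D, K = 50, 8, 16  # frames, latent dimension, codebook size (illustrative)

codebooks = {name: rng.normal(size=(K, D)) for name in DOMAINS}

# Stand-ins for the per-source latents an encoder would produce.
source_latents = {name: rng.normal(size=(T, D)) for name in DOMAINS}

tokens = {}
quantized = {}
for name in DOMAINS:
    # Each domain is quantized only against its own codebook.
    tokens[name], quantized[name] = quantize(source_latents[name], codebooks[name])

# Assumption: the mixture is represented by the sum of the quantized sources.
mixture_latent = sum(quantized.values())
```

Because every domain has its own token stream, a single source can in principle be decoded, edited, or regenerated from its tokens alone, which is the kind of finer control the abstract points to.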

arXiv information

Authors	Xiaoyu Bie, Xubo Liu, Gaël Richard
Published	2024-09-17 14:21:02+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
