Collective Learning Mechanism based Optimal Transport Generative Adversarial Network for Non-parallel Voice Conversion

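Summary

Following their success in image synthesis, Generative Adversarial Network (GAN) models have also made significant progress in speech synthesis, leveraging their ability to fit the distribution of target data through adversarial learning.
Notably, even among state-of-the-art (SOTA) GAN-based voice conversion (VC) models, there remains a substantial gap in naturalness between real and GAN-generated speech samples.
Furthermore, while many GAN models currently use a single-generator, single-discriminator learning approach, the target data distribution can be optimized more effectively with a single-generator, multi-discriminator learning scheme.
This study therefore introduces a new GAN model, the Collective Learning Mechanism-based Optimal Transport GAN (CLOT-GAN), which incorporates multiple discriminators: a Deep Convolutional Neural Network (DCNN) model, a Vision Transformer (ViT), and a conformer.
The discriminators are combined so that, through a collective learning mechanism, they can capture the formant distribution of mel-spectrograms.
In addition, an Optimal Transport (OT) loss is included to bridge the gap between the source and target data distributions using the principles of OT theory.
Experiments on the VCC 2018, VCTK, and CMU-Arctic datasets confirm that the CLOT-GAN-VC model outperforms existing VC models in both objective and subjective evaluations.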
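
The abstract does not say how the OT loss is computed. As a purely illustrative sketch, one common way to estimate an OT cost between two batches of feature vectors is entropic regularisation solved with Sinkhorn iterations. Everything below is an assumption made for illustration and not the CLOT-GAN formulation: the function name `sinkhorn_ot_cost`, the squared-Euclidean ground cost, the uniform weights, and the `epsilon` and `n_iters` parameters.

```python
import numpy as np


def sinkhorn_ot_cost(x, y, epsilon=0.1, n_iters=200):
    """Entropic OT cost between two point clouds with uniform weights.

    Hypothetical helper for illustration only; not taken from the paper.
    x: array of shape (n, d), y: array of shape (m, d).
    Returns (transport cost, transport plan of shape (n, m)).
    """
    n, m = x.shape[0], y.shape[0]
    # Pairwise squared Euclidean ground cost, shape (n, m).
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    # Uniform marginals over both batches (an assumption).
    a = np.full(n, 1.0 / n)
    b = np.full(m, 1.0 / m)
    # Scale the regularisation by the largest cost to avoid underflow in exp.
    scale = max(float(cost.max()), 1e-12)
    kernel = np.exp(-cost / (epsilon * scale))
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        # Alternate scaling so the plan's marginals approach a and b.
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    plan = u[:, None] * kernel * v[None, :]
    return float((plan * cost).sum()), plan


# Usage with random stand-ins for batches of source and target feature frames.
rng = np.random.default_rng(0)
source_frames = rng.normal(size=(8, 4))
target_frames = rng.normal(loc=1.0, size=(8, 4))
ot_cost, ot_plan = sinkhorn_ot_cost(source_frames, target_frames)
print(f"estimated OT cost: {ot_cost:.4f}")
```

In a GAN training loop a quantity like this would typically be computed with a differentiable framework and added to the generator objective; the NumPy version here only shows the arithmetic.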

Abstract (original)

After demonstrating significant success in image synthesis, Generative Adversarial Network (GAN) models have likewise made significant progress in the field of speech synthesis, leveraging their capacity to adapt the precise distribution of target data through adversarial learning processes. Notably, in the realm of State-Of-The-Art (SOTA) GAN-based Voice Conversion (VC) models, there exists a substantial disparity in naturalness between real and GAN-generated speech samples. Furthermore, while many GAN models currently operate on a single generator discriminator learning approach, optimizing target data distribution is more effectively achievable through a single generator multi-discriminator learning scheme. Hence, this study introduces a novel GAN model named Collective Learning Mechanism-based Optimal Transport GAN (CLOT-GAN) model, incorporating multiple discriminators, including the Deep Convolutional Neural Network (DCNN) model, Vision Transformer (ViT), and conformer. The objective of integrating various discriminators lies in their ability to comprehend the formant distribution of mel-spectrograms, facilitated by a collective learning mechanism. Simultaneously, the inclusion of Optimal Transport (OT) loss aims to precisely bridge the gap between the source and target data distribution, employing the principles of OT theory. The experimental validation on VCC 2018, VCTK, and CMU-Arctic datasets confirms that the CLOT-GAN-VC model outperforms existing VC models in objective and subjective assessments.

arXiv information

Authors	Sandipan Dhar, Md. Tousin Akhter, Nanda Dulal Jana, Swagatam Das
Published	2025-04-18 16:44:01+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
