DC CoMix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer

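Summary

Although neural TTS has achieved great success, content leakage remains a challenge.
This paper proposes a new input representation and a simple architecture for improved prosody modeling.
Inspired by the recent success of discrete codes in TTS, the authors feed discrete codes to the reference encoder as its input.
Specifically, they use the vector quantizer of an audio compression model to exploit the diverse acoustic information it has already been trained on.
In addition, they use a modified MLP-Mixer as the reference encoder, which makes the architecture lighter.
As a result, the prosody-transfer TTS is trained end to end.
The method's effectiveness is demonstrated through both subjective and objective evaluations.
Experiments show that the reference encoder learns better speaker-independent prosody when discrete codes are used as input.
Moreover, comparable results are obtained even with fewer parameters.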
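
The summary says the discrete codes come from the vector quantizer of an audio compression model, but it does not show how such codes are produced. Neural audio codecs such as EnCodec typically use residual vector quantization (RVQ). The sketch below assumes that scheme, which may differ from the exact quantizer used in the paper. The function names (`rvq_encode`, `rvq_decode`, `encode_sequence`) and the toy 2-D codebooks are hypothetical and exist only for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rvq_encode(frame: Sequence[float], codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Greedy residual vector quantization of one latent frame (assumed scheme).

    At each stage, pick the codeword nearest to the current residual, record
    its index, and subtract it. Returns one discrete code index per codebook.
    """
    residual = list(frame)
    indices: List[int] = []
    for codebook in codebooks:
        best = min(range(len(codebook)), key=lambda k: squared_distance(residual, codebook[k]))
        indices.append(best)
        residual = [r - c for r, c in zip(residual, codebook[best])]
    return indices


def rvq_decode(indices: Sequence[int], codebooks: Sequence[Sequence[Vector]]) -> Vector:
    """Approximately reconstruct a frame by summing the selected codewords."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


def encode_sequence(frames: Sequence[Sequence[float]], codebooks: Sequence[Sequence[Vector]]) -> List[List[int]]:
    """Quantize every frame of a clip: returns a T x Q grid of code indices."""
    return [rvq_encode(f, codebooks) for f in frames]


if __name__ == "__main__":
    # Hypothetical toy codebooks: a coarse first stage and a finer second stage.
    codebooks = [
        [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
        [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, 0.0]],
    ]
    codes = rvq_encode([1.2, 0.1], codebooks)
    print(codes)                         # [1, 1]
    print(rvq_decode(codes, codebooks))  # [1.25, 0.0]
```

Each frame yields one integer per codebook, so a T-frame reference clip becomes a T × Q grid of indices. In a setup like the one described, those indices (or their codebook embeddings) would replace a continuous spectrogram as the reference encoder's input. That last step is an assumption about the pipeline, not a detail stated in the summary.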

Summary (original)

Despite the huge successes made in neural TTS, content-leakage remains a challenge. In this paper, we propose a new input representation and simple architecture to achieve improved prosody modeling. Inspired by the recent success in the use of discrete code in TTS, we introduce discrete code to the input of the reference encoder. Specifically, we leverage the vector quantizer from the audio compression model to exploit the diverse acoustic information it has already been trained on. In addition, we apply the modified MLP-Mixer to the reference encoder, making the architecture lighter. As a result, we train the prosody transfer TTS in an end-to-end manner. We prove the effectiveness of our method through both subjective and objective evaluations. We demonstrate that the reference encoder learns better speaker-independent prosody when discrete code is utilized as input in the experiments. In addition, we obtain comparable results even when fewer parameters are inputted.
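
The abstract also mentions a modified MLP-Mixer as the reference encoder but does not describe the modification. For orientation, the sketch below implements one block of the standard MLP-Mixer: LayerNorm, a token-mixing MLP across time, and a channel-mixing MLP across features, each with a residual connection. It then mean-pools over time to get a single utterance-level vector. The pooling step, the bias-free MLPs, the parameter names (`tok_w1`, `ch_w1`, and so on), and the random initialization are all assumptions made for illustration. This is not the authors' architecture.

```python
import math
import random
from typing import Dict, List

Matrix = List[List[float]]
Params = Dict[str, Matrix]


def gelu(x: float) -> float:
    """Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def layer_norm(row: List[float], eps: float = 1e-5) -> List[float]:
    """Normalize one vector to zero mean / unit variance (no learnable affine here)."""
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    """Multiply an (out x in) weight matrix by a length-`in` vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def mlp(x: List[float], w1: Matrix, w2: Matrix) -> List[float]:
    """Two-layer bias-free MLP: w2 @ gelu(w1 @ x)."""
    return matvec(w2, [gelu(h) for h in matvec(w1, x)])


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def mixer_block(x: Matrix, p: Params) -> Matrix:
    """One standard MLP-Mixer block on x of shape (T frames, C channels)."""
    # Token mixing: LayerNorm per frame, then an MLP along the time axis for each channel.
    normed = transpose([layer_norm(row) for row in x])  # (C, T)
    token_mixed = transpose([mlp(ch, p["tok_w1"], p["tok_w2"]) for ch in normed])  # (T, C)
    x = [[a + b for a, b in zip(r, m)] for r, m in zip(x, token_mixed)]  # residual
    # Channel mixing: LayerNorm per frame, then an MLP across channels for each frame.
    channel_mixed = [mlp(layer_norm(row), p["ch_w1"], p["ch_w2"]) for row in x]
    return [[a + b for a, b in zip(r, m)] for r, m in zip(x, channel_mixed)]  # residual


def init_params(num_frames: int, channels: int, hidden_t: int, hidden_c: int, seed: int = 0) -> Params:
    """Small random weights (hypothetical initialization, for illustration only)."""
    rng = random.Random(seed)

    def rand(rows: int, cols: int) -> Matrix:
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "tok_w1": rand(hidden_t, num_frames),
        "tok_w2": rand(num_frames, hidden_t),
        "ch_w1": rand(hidden_c, channels),
        "ch_w2": rand(channels, hidden_c),
    }


def reference_embedding(x: Matrix, blocks: List[Params]) -> List[float]:
    """Stack Mixer blocks, then mean-pool over time into one utterance-level vector (assumption)."""
    for p in blocks:
        x = mixer_block(x, p)
    return [sum(row[c] for row in x) / len(x) for c in range(len(x[0]))]


if __name__ == "__main__":
    T, C = 8, 4  # toy sizes, e.g. 8 frames of 4-dim embedded discrete codes
    frames = [[(t * C + c) / 10.0 for c in range(C)] for t in range(T)]
    blocks = [init_params(T, C, hidden_t=16, hidden_c=8, seed=s) for s in range(2)]
    print(len(reference_embedding(frames, blocks)))  # 4
```

The token-mixing weights have a fixed input size, so the standard block assumes a fixed number of frames. A reference encoder for variable-length speech would need to pad, crop, or otherwise adapt this step, and the abstract does not say how the modified version handles it. The block uses only matrix multiplies, with no attention or recurrence. That is one plausible source of the lighter architecture the authors mention, but the abstract does not state the reason.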

arXiv information

Authors	Yerin Choi, Myoung-Wan Koo
Published	2023-06-12 06:10:47+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
