StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis

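Summary

Style transfer for out-of-domain (OOD) singing voice synthesis (SVS) focuses on generating high-quality singing voices with unseen styles (such as timbre, emotion, pronunciation, and articulation skills) derived from reference singing voice samples. However, because singing voices are highly expressive, modeling their intricate nuances is difficult. In addition, existing SVS methods assume that the target vocal attributes are discernible during the training phase, so the quality of the synthesized singing voice degrades in OOD scenarios. To overcome these challenges, we propose StyleSinger, the first singing voice synthesis model for zero-shot style transfer of out-of-domain reference singing voice samples. StyleSinger incorporates two key components: 1) the Residual Style Adaptor (RSA), which employs a residual quantization module to capture diverse style characteristics in singing voices, and 2) the Uncertainty Modeling Layer Normalization (UMLN), which perturbs the style attributes within the content representation during the training phase to improve model generalization. Our extensive evaluations of zero-shot style transfer show that StyleSinger outperforms baseline models in both audio quality and similarity to the reference singing voice samples. Singing voice samples are available at https://aaronz345.github.io/StyleSingerDemo/.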
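
As a rough illustration of the residual quantization idea that the RSA is said to build on, the sketch below quantizes a vector in stages, where each stage encodes whatever the previous stages left over. It shows the generic technique only, not the StyleSinger implementation: the function names, the codebook sizes, and the random codebooks are all assumptions made for this example (in a trained model the codebooks would be learned).

```python
import random


def nearest(codebook, vec):
    """Return (index, entry) of the codebook entry closest to vec (squared L2)."""
    best = min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )
    return best, codebook[best]


def residual_quantize(vec, codebooks):
    """Quantize vec stage by stage; each stage encodes the remaining residual."""
    residual = list(vec)
    approx = [0.0] * len(vec)
    indices = []
    for codebook in codebooks:
        idx, entry = nearest(codebook, residual)
        indices.append(idx)
        # Accumulate the reconstruction and subtract what this stage explained.
        approx = [a + e for a, e in zip(approx, entry)]
        residual = [r - e for r, e in zip(residual, entry)]
    return indices, approx


# Hypothetical setup: 3 stages, 8 entries per codebook, 4-dimensional vectors.
# Random codebooks stand in for learned ones.
rng = random.Random(0)
dim, stages, entries = 4, 3, 8
codebooks = [
    [[rng.gauss(0.0, 1.0 / (s + 1)) for _ in range(dim)] for _ in range(entries)]
    for s in range(stages)
]
style_vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
indices, approx = residual_quantize(style_vec, codebooks)
```

The result is one codebook index per stage, and the reconstruction is the sum of the selected entries, which is why stacking stages can represent finer detail than a single codebook of the same size.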

Summary (original)

Style transfer for out-of-domain (OOD) singing voice synthesis (SVS) focuses on generating high-quality singing voices with unseen styles (such as timbre, emotion, pronunciation, and articulation skills) derived from reference singing voice samples. However, the endeavor to model the intricate nuances of singing voice styles is an arduous task, as singing voices possess a remarkable degree of expressiveness. Moreover, existing SVS methods encounter a decline in the quality of synthesized singing voices in OOD scenarios, as they rest upon the assumption that the target vocal attributes are discernible during the training phase. To overcome these challenges, we propose StyleSinger, the first singing voice synthesis model for zero-shot style transfer of out-of-domain reference singing voice samples. StyleSinger incorporates two critical approaches for enhanced effectiveness: 1) the Residual Style Adaptor (RSA) which employs a residual quantization module to capture diverse style characteristics in singing voices, and 2) the Uncertainty Modeling Layer Normalization (UMLN) to perturb the style attributes within the content representation during the training phase and thus improve the model generalization. Our extensive evaluations in zero-shot style transfer undeniably establish that StyleSinger outperforms baseline models in both audio quality and similarity to the reference singing voice samples. Access to singing voice samples can be found at https://aaronz345.github.io/StyleSingerDemo/.
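
The abstract does not say how UMLN perturbs style attributes, so the following is only a guess at the general shape of such a layer: a layer normalization whose style-dependent scale and bias receive random noise during training and are used unchanged at inference. The Gaussian noise, the `noise_std` parameter, and all function names are assumptions for this sketch, not details taken from the paper.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Normalize a feature vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def perturbed_style_layer_norm(x, scale, bias, noise_std=0.1, training=True, rng=random):
    """Apply layer norm, then a style-dependent scale and bias.

    Assumption: during training, Gaussian noise is added to scale and bias so
    the model does not rely on exact style statistics. At inference the scale
    and bias are used as given.
    """
    if training:
        scale = [s + rng.gauss(0.0, noise_std) for s in scale]
        bias = [b + rng.gauss(0.0, noise_std) for b in bias]
    normed = layer_norm(x)
    return [s * n + b for s, n, b in zip(scale, normed, bias)]


# Hypothetical content features with a neutral style (scale 1, bias 0).
features = [1.0, 2.0, 3.0]
scale, bias = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
out_eval = perturbed_style_layer_norm(features, scale, bias, training=False)
out_train = perturbed_style_layer_norm(features, scale, bias, rng=random.Random(0))
```

In a real model the scale and bias would typically be predicted from a style embedding. Here they are fixed lists so that the effect of the perturbation is easy to see by comparing `out_train` with `out_eval`.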

arXiv information

Authors	Yu Zhang, Rongjie Huang, Ruiqi Li, JinZheng He, Yan Xia, Feiyang Chen, Xinyu Duan, Baoxing Huai, Zhou Zhao
Published	2025-02-04 11:26:46+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
