FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation

Abstract

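Singing accompaniment generation (SAG), which generates instrumental music to accompany input vocals, is essential for developing human-AI symbiotic art-creation systems.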
The state-of-the-art method, SingSong, uses a multi-stage autoregressive (AR) model for SAG. Because it generates semantic and acoustic tokens recursively, it is extremely slow, which rules out real-time applications.
In this paper, we aim to develop a Fast SAG method that can create high-quality, coherent accompaniments.
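We develop a non-AR, diffusion-based framework that generates the Mel spectrogram of the target accompaniment directly, using carefully designed conditions inferred from the vocal signal.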
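To make "non-AR" concrete, here is a minimal sketch of generic DDPM ancestral sampling over a Mel-spectrogram-shaped array. It assumes a hypothetical `denoiser(x_t, t, cond)` that predicts the added noise from the current sample, the step index, and a vocal-derived condition `cond` of the same shape as the target; the linear beta schedule and the 80 x 200 shape are also illustrative assumptions. This is not FastSAG's actual formulation, which the summary does not describe. The point it shows is that each of the `num_steps` network calls refines every frame of the spectrogram at once, rather than emitting one token at a time.

```python
import numpy as np


def ddpm_sample(denoiser, cond, num_steps=50, beta_start=1e-4, beta_end=0.02, seed=0):
    """Generic DDPM ancestral sampling (a sketch, not FastSAG's method).

    denoiser(x_t, t, cond) is a hypothetical network that predicts the noise
    in x_t; cond is assumed to have the target shape (n_mels, n_frames).
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(beta_start, beta_end, num_steps)  # linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(cond.shape)  # start from pure Gaussian noise
    for t in reversed(range(num_steps)):
        eps = denoiser(x, t, cond)  # one call updates all frames in parallel
        # Mean of x_{t-1}: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Add fresh noise with variance beta_t, except at the final step
            x = x + np.sqrt(betas[t]) * rng.standard_normal(cond.shape)
    return x


# Usage with a placeholder denoiser that records which steps it was called at
calls = []


def dummy_denoiser(x_t, t, cond):
    calls.append(t)
    return np.zeros_like(x_t)


cond = np.zeros((80, 200))  # hypothetical: 80 Mel bins x 200 frames
mel = ddpm_sample(dummy_denoiser, cond, num_steps=50)
print(mel.shape, len(calls))  # (80, 200) 50
```

The number of network calls is fixed by `num_steps` and does not depend on how many frames the clip has; a real system would replace `dummy_denoiser` with a trained, condition-aware network and pass the result to a vocoder.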
By using diffusion and Mel spectrogram modeling, the proposed method greatly simplifies the AR token-based SingSong framework and substantially accelerates generation.
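The speed argument can be sketched by counting sequential network calls under clearly hypothetical settings: a multi-stage AR model needs one forward pass per generated token, with the stages run back to back, whereas a diffusion sampler needs a fixed number of denoising steps regardless of clip length. The token rates and step count below are illustrative assumptions, not values from SingSong or FastSAG, and call counts are not wall-clock time, since the cost of one call differs between the two model types.

```python
def ar_sequential_calls(duration_s, token_rates_hz):
    """One forward pass per token; stages (e.g. semantic, then acoustic) run in sequence."""
    return sum(round(rate * duration_s) for rate in token_rates_hz)


def diffusion_sequential_calls(duration_s, num_steps):
    """Each denoising step updates the whole spectrogram, so duration does not matter."""
    return num_steps


# Hypothetical settings, not taken from SingSong or FastSAG
rates = [25, 50]  # tokens per second for a semantic stage and an acoustic stage
for d in (5, 10, 30):
    print(d, ar_sequential_calls(d, rates), diffusion_sequential_calls(d, 50))
```

With these made-up numbers the AR count grows linearly with duration (375, 750, and 2250 calls for 5, 10, and 30 seconds) while the diffusion count stays at 50. The paper's reported speedup of at least 30 times comes from its own measurements, not from this arithmetic.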
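We also design semantic projection and prior projection blocks, as well as a set of loss functions, to ensure that the generated accompaniment is semantically and rhythmically coherent with the vocal signal.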
Extensive experiments demonstrate that the proposed method generates better samples than SingSong and accelerates generation by at least 30 times.
Audio samples and code are available at https://fastsag.github.io/.

Abstract (original)

Singing Accompaniment Generation (SAG), which generates instrumental music to accompany input vocals, is crucial to developing human-AI symbiotic art creation systems. The state-of-the-art method, SingSong, utilizes a multi-stage autoregressive (AR) model for SAG; however, this method is extremely slow as it generates semantic and acoustic tokens recursively, and this makes it impossible for real-time applications. In this paper, we aim to develop a Fast SAG method that can create high-quality and coherent accompaniments. A non-AR diffusion-based framework is developed, which, by carefully designing the conditions inferred from the vocal signals, generates the Mel spectrogram of the target accompaniment directly. With diffusion and Mel spectrogram modeling, the proposed method significantly simplifies the AR token-based SingSong framework, and largely accelerates the generation. We also design semantic projection, prior projection blocks as well as a set of loss functions, to ensure the generated accompaniment has semantic and rhythm coherence with the vocal signal. By intensive experimental studies, we demonstrate that the proposed method can generate better samples than SingSong, and accelerate the generation by at least 30 times. Audio samples and code are available at https://fastsag.github.io/.

arXiv information

Authors	Jianyi Chen, Wei Xue, Xu Tan, Zhen Ye, Qifeng Liu, Yike Guo
Published	2024-05-13 12:14:54+00:00
arXiv page	arxiv_id(pdf)

Provider, services used

arxiv.jp, Google
