Presto! Distilling Steps and Layers for Accelerating Music Generation

Summary

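Diffusion-based text-to-music (TTM) methods have advanced, but efficient, high-quality generation remains a challenge.
We introduce Presto!, an approach that accelerates inference for score-based diffusion transformers by reducing both the number of sampling steps and the cost per step.
To reduce steps, we develop a new score-based distribution matching distillation (DMD) method for the EDM family of diffusion models, the first GAN-based distillation method for TTM.
To reduce the cost per step, we develop a simple but powerful improvement to a recent layer distillation method that improves learning by better preserving hidden state variance.
Finally, we combine the step and layer distillation methods into a dual-faceted approach.
We evaluate the step and layer distillation methods independently and show that each yields best-in-class performance.
The combined distillation method generates high-quality outputs with improved diversity and accelerates the base model by 10-18x (230/435 ms latency for 32 seconds of mono/stereo 44.1 kHz audio, 15x faster than comparable SOTA), making it, to our knowledge, the fastest high-quality TTM.
Sound examples are available at https://presto-music.github.io/web/.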

Summary (original)

Despite advances in diffusion-based text-to-music (TTM) methods, efficient, high-quality generation remains a challenge. We introduce Presto!, an approach to inference acceleration for score-based diffusion transformers via reducing both sampling steps and cost per step. To reduce steps, we develop a new score-based distribution matching distillation (DMD) method for the EDM-family of diffusion models, the first GAN-based distillation method for TTM. To reduce the cost per step, we develop a simple, but powerful improvement to a recent layer distillation method that improves learning via better preserving hidden state variance. Finally, we combine our step and layer distillation methods together for a dual-faceted approach. We evaluate our step and layer distillation methods independently and show each yield best-in-class performance. Our combined distillation method can generate high-quality outputs with improved diversity, accelerating our base model by 10-18x (230/435ms latency for 32 second mono/stereo 44.1kHz, 15x faster than comparable SOTA) — the fastest high-quality TTM to our knowledge. Sound examples can be found at https://presto-music.github.io/web/.
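To put the quoted latencies in perspective, the short sketch below converts them into a real-time factor, meaning seconds of audio produced per second of compute. The only inputs are the figures stated in the abstract (32 seconds of audio, 230 ms mono and 435 ms stereo latency). The function name `real_time_factor` and the metric itself are illustrative assumptions, not something the paper reports.

```python
def real_time_factor(audio_seconds: float, latency_ms: float) -> float:
    """Seconds of audio generated per second of wall-clock compute.

    Hypothetical helper for illustration; not a metric defined in the paper.
    """
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return audio_seconds / (latency_ms / 1000.0)


# Figures taken from the abstract: 32 s clips, 230 ms (mono) / 435 ms (stereo).
AUDIO_SECONDS = 32.0
LATENCY_MS = {"mono": 230.0, "stereo": 435.0}

rtf = {name: real_time_factor(AUDIO_SECONDS, ms) for name, ms in LATENCY_MS.items()}

for name, value in rtf.items():
    print(f"{name}: about {value:.0f}x faster than real time")
```

Under these figures, generation runs at roughly 139 times real time for mono and roughly 74 times real time for stereo. The 10-18x speedup in the abstract is a separate quantity, measured relative to the undistilled base model.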

arXiv information

Authors	Zachary Novack, Ge Zhu, Jonah Casebeer, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas J. Bryan
Published	2024-10-07 16:24:18+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
