One Step Diffusion via Shortcut Models

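Summary

Diffusion models and flow-matching models can generate diverse, realistic images by learning to transport noise to data.
However, sampling from these models requires iterative denoising over many neural network passes, which makes generation slow and expensive.
Previous approaches to speeding up sampling have required complex training regimes, such as multiple training phases, multiple networks, or fragile scheduling.
This paper introduces shortcut models, a family of generative models that use a single network and a single training phase to produce high-quality samples in one or several sampling steps.
Shortcut models condition the network not only on the current noise level but also on the desired step size, which lets the model skip ahead in the generation process.
Across a wide range of sampling step budgets, shortcut models consistently produce higher-quality samples than previous approaches such as consistency models and reflow.
Compared with distillation, shortcut models reduce the complexity to a single network and training phase, and they additionally allow the step budget to be varied at inference time.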

Summary (original)

Diffusion models and flow-matching models have enabled generating diverse and realistic images by learning to transfer noise to data. However, sampling from these models involves iterative denoising over many neural network passes, making generation slow and expensive. Previous approaches for speeding up sampling require complex training regimes, such as multiple training phases, multiple networks, or fragile scheduling. We introduce shortcut models, a family of generative models that use a single network and training phase to produce high-quality samples in a single or multiple sampling steps. Shortcut models condition the network not only on the current noise level but also on the desired step size, allowing the model to skip ahead in the generation process. Across a wide range of sampling step budgets, shortcut models consistently produce higher quality samples than previous approaches, such as consistency models and reflow. Compared to distillation, shortcut models reduce complexity to a single network and training phase and additionally allow varying step budgets at inference time.
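
To make the step-size conditioning concrete, here is a minimal sketch of a sampler that passes both the current time and the step size to the model. It is an illustration under stated assumptions, not the authors' implementation: the update rule `x + d * s(x, t, d)`, the convention that time runs from 0 (noise) to 1 (data), and the names `toy_shortcut`, `sample`, `MU`, and `K` are all hypothetical. A closed-form function for a simple one-dimensional ODE stands in for the trained network.

```python
import math

# Hypothetical toy constants (not from the paper): the toy flow follows
# dx/dt = K * (MU - x), whose exact solution is known in closed form.
MU, K = 2.0, 3.0


def toy_shortcut(x: float, t: float, d: float) -> float:
    """Stand-in for a trained network s(x, t, d).

    Returns the average direction that carries x exactly d time units
    along the toy ODE. At d == 0 it reduces to the instantaneous velocity.
    The argument t is unused here because the toy ODE is time-independent.
    """
    if d == 0.0:
        return K * (MU - x)
    return (MU - x) * (1.0 - math.exp(-K * d)) / d


def sample(shortcut, x0: float, num_steps: int) -> float:
    """Integrate from t=0 to t=1 in num_steps equal steps of size d."""
    d = 1.0 / num_steps
    x, t = x0, 0.0
    for _ in range(num_steps):
        x = x + d * shortcut(x, t, d)  # the model is told how far to jump
        t += d
    return x


def euler_sample(x0: float, num_steps: int) -> float:
    """Baseline that ignores step size: always queries the d=0 velocity."""
    return sample(lambda x, t, d: toy_shortcut(x, t, 0.0), x0, num_steps)


if __name__ == "__main__":
    print("shortcut, 1 step :", sample(toy_shortcut, 0.0, 1))
    print("shortcut, 8 steps:", sample(toy_shortcut, 0.0, 8))
    print("euler,    1 step :", euler_sample(0.0, 1))
    print("euler,  128 steps:", euler_sample(0.0, 128))
```

In this toy setting the step-size-aware function reaches the same endpoint with one step as with eight, while the plain Euler baseline overshoots badly with one step and only approaches that endpoint as the step count grows. A learned shortcut model would only approximate this behavior.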

arXiv information

Authors	Kevin Frans, Danijar Hafner, Sergey Levine, Pieter Abbeel
Published	2024-10-16 13:34:40+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
