Text-to-Image Rectified Flow as Plug-and-Play Priors

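Summary

Large-scale diffusion models have achieved remarkable performance on generative tasks.
Beyond the applications they were originally trained for, these models have proven able to serve as versatile plug-and-play priors.
For example, a 2D diffusion model can serve as a loss function for optimizing a 3D implicit model.
Rectified flow, a novel class of generative models, enforces a linear progression from the source distribution to the target distribution and has demonstrated superior performance across a range of domains.
Compared with diffusion-based methods, rectified flow approaches offer better generation quality and efficiency, and they require fewer inference steps.
In this work, we present theoretical and experimental evidence that rectified-flow-based methods provide functionality similar to that of diffusion models and can likewise serve as effective priors.
In addition to the generative capabilities of diffusion priors, a variant of our method, motivated by the time-symmetry property unique to rectified flow models, can also perform image inversion.
Experimentally, our rectified-flow-based priors outperform their diffusion counterparts (the SDS and VSD losses) in text-to-3D generation.
Our method also shows competitive performance in image inversion and editing.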
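
The "linear progression" from source to target mentioned above can be illustrated with a small sketch. This is not the authors' code: it assumes the common rectified flow convention in which t = 0 is the source (noise) sample and t = 1 is the target (data) sample, and the names interpolate, target_velocity, euler_sample, and oracle are hypothetical helpers introduced only for this illustration. The "oracle" stands in for a trained velocity network and simply returns the true straight-line velocity for one fixed pair of samples.

```python
import numpy as np

def interpolate(x0, x1, t):
    """Point at time t in [0, 1] on the straight line from source x0 to target x1."""
    return (1.0 - t) * x0 + t * x1

def target_velocity(x0, x1):
    """Constant velocity along that line; the usual regression target for the velocity model."""
    return x1 - x0

def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed-size Euler steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)  # source sample (noise), assumed to sit at t = 0
x1 = rng.standard_normal(4)  # stand-in for a target (data) sample at t = 1

# Hypothetical oracle: ignores its inputs and returns the exact velocity for this pair.
def oracle(x, t):
    return target_velocity(x0, x1)

x_mid = interpolate(x0, x1, 0.5)                      # midpoint of the straight path
x_one_step = euler_sample(oracle, x0, num_steps=1)    # a single Euler step
x_many_steps = euler_sample(oracle, x0, num_steps=50)
```

Because the velocity is constant along a perfectly straight path, one Euler step already lands on the target in this toy setting, which gives some intuition for the claim that rectified flow needs fewer inference steps. A trained model only approximates straight paths, so real samplers typically still use more than one step.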

Abstract (original)

Large-scale diffusion models have achieved remarkable performance in generative tasks. Beyond their initial training applications, these models have proven their ability to function as versatile plug-and-play priors. For instance, 2D diffusion models can serve as loss functions to optimize 3D implicit models. Rectified flow, a novel class of generative models, enforces a linear progression from the source to the target distribution and has demonstrated superior performance across various domains. Compared to diffusion-based methods, rectified flow approaches surpass in terms of generation quality and efficiency, requiring fewer inference steps. In this work, we present theoretical and experimental evidence demonstrating that rectified flow based methods offer similar functionalities to diffusion models – they can also serve as effective priors. Besides the generative capabilities of diffusion priors, motivated by the unique time-symmetry properties of rectified flow models, a variant of our method can additionally perform image inversion. Experimentally, our rectified flow-based priors outperform their diffusion counterparts – the SDS and VSD losses – in text-to-3D generation. Our method also displays competitive performance in image inversion and editing.

arXiv information

Authors	Xiaofeng Yang, Cheng Chen, Xulei Yang, Fayao Liu, Guosheng Lin
Published	2024-06-05 14:02:31+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
