BudgetFusion: Perceptually-Guided Adaptive Diffusion Models

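Abstract

Diffusion models have achieved unprecedented success in text-to-image generation.
Although these models can generate high-quality, realistic images, the complexity of sequential denoising has raised societal concerns about high computational demands and energy consumption.
In response, various efforts have been made to improve inference efficiency.
However, most existing efforts take a fixed approach, such as simplifying the neural network or optimizing the text prompt.
Are the quality improvements from all denoising computations equally perceivable to humans?
We observed that, given the desired content, images from different text prompts may require different amounts of computation.
This observation motivates BudgetFusion, a novel model that suggests the most perceptually efficient number of diffusion steps before a diffusion model starts to generate an image.
It does so by predicting multi-level perceptual metrics relative to the number of diffusion steps.
Using the popular Stable Diffusion as an example, we conduct both numerical analyses and user studies.
Our experiments show that BudgetFusion saves up to five seconds per prompt without compromising perceptual similarity.
We hope this work can initiate efforts toward answering a core question: how much do humans perceptually gain from images created by a generative model, per watt of energy?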

Abstract (original)

Diffusion models have shown unprecedented success in the task of text-to-image generation. While these models are capable of generating high-quality and realistic images, the complexity of sequential denoising has raised societal concerns regarding high computational demands and energy consumption. In response, various efforts have been made to improve inference efficiency. However, most of the existing efforts have taken a fixed approach with neural network simplification or text prompt optimization. Are the quality improvements from all denoising computations equally perceivable to humans? We observed that images from different text prompts may require different computational efforts given the desired content. The observation motivates us to present BudgetFusion, a novel model that suggests the most perceptually efficient number of diffusion steps before a diffusion model starts to generate an image. This is achieved by predicting multi-level perceptual metrics relative to diffusion steps. With the popular Stable Diffusion as an example, we conduct both numerical analyses and user studies. Our experiments show that BudgetFusion saves up to five seconds per prompt without compromising perceptual similarity. We hope this work can initiate efforts toward answering a core question: how much do humans perceptually gain from images created by a generative model, per watt of energy?
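
The abstract describes suggesting a step count from predicted perceptual metrics but does not give the selection rule. The following is a minimal sketch of one plausible rule, not the authors' method: pick the smallest step count whose predicted quality is within a tolerance of the best prediction. The function name `suggest_steps`, the `tolerance` parameter, and all numbers are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def suggest_steps(
    step_counts: Sequence[int],
    predicted_quality: Sequence[float],
    tolerance: float = 0.02,
) -> int:
    """Return the smallest step count whose predicted quality is within
    `tolerance` of the best predicted quality.

    This selection rule is an assumption made for illustration; the
    abstract does not specify how BudgetFusion chooses the step count.
    """
    if not step_counts or len(step_counts) != len(predicted_quality):
        raise ValueError("inputs must be non-empty and of equal length")
    best = max(predicted_quality)
    candidates = [
        steps
        for steps, quality in zip(step_counts, predicted_quality)
        if quality >= best - tolerance
    ]
    return min(candidates)


# Made-up predictions for two prompts; higher means perceptually better.
steps = [10, 20, 30, 40, 50]
simple_prompt = [0.80, 0.94, 0.95, 0.95, 0.95]   # quality saturates early
complex_prompt = [0.55, 0.70, 0.82, 0.90, 0.95]  # quality keeps improving

print(suggest_steps(steps, simple_prompt))   # 20
print(suggest_steps(steps, complex_prompt))  # 50
```

In this toy setting the first prompt would stop at 20 steps while the second would use all 50, which mirrors the abstract's observation that different prompts may need different amounts of computation.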

arXiv information

Authors	Qinchan Li, Kenneth Chen, Changyue Su, Qi Sun
Published	2024-12-10 15:18:04+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
