Edge-SD-SR: Low Latency and Parameter Efficient On-device Super-Resolution with Stable Diffusion via Bidirectional Conditioning

Abstract

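The visual quality of Stable Diffusion-based super-resolution (SD-SR) has improved greatly of late. However, deploying large diffusion models on compute-constrained devices such as mobile phones remains impractical because of their large model size and high latency. The problem is worse for SR, which often operates at high resolutions (e.g. 4Kx3K). This work introduces Edge-SD-SR, the first parameter-efficient, low-latency diffusion model for image super-resolution. Edge-SD-SR has ~169M parameters, including the UNet, encoder and decoder, and a complexity of only ~142 GFLOPs. To maintain high visual quality on such a low compute budget, the authors introduce several training strategies: (i) a new conditioning mechanism on the low-resolution input, called bidirectional conditioning, which tailors the SD model to the SR task; (ii) joint training of the UNet and encoder, while decoupling the encodings of the HR and LR images and using a dedicated schedule; (iii) fine-tuning the decoder on the UNet's output, so that the decoder is tailored directly to the latents obtained at inference time. Edge-SD-SR runs efficiently on device: on a Samsung S24 DSP it upscales a 128×128 patch to 512×512 in 38 ms, and a 512×512 image to 2048×2048 (which requires 25 model evaluations) in about 1.1 s. The authors also show that Edge-SD-SR matches or outperforms state-of-the-art SR approaches on the most established SR benchmarks.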
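
The abstract reports 25 model evaluations for a 512×512 input but does not say how the image is tiled. A non-overlapping grid of 128×128 patches would need only 16 evaluations, so one reading consistent with the reported figure is a 5×5 grid of overlapping patches. The sketch below checks that arithmetic; the stride of 96 pixels (an overlap of 32) and the helper names `tile_origins` and `count_evaluations` are assumptions made for illustration, not details taken from the paper.

```python
import math


def tile_origins(size: int, tile: int, stride: int) -> list[int]:
    """Offsets of tiles of width `tile` that cover [0, size) at the given stride."""
    if size <= tile:
        return [0]
    n = math.ceil((size - tile) / stride) + 1
    # Clamp the last tile so that it ends exactly at the image border.
    return [min(i * stride, size - tile) for i in range(n)]


def count_evaluations(height: int, width: int, tile: int = 128, stride: int = 96) -> int:
    """Number of per-patch model evaluations needed to cover the whole image."""
    return len(tile_origins(height, tile, stride)) * len(tile_origins(width, tile, stride))


# Assumed tiling: 128x128 patches with a stride of 96 pixels (hypothetical).
evals = count_evaluations(512, 512)

# 38 ms is the per-patch latency reported in the abstract.
compute_ms = evals * 38

print(evals, compute_ms)
```

Under this assumed tiling, the patch evaluations alone would account for 25 × 38 ms = 950 ms. The gap to the reported total of about 1.1 s might come from blending overlapping patches or other per-image overhead, but the abstract does not break the figure down.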

Abstract (original)

There has been immense progress recently in the visual quality of Stable Diffusion-based Super Resolution (SD-SR). However, deploying large diffusion models on computationally restricted devices such as mobile phones remains impractical due to the large model size and high latency. This is compounded for SR as it often operates at high res (e.g. 4Kx3K). In this work, we introduce Edge-SD-SR, the first parameter efficient and low latency diffusion model for image super-resolution. Edge-SD-SR consists of ~169M parameters, including UNet, encoder and decoder, and has a complexity of only ~142 GFLOPs. To maintain a high visual quality on such low compute budget, we introduce a number of training strategies: (i) A novel conditioning mechanism on the low resolution input, coined bidirectional conditioning, which tailors the SD model for the SR task. (ii) Joint training of the UNet and encoder, while decoupling the encodings of the HR and LR images and using a dedicated schedule. (iii) Finetuning the decoder using the UNet’s output to directly tailor the decoder to the latents obtained at inference time. Edge-SD-SR runs efficiently on device, e.g. it can upscale a 128×128 patch to 512×512 in 38 msec while running on a Samsung S24 DSP, and of a 512×512 to 2048×2048 (requiring 25 model evaluations) in just ~1.1 sec. Furthermore, we show that Edge-SD-SR matches or even outperforms state-of-the-art SR approaches on the most established SR benchmarks.

arXiv information

Authors	Mehdi Noroozi, Isma Hadji, Victor Escorcia, Anestis Zaganidis, Brais Martinez, Georgios Tzimiropoulos
Published	2025-04-04 12:48:16+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
