Photo-Realistic Image Restoration in the Wild with Controlled Vision-Language Models

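Summary

Diffusion models have been applied successfully to a range of image restoration (IR) tasks, but their performance is sensitive to the choice of training dataset.
A diffusion model trained on a specific dataset typically fails to restore images whose degradations fall outside the training distribution.
To address this problem, this work leverages a capable vision-language model and a synthetic degradation pipeline to learn image restoration in the wild (wild IR).
More specifically, every low-quality image is simulated with a synthetic degradation pipeline that combines several common degradations: blur, resizing, noise, and JPEG compression.
Robust training is then introduced for a degradation-aware CLIP model, which extracts enriched image content features to assist high-quality image restoration.
The base diffusion model is the image restoration SDE (IR-SDE).
Building on it, the authors further present a posterior sampling strategy for fast, noise-free image generation.
The model is evaluated on both synthetic and real-world degradation datasets.
In addition, experiments on the unified image restoration task show that the proposed posterior sampling improves image generation quality across a variety of degradations.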

Summary (original)

Though diffusion models have been successfully applied to various image restoration (IR) tasks, their performance is sensitive to the choice of training datasets. Typically, diffusion models trained in specific datasets fail to recover images that have out-of-distribution degradations. To address this problem, this work leverages a capable vision-language model and a synthetic degradation pipeline to learn image restoration in the wild (wild IR). More specifically, all low-quality images are simulated with a synthetic degradation pipeline that contains multiple common degradations such as blur, resize, noise, and JPEG compression. Then we introduce robust training for a degradation-aware CLIP model to extract enriched image content features to assist high-quality image restoration. Our base diffusion model is the image restoration SDE (IR-SDE). Built upon it, we further present a posterior sampling strategy for fast noise-free image generation. We evaluate our model on both synthetic and real-world degradation datasets. Moreover, experiments on the unified image restoration task illustrate that the proposed posterior sampling improves image generation quality for various degradations.
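
The abstract names four degradations (blur, resize, noise, JPEG compression) but does not give their order or parameters. The following is only a minimal sketch of how such a synthetic pipeline could be chained, not the authors' implementation. It uses the Python standard library on a small grayscale image stored as nested lists. The function names, the parameter ranges, and the step order are all assumptions, and real JPEG compression is replaced by a coarse intensity quantization as a plainly labeled stand-in.

```python
from __future__ import annotations

import random


def box_blur(img: list[list[float]], radius: int = 1) -> list[list[float]]:
    """Average each pixel with its neighbours inside a (2*radius+1) window."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [img[j][i] for j in ys for i in xs]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def resize_nearest(img: list[list[float]], new_h: int, new_w: int) -> list[list[float]]:
    """Nearest-neighbour resize to (new_h, new_w)."""
    h, w = len(img), len(img[0])
    return [[img[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]


def add_gaussian_noise(img: list[list[float]], sigma: float,
                       rng: random.Random) -> list[list[float]]:
    """Add zero-mean Gaussian noise and clamp the result to [0, 1]."""
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row]
            for row in img]


def quantize(img: list[list[float]], levels: int) -> list[list[float]]:
    """Stand-in for JPEG compression (assumption): coarse intensity quantization."""
    return [[round(v * (levels - 1)) / (levels - 1) for v in row] for row in img]


def degrade(img: list[list[float]], rng: random.Random,
            scale: int = 2) -> list[list[float]]:
    """Hypothetical pipeline: blur, downsample, noise, 'compression', upsample."""
    h, w = len(img), len(img[0])
    out = box_blur(img, radius=rng.choice([1, 2]))
    out = resize_nearest(out, h // scale, w // scale)
    out = add_gaussian_noise(out, sigma=rng.uniform(0.01, 0.1), rng=rng)
    out = quantize(out, levels=rng.choice([8, 16, 32]))
    return resize_nearest(out, h, w)  # return to the original resolution


# Usage: degrade an 8x8 diagonal gradient with a fixed seed.
hq = [[(x + y) / 14 for x in range(8)] for y in range(8)]
lq = degrade(hq, random.Random(0))
```

Drawing the blur radius, noise level, and quantization level at random for each image gives every high-quality input a different low-quality counterpart, which is the general idea behind training on synthetic pairs to cover degradations seen in the wild.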

arXiv information

Authors	Ziwei Luo, Fredrik K. Gustafsson, Zheng Zhao, Jens Sjölund, Thomas B. Schön
Published	2024-04-15 12:34:21+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
