CHIMLE: Conditional Hierarchical IMLE for Multimodal Conditional Image Synthesis

Abstract

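A persistent challenge in conditional image synthesis is generating diverse output images from the same input image when only one output image is observed per input image.
GAN-based methods are prone to mode collapse, which reduces diversity.
To avoid this, we leverage Implicit Maximum Likelihood Estimation (IMLE), which can fundamentally overcome mode collapse.
IMLE uses the same generator as GANs but trains it with a different, non-adversarial objective that ensures each observed image has a generated sample nearby.
Unfortunately, prior IMLE-based methods need a large number of samples to generate high-fidelity images, which is expensive.
In this paper, we propose a new method that gets around this limitation, called Conditional Hierarchical IMLE (CHIMLE), which can generate high-fidelity images without requiring many samples.
We show that CHIMLE significantly outperforms the prior best IMLE, GAN, and diffusion-based methods in image fidelity and mode coverage across four tasks: night-to-day, 16x single image super-resolution, image colourization, and image decompression.
Quantitatively, our method improves Fréchet Inception Distance (FID) by 36.9% on average compared to the prior best IMLE-based method, and by 27.5% on average compared to the best non-IMLE-based general-purpose methods.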
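
The non-adversarial objective described above (each observed image should have a generated sample nearby) can be illustrated with a toy sketch. This is not the CHIMLE method or the authors' implementation: it only shows the generic conditional IMLE idea of drawing several samples per input and measuring the distance from the observed output to the nearest one. The names `toy_generator`, `squared_distance`, `conditional_imle_loss`, and the `scale` parameter are hypothetical, and short lists of floats stand in for images.

```python
import random


def toy_generator(x, z, scale):
    """Hypothetical stand-in for a neural generator: shifts x by scaled noise."""
    return [xi + scale * zi for xi, zi in zip(x, z)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def conditional_imle_loss(inputs, targets, scale, num_samples, rng):
    """Mean squared distance from each observed target to the nearest of
    num_samples outputs generated from its paired input."""
    total = 0.0
    for x, y in zip(inputs, targets):
        best = float("inf")
        for _ in range(num_samples):
            # Draw a fresh latent code for every sample (assumed Gaussian).
            z = [rng.gauss(0.0, 1.0) for _ in x]
            candidate = toy_generator(x, z, scale)
            # Keep only the sample closest to the observed output.
            best = min(best, squared_distance(candidate, y))
        total += best
    return total / len(inputs)


rng = random.Random(0)
inputs = [[0.0, 0.0], [1.0, 1.0]]
targets = [[0.5, -0.5], [1.0, 2.0]]
loss = conditional_imle_loss(inputs, targets, scale=1.0, num_samples=8, rng=rng)
```

In an actual training loop the generator parameters would be updated to reduce this loss. Because only the nearest sample is penalised for each observed output, the other samples are free to differ, which is how an objective of this kind can leave room for diverse outputs.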

Abstract (original)

A persistent challenge in conditional image synthesis has been to generate diverse output images from the same input image despite only one output image being observed per input image. GAN-based methods are prone to mode collapse, which leads to low diversity. To get around this, we leverage Implicit Maximum Likelihood Estimation (IMLE) which can overcome mode collapse fundamentally. IMLE uses the same generator as GANs but trains it with a different, non-adversarial objective which ensures each observed image has a generated sample nearby. Unfortunately, to generate high-fidelity images, prior IMLE-based methods require a large number of samples, which is expensive. In this paper, we propose a new method to get around this limitation, which we dub Conditional Hierarchical IMLE (CHIMLE), which can generate high-fidelity images without requiring many samples. We show CHIMLE significantly outperforms the prior best IMLE, GAN and diffusion-based methods in terms of image fidelity and mode coverage across four tasks, namely night-to-day, 16x single image super-resolution, image colourization and image decompression. Quantitatively, our method improves Fréchet Inception Distance (FID) by 36.9% on average compared to the prior best IMLE-based method, and by 27.5% on average compared to the best non-IMLE-based general-purpose methods.

arXiv information

Authors	Shichong Peng, Alireza Moazeni, Ke Li
Published	2022-11-25 18:41:44+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google