DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation

Summary

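Recently, combining GAN inversion methods with Contrastive Language-Image Pretraining (CLIP) has made zero-shot image manipulation guided by text prompts possible. However, applying these methods to diverse real images remains difficult because GAN inversion capability is limited. Specifically, these approaches often struggle to reconstruct images whose poses, views, or highly variable content differ from the training data, and they may alter object identity or produce unwanted image artifacts. To mitigate these problems and enable faithful manipulation of real images, we propose DiffusionCLIP, a novel method that performs text-driven image manipulation using diffusion models. Building on the full inversion capability and high-quality image generation of recent diffusion models, our method performs zero-shot image manipulation successfully even between unseen domains, and takes a further step toward general application by manipulating images from the widely varying ImageNet dataset. Furthermore, we propose a novel noise combination method that enables straightforward multi-attribute manipulation. Extensive experiments and human evaluation confirm that our method achieves robust and superior manipulation performance compared with existing baselines. Code is available at https://github.com/gwang-kim/DiffusionCLIP.git.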

Summary (original)

Recently, GAN inversion methods combined with Contrastive Language-Image Pretraining (CLIP) enables zero-shot image manipulation guided by text prompts. However, their applications to diverse real images are still difficult due to the limited GAN inversion capability. Specifically, these approaches often have difficulties in reconstructing images with novel poses, views, and highly variable contents compared to the training data, altering object identity, or producing unwanted image artifacts. To mitigate these problems and enable faithful manipulation of real images, we propose a novel method, dubbed DiffusionCLIP, that performs text-driven image manipulation using diffusion models. Based on full inversion capability and high-quality image generation power of recent diffusion models, our method performs zero-shot image manipulation successfully even between unseen domains and takes another step towards general application by manipulating images from a widely varying ImageNet dataset. Furthermore, we propose a novel noise combination method that allows straightforward multi-attribute manipulation. Extensive experiments and human evaluation confirmed robust and superior manipulation performance of our methods compared to the existing baselines. Code is available at https://github.com/gwang-kim/DiffusionCLIP.git.

arXiv information

Authors	Gwanghyun Kim, Taesung Kwon, Jong Chul Ye
Published	2022-08-11 13:36:19+00:00
arXiv site	arxiv_id (pdf)

Source and services used

arxiv.jp, DeepL
