Towards Counterfactual Image Manipulation via CLIP

Summary

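Leveraging the expressivity of StyleGAN and its disentangled latent codes, existing methods can realistically edit a range of visual attributes, such as the age and gender of facial images.
This raises an intriguing yet challenging question: can generative models perform counterfactual edits that go against their learned priors?
Because natural datasets contain no counterfactual samples, we investigate this problem in a text-driven manner using Contrastive-Language-Image-Pretraining (CLIP), which offers rich semantic knowledge even for a variety of counterfactual concepts.
Unlike in-domain manipulation, counterfactual manipulation requires more comprehensive use of the semantic knowledge encapsulated in CLIP, as well as more delicate handling of editing directions so that the optimization does not get stuck in a local minimum or produce undesired edits.
To this end, we design a novel contrastive loss that exploits predefined CLIP-space directions to guide the edit toward the desired direction from different perspectives.
In addition, we design a simple yet effective scheme that explicitly maps the CLIP embedding of the target text to the latent space and fuses it with the latent codes, enabling effective latent code optimization and accurate editing.
Extensive experiments show that our design achieves accurate and realistic editing when driven by target texts containing various counterfactual concepts.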
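
The summary does not spell out the form of the contrastive loss, so the following is only a minimal sketch of one way a loss over CLIP-space directions could look. It assumes an InfoNCE-style formulation: the image edit direction (edited embedding minus source embedding) is pulled toward the text direction for the target prompt and pushed away from the directions of other prompts. The function name `directional_contrastive_loss`, the use of negative prompts, and the temperature `tau` are illustrative assumptions, not details taken from the paper. Plain Python lists stand in for CLIP embeddings.

```python
import math


def sub(a, b):
    """Element-wise difference a - b of two equal-length vectors."""
    return [x - y for x, y in zip(a, b)]


def cosine(a, b):
    """Cosine similarity, with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def directional_contrastive_loss(img_src, img_edit, txt_src, txt_tgt,
                                 txt_negs, tau=0.1):
    """InfoNCE-style loss over directions in an embedding space.

    Assumed setup (hypothetical, not from the paper): the positive is the
    direction from the source text to the target text, and the negatives
    are directions from the source text to other, unwanted texts.
    """
    edit_dir = sub(img_edit, img_src)               # how the image moved
    pos_dir = sub(txt_tgt, txt_src)                 # how it should move
    neg_dirs = [sub(t, txt_src) for t in txt_negs]  # how it should not move

    pos_logit = cosine(edit_dir, pos_dir) / tau
    logits = [pos_logit] + [cosine(edit_dir, d) / tau for d in neg_dirs]

    # Numerically stable log-sum-exp over positive and negative logits.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - pos_logit              # equals -log softmax(pos)


# Toy 3-D "embeddings": the good edit moves along the target text direction,
# the bad edit moves along the negative text direction.
origin = [0.0, 0.0, 0.0]
target_text = [1.0, 0.0, 0.0]
negative_texts = [[0.0, 1.0, 0.0]]
loss_good = directional_contrastive_loss(
    origin, [1.0, 0.0, 0.0], origin, target_text, negative_texts)
loss_bad = directional_contrastive_loss(
    origin, [0.0, 1.0, 0.0], origin, target_text, negative_texts)
```

In this toy setup `loss_good` is close to zero and `loss_bad` is much larger, which is the behavior a direction-based contrastive objective is meant to have. In practice the embeddings would come from a CLIP image and text encoder and the loss would be written in a framework with automatic differentiation.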

Summary (original)

Leveraging StyleGAN’s expressivity and its disentangled latent codes, existing methods can achieve realistic editing of different visual attributes such as age and gender of facial images. An intriguing yet challenging problem arises: Can generative models achieve counterfactual editing against their learnt priors? Due to the lack of counterfactual samples in natural datasets, we investigate this problem in a text-driven manner with Contrastive-Language-Image-Pretraining (CLIP), which can offer rich semantic knowledge even for various counterfactual concepts. Different from in-domain manipulation, counterfactual manipulation requires more comprehensive exploitation of semantic knowledge encapsulated in CLIP as well as more delicate handling of editing directions for avoiding being stuck in local minimum or undesired editing. To this end, we design a novel contrastive loss that exploits predefined CLIP-space directions to guide the editing toward desired directions from different perspectives. In addition, we design a simple yet effective scheme that explicitly maps CLIP embeddings (of target text) to the latent space and fuses them with latent codes for effective latent code optimization and accurate editing. Extensive experiments show that our design achieves accurate and realistic editing while driving by target texts with various counterfactual concepts.
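
The abstract says that CLIP embeddings of the target text are mapped to the latent space and fused with latent codes, but it does not give the mapper architecture or the fusion operator. The sketch below is therefore a heavily simplified illustration under stated assumptions: a single linear map (`mapper_weights`) and additive fusion scaled by `alpha`. Both choices, and the name `fuse_text_into_latent`, are hypothetical placeholders rather than the authors' design.

```python
def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def fuse_text_into_latent(latent, clip_text_emb, mapper_weights, alpha=0.5):
    """Map a text embedding into latent space and add it to the latent code.

    Assumptions (not from the paper): the mapper is one linear layer and
    the fusion is a scaled addition. A real mapper would be a learned
    network, and the fused code would then be optimized further.
    """
    mapped = matvec(mapper_weights, clip_text_emb)
    if len(mapped) != len(latent):
        raise ValueError("mapper output size must match the latent size")
    return [w + alpha * m for w, m in zip(latent, mapped)]


# Toy example: a 3-D "text embedding" mapped into a 2-D "latent space".
latent_code = [0.0, 1.0]
text_embedding = [1.0, 2.0, 3.0]
weights = [
    [1.0, 0.0, 0.0],  # first latent dimension reads embedding dimension 0
    [0.0, 1.0, 0.0],  # second latent dimension reads embedding dimension 1
]
fused = fuse_text_into_latent(latent_code, text_embedding, weights)
```

The point of the sketch is only the data flow: the text embedding is projected into the same space as the latent code, and the projected vector shifts the starting point that the subsequent latent optimization works from.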

arXiv information

Authors	Yingchen Yu, Fangneng Zhan, Rongliang Wu, Jiahui Zhang, Shijian Lu, Miaomiao Cui, Xuansong Xie, Xian-Sheng Hua, Chunyan Miao
Published	2022-07-07 04:57:58+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
