CLIP-PAE: Projection-Augmentation Embedding to Extract Relevant Features for a Disentangled, Interpretable, and Controllable Text-Guided Face Manipulation

Summary

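【Title】
CLIP-PAE: a projection-augmentation embedding that extracts relevant features for disentangled, interpretable, and controllable text-guided face manipulation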

【Summary】
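・Recently introduced Contrastive Language-Image Pre-Training (CLIP) bridges images and text by embedding both into a joint latent space.
・However, because image and text embeddings do not coincide in this joint space, using text embeddings as the optimization target often introduces undesired artifacts into the resulting images.
・Disentanglement, interpretability, and controllability of the manipulation are also hard to guarantee.
・To alleviate these problems, we propose defining corpus subspaces, spanned by relevant prompts, that capture specific image characteristics.
・We introduce the CLIP Projection-Augmentation Embedding (PAE) as an optimization target to improve text-guided image manipulation.
・The method is a simple, general paradigm that is easy to compute and adapt, and it can be incorporated into any CLIP-based image manipulation algorithm.
・Several theoretical and empirical studies show, quantitatively and qualitatively, that PAE yields more disentangled, interpretable, and controllable manipulation with state-of-the-art quality and accuracy.
・As a case study, the method is applied to text-guided semantic face editing.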

Summary (Original)

Recently introduced Contrastive Language-Image Pre-Training (CLIP) bridges images and text by embedding them into a joint latent space. This opens the door to ample literature that aims to manipulate an input image by providing a textual explanation. However, due to the discrepancy between image and text embeddings in the joint space, using text embeddings as the optimization target often introduces undesired artifacts in the resulting images. Disentanglement, interpretability, and controllability are also hard to guarantee for manipulation. To alleviate these problems, we propose to define corpus subspaces spanned by relevant prompts to capture specific image characteristics. We introduce CLIP Projection-Augmentation Embedding (PAE) as an optimization target to improve the performance of text-guided image manipulation. Our method is a simple and general paradigm that can be easily computed and adapted, and smoothly incorporated into any CLIP-based image manipulation algorithm. To demonstrate the effectiveness of our method, we conduct several theoretical and empirical studies. As a case study, we utilize the method for text-guided semantic face editing. We quantitatively and qualitatively demonstrate that PAE facilitates a more disentangled, interpretable, and controllable image manipulation with state-of-the-art quality and accuracy.
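The abstract does not spell out the PAE formula, so the following is only a hypothetical sketch of the two ingredients it names: projecting an embedding onto a corpus subspace spanned by prompt embeddings, and using an augmented embedding, rather than the raw text embedding, as the optimization target. It assumes small toy vectors in place of real CLIP embeddings, and the combination rule in `pae_target` (drop the image's in-subspace component and substitute `alpha` times the text's in-subspace component) is an illustrative assumption, not necessarily the authors' definition. All function and parameter names here are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> Vector:
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def orthonormal_basis(prompt_embs: Sequence[Sequence[float]], eps: float = 1e-10) -> List[Vector]:
    """Gram-Schmidt: orthonormal basis of the span of the prompt embeddings."""
    basis: List[Vector] = []
    for p in prompt_embs:
        residual = list(p)
        for q in basis:
            c = dot(residual, q)
            residual = [r - c * qi for r, qi in zip(residual, q)]
        # Skip prompts that are (numerically) linear combinations of earlier ones.
        if math.sqrt(dot(residual, residual)) > eps:
            basis.append(normalize(residual))
    return basis


def project(v: Sequence[float], basis: List[Vector]) -> Vector:
    """Orthogonal projection of v onto the subspace spanned by an orthonormal basis."""
    out = [0.0] * len(v)
    for q in basis:
        c = dot(v, q)
        out = [o + c * qi for o, qi in zip(out, q)]
    return out


def pae_target(image_emb: Sequence[float], text_emb: Sequence[float],
               prompt_embs: Sequence[Sequence[float]], alpha: float = 1.0) -> Vector:
    """Hypothetical PAE-style target (assumption, not the paper's exact formula):
    keep the image component outside the corpus subspace and replace the
    in-subspace component with alpha times the text's projection."""
    basis = orthonormal_basis(prompt_embs)
    img_in = project(image_emb, basis)
    txt_in = project(text_emb, basis)
    target = [i - ii + alpha * t for i, ii, t in zip(image_emb, img_in, txt_in)]
    return normalize(target)


def cosine_loss(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; zero when a and b point in the same direction."""
    return 1.0 - dot(normalize(a), normalize(b))


# Toy usage: the prompts span the x-y plane and the text points along y.
target = pae_target([1.0, 0.0, 1.0], [0.0, 2.0, 0.0], [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
print(target)  # x-component removed; y comes from the text, z is kept from the image
```

In an editing pipeline one would presumably minimize `cosine_loss` between the CLIP embedding of the edited image and such a target instead of the raw text embedding. Orthonormalizing the prompts first keeps the projection correct even when the prompt embeddings are correlated, since nothing guarantees they are orthogonal.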

arXiv information

Authors	Chenliang Zhou, Fangcheng Zhong, Cengiz Oztireli
Published	2023-05-07 20:26:50+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI
