Robust Text-driven Image Editing Method that Adaptively Explores Directions in Latent Spaces of StyleGAN and CLIP

Summary

[Title] A robust text-driven image editing method that adaptively explores directions in the latent spaces of StyleGAN and CLIP

[Summary]
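– Automatic image editing is in high demand because of its many applications, and natural-language instructions are essential for flexible, intuitive editing that matches what the user imagines.
– StyleCLIP, a pioneering work in text-driven image editing, finds an edit direction in the CLIP space and then edits the image by mapping that direction to the StyleGAN space.
– However, it is difficult to tune the appropriate inputs that the method requires beyond the original image and the text instruction.
– This study proposes a method that uses an SVM to construct the edit direction adaptively in the StyleGAN and CLIP spaces.
– The model trains an SVM to classify positive and negative images and represents the edit direction as the resulting normal vector in the CLIP space.
– The images are retrieved from the large-scale image corpus originally used to pre-train StyleGAN, according to the CLIP similarity between each image and the text instruction.
– The model performed as well as the StyleCLIP baseline while allowing simpler inputs without increasing computation time.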
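
The retrieval and SVM steps described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation. Random vectors stand in for CLIP embeddings, and the helper names (`retrieve`, `train_linear_svm`, `edit_direction`, `make_demo`) are hypothetical. Using the least similar images as negatives is an assumption, since the abstract only says that images are retrieved by CLIP similarity. The SVM is a plain linear one trained by subgradient descent on the hinge loss, and mapping the direction to the StyleGAN space is left out.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity, the score used here to rank images against the text."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def retrieve(text_emb, image_embs, k):
    """Return the k most similar images (positives) and the k least similar
    images (negatives). The choice of negatives is an assumption."""
    ranked = sorted(image_embs, key=lambda e: cosine(text_emb, e), reverse=True)
    return ranked[:k], ranked[-k:]


def train_linear_svm(pos, neg, lam=0.01, lr=0.01, epochs=300, seed=0):
    """Fit a linear SVM (w, b) by subgradient descent on the
    L2-regularised hinge loss."""
    rng = random.Random(seed)
    data = [(x, 1.0) for x in pos] + [(x, -1.0) for x in neg]
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            violated = y * (dot(w, x) + b) < 1.0
            w = [wi * (1.0 - lr * lam) for wi in w]  # shrink: regulariser term
            if violated:  # hinge term is active only inside the margin
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def edit_direction(w):
    """Unit normal vector of the separating hyperplane."""
    norm = math.sqrt(dot(w, w))
    return [wi / norm for wi in w]


def make_demo(seed=0, dim=16, n_images=200, k=20):
    """Build synthetic stand-in embeddings and run retrieval plus SVM training."""
    rng = random.Random(seed)
    text_emb = [rng.gauss(0, 1) for _ in range(dim)]
    image_embs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_images)]
    pos, neg = retrieve(text_emb, image_embs, k)
    w, b = train_linear_svm(pos, neg)
    return text_emb, pos, neg, w, b


if __name__ == "__main__":
    text_emb, pos, neg, w, b = make_demo()
    direction = edit_direction(w)
    print("cosine(direction, text embedding):", cosine(direction, text_emb))
```

In practice the embeddings would come from a real CLIP encoder and the SVM from an established library. The sketch only shows that the edit direction is the normalised weight vector of the trained classifier.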

Summary (original)

Automatic image editing has great demands because of its numerous applications, and the use of natural language instructions is essential to achieving flexible and intuitive editing as the user imagines. A pioneering work in text-driven image editing, StyleCLIP, finds an edit direction in the CLIP space and then edits the image by mapping the direction to the StyleGAN space. At the same time, it is difficult to tune appropriate inputs other than the original image and text instructions for image editing. In this study, we propose a method to construct the edit direction adaptively in the StyleGAN and CLIP spaces with SVM. Our model represents the edit direction as a normal vector in the CLIP space obtained by training a SVM to classify positive and negative images. The images are retrieved from a large-scale image corpus, originally used for pre-training StyleGAN, according to the CLIP similarity between the images and the text instruction. We confirmed that our model performed as well as the StyleCLIP baseline, whereas it allows simple inputs without increasing the computational time.

arXiv information

Authors	Tsuyoshi Baba, Kosuke Nishida, Kyosuke Nishida
Published	2023-04-03 13:30:48+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI
