TD-GEM: Text-Driven Garment Editing Mapper

Abstract

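Language-based fashion image editing lets users try out variations of a desired garment through text prompts.
Inspired by work on manipulating latent representations in StyleCLIP and HairCLIP, we focus on these latent spaces to edit fashion items in full-body human datasets.
Fashion image editing currently remains a gap, because of the complexity of garment shapes and textures and the diversity of human poses.
In this paper, we propose an editing optimizer scheme called Text-Driven Garment Editing Mapper (TD-GEM), which aims to edit fashion items in a disentangled manner.
To this end, and to obtain more accurate results, we first obtain a latent representation of the image through generative adversarial network (GAN) inversion, such as Encoder for Editing (e4e) or Pivotal Tuning Inversion (PTI).
We then use optimization-based Contrastive Language-Image Pre-training (CLIP) guidance to steer the latent representation of the fashion image toward the target attribute expressed in a text prompt.
TD-GEM manipulates the image precisely according to the target attribute while leaving the other parts of the image untouched.
In the experiments, we evaluate TD-GEM on two attributes ("color" and "sleeve length") and find that it effectively generates realistic images compared with recent manipulation schemes.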
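As a rough illustration of the inversion step, the following sketch recovers a latent code by gradient descent on a reconstruction loss. It is only an analogy under strong assumptions: the "generator" is a hypothetical linear map `A` rather than a StyleGAN, the loss is a plain pixel-wise L2 error, and `generate`/`invert` are illustrative names. e4e instead trains an encoder that predicts the latent in a single forward pass, and PTI adds a generator fine-tuning stage after finding its pivot latent; neither is implemented here.

```python
import numpy as np

# Hypothetical toy "generator" (NOT a real StyleGAN): G(w) = A @ w.
rng = np.random.default_rng(0)
LATENT_DIM, IMAGE_DIM = 8, 32
A = rng.normal(size=(IMAGE_DIM, LATENT_DIM)) / np.sqrt(IMAGE_DIM)

def generate(w):
    return A @ w

def invert(x, lr=0.1, steps=1000):
    """Optimization-based inversion: minimize ||G(w) - x||^2 over w by gradient descent."""
    w = np.zeros(LATENT_DIM)
    for _ in range(steps):
        residual = generate(w) - x
        grad = 2.0 * A.T @ residual  # gradient of the squared reconstruction error
        w -= lr * grad
    return w

w_true = rng.normal(size=LATENT_DIM)
x = generate(w_true)  # an "image" the toy generator can reproduce exactly
w_hat = invert(x)
print(np.linalg.norm(w_hat - w_true))
```

Because the toy generator is linear with full column rank, the loss is convex and the recovered latent converges to `w_true`; a real generator gives a non-convex problem with no such guarantee.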

Abstract (original)

Language-based fashion image editing allows users to try out variations of desired garments through provided text prompts. Inspired by research on manipulating latent representations in StyleCLIP and HairCLIP, we focus on these latent spaces for editing fashion items of full-body human datasets. Currently, there is a gap in handling fashion image editing due to the complexity of garment shapes and textures and the diversity of human poses. In this paper, we propose an editing optimizer scheme method called Text-Driven Garment Editing Mapper (TD-GEM), aiming to edit fashion items in a disentangled way. To this end, we initially obtain a latent representation of an image through generative adversarial network inversions such as Encoder for Editing (e4e) or Pivotal Tuning Inversion (PTI) for more accurate results. An optimization-based Contrastive Language-Image Pre-training (CLIP) is then utilized to guide the latent representation of a fashion image in the direction of a target attribute expressed in terms of a text prompt. Our TD-GEM manipulates the image accurately according to the target attribute, while other parts of the image are kept untouched. In the experiments, we evaluate TD-GEM on two different attributes (i.e., ‘color’ and ‘sleeve length’), which effectively generates realistic images compared to the recent manipulation schemes.
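The CLIP-guided editing step can be sketched in the style of StyleCLIP's latent optimization: starting from the inverted latent `w_src`, minimize a CLIP cosine distance to the target text embedding plus an L2 penalty that keeps the edited latent close to the source. This is a toy under stated assumptions, not TD-GEM's implementation: the matrices `A` (generator) and `B` (image encoder), the text embedding `t`, and the weights `lam`, `lr`, `steps` are all hypothetical, and the abstract does not specify TD-GEM's exact loss terms or how it keeps non-garment regions unchanged.

```python
import numpy as np

# Hypothetical toy stand-ins (NOT real StyleGAN/CLIP models):
#   A: "generator" mapping a latent w to an image vector, G(w) = A @ w
#   B: "CLIP image encoder" mapping an image vector to an embedding, E(x) = B @ x
#   t: "CLIP text embedding" of a prompt such as "a red shirt"
rng = np.random.default_rng(0)
LATENT_DIM, IMAGE_DIM, EMBED_DIM = 8, 32, 4
A = rng.normal(size=(IMAGE_DIM, LATENT_DIM)) / np.sqrt(LATENT_DIM)
B = rng.normal(size=(EMBED_DIM, IMAGE_DIM)) / np.sqrt(IMAGE_DIM)
M = B @ A  # composite map: latent -> image embedding

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def clip_guided_edit(w_src, t, lam=0.1, lr=0.05, steps=300):
    """Minimize (1 - cos(E(G(w)), t)) + lam * ||w - w_src||^2 by gradient descent."""
    w = w_src.copy()
    t_norm = np.linalg.norm(t)
    for _ in range(steps):
        u = M @ w
        u_norm = np.linalg.norm(u)
        # Gradient of cos(u, t) with respect to u
        dcos_du = t / (u_norm * t_norm) - (u @ t) * u / (u_norm**3 * t_norm)
        # Chain rule through M, plus the L2 term that keeps w near the source latent
        grad = -M.T @ dcos_du + 2.0 * lam * (w - w_src)
        w -= lr * grad
    return w

w_src = rng.normal(size=LATENT_DIM)  # stand-in for a latent returned by GAN inversion
t = rng.normal(size=EMBED_DIM)       # hypothetical text embedding
w_edit = clip_guided_edit(w_src, t)
print(cosine(M @ w_src, t), cosine(M @ w_edit, t))
```

The L2 term trades edit strength against fidelity: a larger `lam` keeps `w_edit` closer to `w_src`, while a smaller `lam` lets the embedding move further toward `t`.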

arXiv information

Authors	Reza Dadfar, Sanaz Sabzevari, Mårten Björkman, Danica Kragic
Published	2023-07-26 09:19:29+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
