Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing

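Summary

Designers use fashion illustration to communicate their vision and to carry a design idea from concept to realization, showing how garments interact with the human body.
In this context, computer vision can help improve the fashion design process.
Unlike previous work, which focused mainly on virtual try-on of garments, the authors propose the task of multimodal-conditioned fashion image editing, which guides the generation of human-centric fashion images with multimodal prompts such as text, human body poses, and garment sketches.
They tackle this problem with a new architecture based on latent diffusion models, an approach that had not previously been used in the fashion domain.
Because no existing dataset suits the task, they also extend two existing fashion datasets, Dress Code and VITON-HD, with multimodal annotations collected semi-automatically.
Experimental results on these new datasets demonstrate the effectiveness of the proposal in terms of both realism and coherence with the given multimodal inputs.
Source code and the collected multimodal annotations are publicly available at https://github.com/aimagelab/multimodal-garment-designer.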

Abstract (original)

Fashion illustration is used by designers to communicate their vision and to bring the design idea from conceptualization to realization, showing how clothes interact with the human body. In this context, computer vision can thus be used to improve the fashion design process. Differently from previous works that mainly focused on the virtual try-on of garments, we propose the task of multimodal-conditioned fashion image editing, guiding the generation of human-centric fashion images by following multimodal prompts, such as text, human body poses, and garment sketches. We tackle this problem by proposing a new architecture based on latent diffusion models, an approach that has not been used before in the fashion domain. Given the lack of existing datasets suitable for the task, we also extend two existing fashion datasets, namely Dress Code and VITON-HD, with multimodal annotations collected in a semi-automatic manner. Experimental results on these new datasets demonstrate the effectiveness of our proposal, both in terms of realism and coherence with the given multimodal inputs. Source code and collected multimodal annotations are publicly available at: https://github.com/aimagelab/multimodal-garment-designer.

arXiv information

Authors	Alberto Baldrati, Davide Morelli, Giuseppe Cartella, Marcella Cornia, Marco Bertini, Rita Cucchiara
Published	2023-08-23 12:45:27+00:00

Source and services used

arxiv.jp, Google
