StylerDALLE: Language-Guided Style Transfer Using a Vector-Quantized Tokenizer of a Large-Scale Generative Model

Summary

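Despite progress in style transfer, most previous work focuses on transferring only relatively simple features such as color or texture, and misses more abstract concepts such as overall artistic expression or painter-specific traits.
However, these abstract semantics can be captured by models such as DALL-E and CLIP, which are trained on huge datasets of images and text documents.
In this paper, we propose StylerDALLE, a style transfer method that exploits both of these models and uses natural language to describe abstract art styles.
Specifically, we formulate the language-guided style transfer task as a non-autoregressive token sequence translation, from the input content image to the output stylized image, in the discrete latent space of a large-scale pretrained vector-quantized tokenizer, for example the discrete variational auto-encoder (dVAE) of DALL-E.
To incorporate style information, we propose a reinforcement learning strategy with CLIP-based language supervision that ensures stylization and content preservation simultaneously.
Experimental results demonstrate the superiority of our method, which can effectively transfer art styles using language instructions at different granularities.
Code is available at https://github.com/zipengxuc/StylerDALLE.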
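
The reinforcement learning idea in the summary can be illustrated with a small sketch. This is not the authors' implementation: the function names, the additive form of the reward, the use of a content description embedding for the content term, and the REINFORCE-style loss are all assumptions chosen for illustration. The embeddings are plain lists of floats standing in for CLIP image and text embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def reward(stylized_emb, style_text_emb, content_text_emb, content_weight=1.0):
    """Hypothetical reward (an assumption, not taken from the paper).

    It is high when the stylized image embedding is close to both the
    style prompt embedding and an embedding describing the original content.
    """
    style_term = cosine(stylized_emb, style_text_emb)
    content_term = cosine(stylized_emb, content_text_emb)
    return style_term + content_weight * content_term


def reinforce_loss(token_log_probs, r, baseline=0.0):
    """REINFORCE-style surrogate loss for one sampled token sequence.

    Assumes tokens are sampled independently per position
    (non-autoregressive), so the sequence log-probability is the sum of
    the per-token log-probabilities.
    """
    return -(r - baseline) * sum(token_log_probs)


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for CLIP outputs.
    r = reward([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
    print(r, reinforce_loss([-1.0, -2.0], r, baseline=0.5))
```

In a real pipeline the embeddings would come from a CLIP model and the log-probabilities from the token translator; both are left out here so the sketch stays self-contained.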

Summary (original)

Despite the progress made in the style transfer task, most previous work focus on transferring only relatively simple features like color or texture, while missing more abstract concepts such as overall art expression or painter-specific traits. However, these abstract semantics can be captured by models like DALL-E or CLIP, which have been trained using huge datasets of images and textual documents. In this paper, we propose StylerDALLE, a style transfer method that exploits both of these models and uses natural language to describe abstract art styles. Specifically, we formulate the language-guided style transfer task as a non-autoregressive token sequence translation, i.e., from input content image to output stylized image, in the discrete latent space of a large-scale pretrained vector-quantized tokenizer, e.g., the discrete variational auto-encoder (dVAE) of DALL-E. To incorporate style information, we propose a Reinforcement Learning strategy with CLIP-based language supervision that ensures stylization and content preservation simultaneously. Experimental results demonstrate the superiority of our method, which can effectively transfer art styles using language instructions at different granularities. Code is available at https://github.com/zipengxuc/StylerDALLE.

arXiv information

Authors	Zipeng Xu, Enver Sangineto, Nicu Sebe
Published	2023-10-09 15:17:25+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
