Lafite2: Few-shot Text-to-Image Generation

Summary

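Text-to-image generation models have advanced considerably in recent years and can now generate impressive, realistic images from arbitrary text.
Most such models are trained on web-scale datasets of image-text pairs, which may be unaffordable for many researchers.
This paper proposes a new method for pre-training text-to-image generation models on image-only datasets.
It uses a retrieve-then-optimize procedure to synthesize pseudo text features: for a given image, relevant pseudo text features are first retrieved and then optimized for better alignment with the image.
Because the method has low data requirements, it is highly flexible and easy to use, and it benefits a wide range of settings, including few-shot, semi-supervised, and fully supervised learning.
It can be applied to different models, including generative adversarial networks (GANs) and diffusion models.
Extensive experiments demonstrate the effectiveness of the proposed method.
On the MS-COCO dataset, the GAN model achieves a Fréchet Inception Distance (FID) of 6.78, a new state of the art (SoTA) for GANs in the fully supervised setting.
The diffusion model achieves FIDs of 8.42 and 4.28 in the zero-shot and supervised settings, respectively, which is competitive with SoTA diffusion models despite a much smaller model size.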
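The abstract describes the retrieve-then-optimize step only at a high level. The sketch below illustrates the general idea under stated, hypothetical assumptions. It is not the authors' implementation. The assumptions are: image and text features live in a shared embedding space (CLIP-style), a bank of candidate text features (`text_bank`) is available, retrieval uses cosine similarity, and "optimization" means gradient ascent on cosine similarity with an L2 penalty that keeps each feature close to its retrieved starting point. The function names and hyperparameters (`k`, `steps`, `lr`, `reg`) are illustrative only.

```python
import numpy as np


def normalize(x, axis=-1):
    """Scale vectors to unit L2 norm along `axis`."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def retrieve(image_feat, text_bank, k=5):
    """Hypothetical retrieval step: return the k bank features most
    cosine-similar to the image feature, sorted from most to least similar."""
    sims = normalize(text_bank) @ normalize(image_feat)
    idx = np.argsort(-sims)[:k]
    return text_bank[idx], idx


def optimize(pseudo, image_feat, steps=100, lr=0.1, reg=0.1):
    """Hypothetical optimization step: gradient ascent on
    cos(t, v) - reg * ||t - t0||^2 for each retrieved feature t,
    where v is the unit image feature and t0 the retrieved starting point."""
    v = normalize(image_feat)
    t0 = pseudo.copy()
    t = pseudo.copy()
    for _ in range(steps):
        n = np.linalg.norm(t, axis=1, keepdims=True)  # shape (k, 1)
        u = t / n                                     # unit directions
        cos = u @ v                                   # shape (k,)
        # d cos(t, v) / dt = (v - cos * u) / ||t||
        grad = (v[None, :] - cos[:, None] * u) / n
        # Gradient of the L2 penalty that anchors t near t0
        grad -= 2.0 * reg * (t - t0)
        t = t + lr * grad
    return t
```

For example, with `rng = np.random.default_rng(0)`, `bank = rng.normal(size=(100, 16))` and `img = rng.normal(size=16)`, calling `retrieve(img, bank, k=5)` followed by `optimize(...)` yields five pseudo text features whose cosine similarity to the image is at least as high as that of the retrieved ones. The L2 penalty is one simple way to keep the optimized features from drifting far from real text features. The paper may use a different objective.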

Summary (original)

Text-to-image generation models have progressed considerably in recent years, which can now generate impressive realistic images from arbitrary text. Most such models are trained on web-scale image-text paired datasets, which may not be affordable for many researchers. In this paper, we propose a novel method for pre-training text-to-image generation models on image-only datasets. It considers a retrieval-then-optimization procedure to synthesize pseudo text features: for a given image, relevant pseudo text features are first retrieved, then optimized for better alignment. The low requirement of the proposed method yields high flexibility and usability: it can be beneficial to a wide range of settings, including the few-shot, semi-supervised and fully-supervised learning; it can be applied to different models including generative adversarial networks (GANs) and diffusion models. Extensive experiments illustrate the effectiveness of the proposed method. On MS-COCO dataset, our GAN model obtains Fréchet Inception Distance (FID) of 6.78 which is the new state-of-the-art (SoTA) of GANs under fully-supervised setting. Our diffusion model obtains FID of 8.42 and 4.28 on zero-shot and supervised setting respectively, which are competitive with SoTA diffusion models with a much smaller model size.

arXiv information

Authors	Yufan Zhou, Chunyuan Li, Changyou Chen, Jianfeng Gao, Jinhui Xu
Published	2022-10-25 16:22:23+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
