Progressive Text-to-Image Generation

Abstract

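Vector Quantized AutoRegressive (VQ-AR) models have recently achieved remarkable results in text-to-image synthesis by predicting discrete image tokens in the latent space from top left to bottom right, treating every token equally. This simple generative process works surprisingly well, but is it the best way to generate an image? Human creation, for example, tends to proceed from an image's outline to its fine details, whereas VQ-AR models do not consider the relative importance of image patches at all. This paper proposes a progressive model for high-fidelity text-to-image generation. The proposed method generates new image tokens in parallel, from coarse to fine, conditioned on the existing context, and applies this procedure recursively, together with a proposed error revision mechanism, until the image sequence is complete. The resulting coarse-to-fine hierarchy makes the image generation process intuitive and interpretable. Extensive experiments on the MS COCO benchmark show that the progressive model achieves significantly better FID scores than the previous VQ-AR method across a wide variety of categories and aspects. In addition, generating tokens in parallel at each step enables an inference speedup of more than 13x with only a slight loss in performance.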
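The decoding loop described above (several tokens committed in parallel per step, with earlier tokens open to revision) can be illustrated with a toy sketch. This is not the authors' implementation: `toy_predict` is a hypothetical stand-in that returns random tokens and confidences instead of a text-conditioned model, and both the confidence-based selection rule and the "overwrite when more confident" revision rule are assumptions chosen only to make the control flow concrete.

```python
import random

MASK = -1  # marks a position that has not been generated yet


def toy_predict(tokens, vocab_size, rng):
    """Hypothetical model stand-in: one (token, confidence) pair per position."""
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def progressive_decode(seq_len=16, vocab_size=8, steps=4, seed=0):
    """Fill a token sequence in a few parallel passes (assumed scheme)."""
    rng = random.Random(seed)
    tokens = [MASK] * seq_len
    conf = [0.0] * seq_len
    per_step = -(-seq_len // steps)  # ceiling division: tokens committed per pass
    filled_per_step = []

    while MASK in tokens:
        proposals = toy_predict(tokens, vocab_size, rng)

        # Revision (assumed rule): overwrite an already placed token
        # when the new proposal for that position is more confident.
        for i, (tok, c) in enumerate(proposals):
            if tokens[i] != MASK and c > conf[i]:
                tokens[i], conf[i] = tok, c

        # Parallel generation: commit the most confident proposals
        # among the positions that are still empty.
        empty = [i for i, t in enumerate(tokens) if t == MASK]
        empty.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in empty[:per_step]:
            tokens[i], conf[i] = proposals[i]

        filled_per_step.append(sum(t != MASK for t in tokens))

    return tokens, filled_per_step


if __name__ == "__main__":
    tokens, filled = progressive_decode()
    print(filled)
    print(tokens)
```

With the default arguments, the 16 positions are filled in 4 passes rather than 16 sequential predictions, which is the source of the speedup that parallel generation offers in principle. The sketch says nothing about image quality or about the speedup figure reported in the abstract.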

Abstract (original)

Recently, Vector Quantized AutoRegressive (VQ-AR) models have shown remarkable results in text-to-image synthesis by equally predicting discrete image tokens from the top left to bottom right in the latent space. Although the simple generative process surprisingly works well, is this the best way to generate the image? For instance, human creation is more inclined to the outline-to-fine of an image, while VQ-AR models themselves do not consider any relative importance of image patches. In this paper, we present a progressive model for high-fidelity text-to-image generation. The proposed method takes effect by creating new image tokens from coarse to fine based on the existing context in a parallel manner, and this procedure is recursively applied with the proposed error revision mechanism until an image sequence is completed. The resulting coarse-to-fine hierarchy makes the image generation process intuitive and interpretable. Extensive experiments in MS COCO benchmark demonstrate that the progressive model produces significantly better results compared with the previous VQ-AR method in FID score across a wide variety of categories and aspects. Moreover, the design of parallel generation in each step allows more than $\times 13$ inference acceleration with slight performance loss.

arXiv information

Authors	Zhengcong Fei, Mingyuan Fan, Li Zhu, Junshi Huang, Xiaoming Wei, Xiaolin Wei
Published	2023-01-03 09:18:25+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
