Fine-grained Cross-modal Fusion based Refinement for Text-to-Image Synthesis

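Summary

Text-to-image synthesis refers to generating visually realistic, semantically consistent images from given text descriptions.
Previous approaches first generate a low-resolution image and then refine it to high resolution.
Despite remarkable progress, these methods are limited in how fully they use the given text, and they can generate images that do not match it, especially when the text description is complex.
We propose a novel fine-grained text-image fusion based generative adversarial network, called FF-GAN, which consists of two modules: the Fine-grained text-image Fusion Block (FF-Block) and Global Semantic Refinement (GSR).
The proposed FF-Block integrates an attention block and several convolution layers to effectively fuse fine-grained word-context features into the corresponding visual features, so that the text information is fully used to refine the initial image with more detail.
GSR is proposed to improve the global semantic consistency between linguistic and visual features during the refinement process.
Extensive experiments on the CUB-200 and COCO datasets demonstrate that FF-GAN outperforms other state-of-the-art approaches in generating images that are semantically consistent with the given texts. Code is available at https://github.com/haoranhfut/FF-GAN.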
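The summary does not give FF-Block's equations, so the following is only a minimal sketch of the general idea of attention-based word-to-region fusion, assuming an AttnGAN-style formulation: each image region attends over the word features, the attended word context is concatenated with the region feature, and a 1x1 convolution (written as a matrix product) plus a residual connection produces the refined features. The function name `word_region_fusion`, the projection `proj`, and the ReLU/residual choice are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def word_region_fusion(visual, words, proj, fuse_weight):
    """Hypothetical word-to-region fusion (AttnGAN-style assumption, not the exact FF-Block).

    visual:      (C, H, W) feature map of the initial image
    words:       (D, T) word-context features for T words
    proj:        (C, D) projection of word features into the C visual channels
    fuse_weight: (C, 2C) weights of a 1x1 convolution over [visual; context]
    Returns the refined (C, H, W) feature map and the (H*W, T) attention map.
    """
    c, h, w = visual.shape
    v = visual.reshape(c, h * w)                   # (C, N): one column per region
    e = proj @ words                               # (C, T): words in the visual space
    attn = softmax(v.T @ e, axis=1)                # (N, T): each region attends over words
    context = e @ attn.T                           # (C, N): word context per region
    fused = np.concatenate([v, context], axis=0)   # (2C, N)
    # A 1x1 convolution is a per-region linear map; the residual keeps the initial features.
    refined = v + np.maximum(fuse_weight @ fused, 0.0)
    return refined.reshape(c, h, w), attn
```

With C = 8 channels on a 4x4 grid and 5 words of dimension 16, `refined` keeps the (8, 4, 4) shape and every row of `attn` sums to 1, so each region receives a convex mixture of word features. Setting `fuse_weight` to zeros returns the initial features unchanged, which is the point of the residual path.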

Summary (original)

Text-to-image synthesis refers to generating visually realistic and semantically consistent images from given textual descriptions. Previous approaches generate an initial low-resolution image and then refine it to be high-resolution. Despite the remarkable progress, these methods are limited in fully utilizing the given texts and could generate text-mismatched images, especially when the text description is complex. We propose a novel Fine-grained text-image Fusion based Generative Adversarial Networks, dubbed FF-GAN, which consists of two modules: Fine-grained text-image Fusion Block (FF-Block) and Global Semantic Refinement (GSR). The proposed FF-Block integrates an attention block and several convolution layers to effectively fuse the fine-grained word-context features into the corresponding visual features, in which the text information is fully used to refine the initial image with more details. The GSR is proposed to improve the global semantic consistency between linguistic and visual features during the refinement process. Extensive experiments on CUB-200 and COCO datasets demonstrate the superiority of FF-GAN over other state-of-the-art approaches in generating images with semantic consistency to the given texts. Code is available at https://github.com/haoranhfut/FF-GAN.
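The abstract does not specify how GSR measures global semantic consistency. One common choice, assumed here purely for illustration, is a sentence-level matching loss: pool the refined image features into one vector, compare it with the sentence embedding by cosine similarity, and apply cross-entropy so that each image scores highest against its own caption within the batch. The name `global_consistency_loss`, the scale `gamma`, and the assumption that sentence embeddings are already projected to the image channel dimension are hypothetical, not the paper's GSR formulation.

```python
import numpy as np


def global_consistency_loss(image_feats, sent_feats, gamma=10.0):
    """Hypothetical sentence-level matching loss (an assumption, not the paper's exact GSR).

    image_feats: (B, C, H, W) refined feature maps for a batch of B images
    sent_feats:  (B, C) sentence embeddings, assumed already projected to C dims
    gamma:       scale applied to cosine similarities before the softmax
    """
    img = image_feats.mean(axis=(2, 3))                       # global average pool -> (B, C)
    img = img / np.linalg.norm(img, axis=1, keepdims=True)    # unit-normalise image vectors
    sent = sent_feats / np.linalg.norm(sent_feats, axis=1, keepdims=True)
    logits = gamma * (img @ sent.T)                           # (B, B): image i vs sentence j

    def matched_xent(scores):
        # Cross-entropy where the correct match for row i is column i.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Sum the image-to-sentence and sentence-to-image directions.
    return matched_xent(logits) + matched_xent(logits.T)
```

A batch in which each pooled image vector points the same way as its own sentence embedding gives a loss close to zero, while shuffling the captions raises it sharply; that is the behaviour a global consistency term is meant to reward during refinement.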

arXiv information

Authors	Haoran Sun, Yang Wang, Haipeng Liu, Biao Qian
Published	2023-02-20 09:38:50+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
