EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models

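Summary

In recent years, image generation has made remarkable progress, enabling users to create high-quality, visually striking images.
However, while existing text-to-image diffusion models excel at generating concrete concepts (dogs), they struggle with more abstract ones (emotions).
Several efforts have modified an image's emotion by adjusting color and style, but they are limited in how effectively they can convey emotion when the image content itself stays fixed.
In this work, we introduce Emotional Image Content Generation (EICG), a new task of generating semantically clear and emotion-faithful images given an emotion category.
Specifically, we propose an emotion space and build a mapping network that aligns it with the powerful Contrastive Language-Image Pre-training (CLIP) space, providing a concrete interpretation of abstract emotions.
We further propose an attribute loss and emotion confidence to ensure the semantic diversity and emotion fidelity of the generated images.
Our method outperforms state-of-the-art text-to-image approaches both quantitatively and qualitatively, evaluated with three custom metrics we derive: emotion accuracy, semantic clarity, and semantic diversity.
Beyond generation, our method can aid emotion understanding and inspire emotional art design.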
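The summary only states that a mapping network aligns the proposed emotion space with CLIP space. As a rough illustration of that idea, the Python sketch below passes a toy emotion vector through a small two-layer MLP and scores the result against a stand-in CLIP embedding with a cosine-based alignment loss. The class `MappingNetwork`, the layer sizes, the ReLU activation, the random weights, and the `alignment_loss` objective are all hypothetical assumptions for illustration, not EmoGen's actual architecture or training objective.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def relu(vec):
    """Element-wise ReLU."""
    return [max(0.0, x) for x in vec]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class MappingNetwork:
    """Hypothetical two-layer MLP from an emotion space to a CLIP-like space."""

    def __init__(self, in_dim, hidden_dim, out_dim, seed=0):
        rng = random.Random(seed)
        # Small random weights; a real model would learn these by training.
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
                   for _ in range(hidden_dim)]
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden_dim)]
                   for _ in range(out_dim)]

    def __call__(self, emotion_vec):
        return matvec(self.w2, relu(matvec(self.w1, emotion_vec)))


def alignment_loss(mapped, clip_target):
    """1 - cosine similarity: 0 when aligned, 2 when pointing opposite ways."""
    return 1.0 - cosine(mapped, clip_target)


# Toy usage: a one-hot "emotion" code and a made-up 6-d target embedding.
net = MappingNetwork(in_dim=4, hidden_dim=8, out_dim=6)
emotion_vec = [1.0, 0.0, 0.0, 0.0]
clip_target = [0.2, -0.1, 0.5, 0.0, 0.3, -0.4]
mapped = net(emotion_vec)
loss = alignment_loss(mapped, clip_target)
print(len(mapped), round(loss, 4))
```

Cosine distance is used here only because CLIP embeddings are conventionally compared by cosine similarity; how the paper trains its mapping network, and where the attribute loss and emotion confidence enter, is not specified in this summary.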

Summary (original)

Recent years have witnessed remarkable progress in image generation tasks, where users can create visually astonishing, high-quality images. However, existing text-to-image diffusion models are proficient in generating concrete concepts (dogs) but encounter challenges with more abstract ones (emotions). Several efforts have been made to modify image emotions with color and style adjustments, but they face limitations in effectively conveying emotions with fixed image contents. In this work, we introduce Emotional Image Content Generation (EICG), a new task to generate semantically clear and emotion-faithful images given emotion categories. Specifically, we propose an emotion space and construct a mapping network to align it with the powerful Contrastive Language-Image Pre-training (CLIP) space, providing a concrete interpretation of abstract emotions. Attribute loss and emotion confidence are further proposed to ensure the semantic diversity and emotion fidelity of the generated images. Our method outperforms the state-of-the-art text-to-image approaches both quantitatively and qualitatively, where we derive three custom metrics, i.e., emotion accuracy, semantic clarity, and semantic diversity. In addition to generation, our method can help emotion understanding and inspire emotional art design.
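The abstract names three evaluation metrics without defining them. Assuming that emotion accuracy is the share of generated images an emotion classifier assigns to the requested category, and that semantic diversity can be read as the mean pairwise cosine distance between image embeddings, a minimal sketch looks like this. The function names, label strings, and vectors are hypothetical placeholders; semantic clarity is not sketched because the abstract gives no indication of how it is computed.

```python
import itertools
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def emotion_accuracy(predicted_labels, target_label):
    """Share of generated images whose predicted emotion equals the requested one."""
    if not predicted_labels:
        raise ValueError("need at least one prediction")
    hits = sum(1 for label in predicted_labels if label == target_label)
    return hits / len(predicted_labels)


def semantic_diversity(embeddings):
    """Mean pairwise cosine distance (1 - cosine similarity) over all image pairs."""
    pairs = list(itertools.combinations(embeddings, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - cosine(a, b) for a, b in pairs) / len(pairs)


# Toy usage with placeholder labels and 2-d embeddings.
acc = emotion_accuracy(["awe", "awe", "fear", "awe"], "awe")
div_same = semantic_diversity([[1.0, 0.0], [1.0, 0.0]])
div_orth = semantic_diversity([[1.0, 0.0], [0.0, 1.0]])
print(acc, div_same, div_orth)  # 0.75 0.0 1.0
```

Identical embeddings give a diversity score of 0 and orthogonal ones give 1, so under this reading higher values mean more varied image content for the same emotion.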

arXiv information

Authors	Jingyuan Yang, Jiawei Feng, Hui Huang
Published	2024-01-09 15:23:21+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
