Dynamic Prompt Optimizing for Text-to-Image Generation

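Summary

Text-to-image generative models, particularly those based on diffusion models such as Imagen and Stable Diffusion, have advanced substantially.
Recently, interest in the fine-grained refinement of text prompts has grown.
Users assign weights to specific words in a text prompt, or change the time steps at which those words are injected, to improve the quality of the generated images.
However, whether a fine-control prompt succeeds depends on the accuracy of the text prompt and on a careful choice of weights and time steps, which requires substantial manual intervention.
To address this, the authors introduce the Prompt Auto-Editing (PAE) method.
In addition to refining the original prompt for image generation, the method uses an online reinforcement learning strategy to explore the weight and injection time steps of each word, producing dynamic fine-control prompts.
The reward function used during training encourages the model to take aesthetic score, semantic consistency, and user preferences into account.
Experimental results show that the proposed method effectively improves the original prompts, generating visually more appealing images while maintaining semantic alignment.
Code is available at https://github.com/Mowenyii/PAE.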
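
To make the idea of a fine-control prompt concrete, here is a minimal Python sketch. It assumes each word carries a weight and a window of denoising steps during which it is injected. The names `ControlledToken` and `active_weights`, the use of a normalized step fraction, and the example words and values are all illustrative assumptions, not the prompt format or code used by PAE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlledToken:
    """A prompt word with a hypothetical weight and injection window."""
    text: str
    weight: float = 1.0   # assumed multiplier on the word's influence
    start: float = 0.0    # assumed start of injection, as a fraction of denoising
    end: float = 1.0      # assumed end of injection, as a fraction of denoising


def active_weights(tokens, step, total_steps):
    """Return (word, weight) pairs for the words injected at the given step."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = step / total_steps
    return [(t.text, t.weight) for t in tokens if t.start <= progress <= t.end]


# Illustrative prompt: two plain words and two words with made-up controls.
prompt = [
    ControlledToken("a"),
    ControlledToken("castle"),
    ControlledToken("cinematic", weight=1.3, start=0.0, end=0.5),
    ControlledToken("detailed", weight=0.8, start=0.5, end=1.0),
]

early = active_weights(prompt, step=5, total_steps=50)   # progress 0.1
late = active_weights(prompt, step=40, total_steps=50)   # progress 0.8
```

In this sketch, a word with the default window stays active for the whole denoising run, while a word with a narrower window only contributes during part of it. Choosing those weights and windows by hand is the manual effort the summary says PAE aims to automate.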

Summary (original)

Text-to-image generative models, specifically those based on diffusion models like Imagen and Stable Diffusion, have made substantial advancements. Recently, there has been a surge of interest in the delicate refinement of text prompts. Users assign weights or alter the injection time steps of certain words in the text prompts to improve the quality of generated images. However, the success of fine-control prompts depends on the accuracy of the text prompts and the careful selection of weights and time steps, which requires significant manual intervention. To address this, we introduce the Prompt Auto-Editing (PAE) method. Besides refining the original prompts for image generation, we further employ an online reinforcement learning strategy to explore the weights and injection time steps of each word, leading to the dynamic fine-control prompts. The reward function during training encourages the model to consider aesthetic score, semantic consistency, and user preferences. Experimental results demonstrate that our proposed method effectively improves the original prompts, generating visually more appealing images while maintaining semantic alignment. Code is available at https://github.com/Mowenyii/PAE.
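
The reward described above draws on three signals: aesthetic score, semantic consistency, and user preference. The abstract does not say how these terms are combined, so the following Python sketch simply assumes a weighted average of scores already scaled to [0, 1]. The function name `combined_reward`, the default weights, and the scaling are assumptions for illustration only, not the reward used in PAE.

```python
def combined_reward(aesthetic, consistency, preference,
                    weights=(1.0, 1.0, 1.0)):
    """Weighted average of three scores, each assumed to lie in [0, 1]."""
    scores = (aesthetic, consistency, preference)
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("scores are assumed to be scaled to [0, 1]")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, scores)) / total


# Made-up scores: average aesthetics, perfect consistency, zero preference.
reward = combined_reward(0.5, 1.0, 0.0)
```

A combination like this illustrates the trade-off the abstract points to: a prompt edit that raises the aesthetic term but drifts from the original meaning would lose reward through the consistency term.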

arXiv information

Authors	Wenyi Mo, Tianyu Zhang, Yalong Bai, Bing Su, Ji-Rong Wen, Qing Yang
Published	2024-04-05 13:44:39+00:00
arXiv site	arxiv_id (pdf)

Source and services used

arxiv.jp, Google
