Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation

Summary

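Recent work has shown that reinforcement learning (RL) with multiple quality rewards can improve the quality of images produced in text-to-image (T2I) generation.
However, manually tuning the reward weights is difficult and can cause over-optimization on certain metrics.
To address this, we propose Parrot, which treats the problem as multi-objective optimization and introduces an effective multi-reward optimization strategy that approximates Pareto optimality.
Parrot uses batch-wise Pareto-optimal selection to automatically identify the optimal trade-off among the different rewards.
Using this novel multi-reward optimization algorithm to jointly optimize the T2I model and a prompt expansion network significantly improves image quality. It also makes it possible to control the trade-off among rewards at inference time through a reward-related prompt.
In addition, we introduce guidance centered on the original prompt at inference time, which ensures fidelity to the user's input after prompt expansion.
Extensive experiments and a user study validate Parrot's superiority over several baselines across various quality criteria, including aesthetics, human preference, text-image alignment, and image sentiment.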
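
The batch-wise Pareto-optimal selection mentioned in the summary can be illustrated with a generic non-dominated filter. The abstract does not spell out the exact procedure, so the following is only a minimal sketch under stated assumptions: each sample in a batch has one score per reward, higher is better, and a sample is kept when no other sample in the batch is at least as good on every reward and strictly better on one. The function names and the reward values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` on every reward and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_indices(rewards: Sequence[Sequence[float]]) -> List[int]:
    """Indices of samples in the batch that no other sample dominates."""
    return [
        i
        for i, r in enumerate(rewards)
        if not any(dominates(o, r) for j, o in enumerate(rewards) if j != i)
    ]


# Made-up scores for one batch of four generated images.
# Assumed column order: aesthetics, human preference, text-image alignment.
batch_rewards = [
    [0.9, 0.2, 0.5],
    [0.5, 0.5, 0.5],
    [0.4, 0.4, 0.4],  # worse than the second sample on every reward
    [0.1, 0.9, 0.3],
]

selected = non_dominated_indices(batch_rewards)
print(selected)
```

In this toy batch only the third sample is filtered out, because the second sample beats it on every reward. The remaining samples each represent a different trade-off, which is why no fixed reward weighting is needed to pick among them.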

Summary (original)

Recent works have demonstrated that using reinforcement learning (RL) with multiple quality rewards can improve the quality of generated images in text-to-image (T2I) generation. However, manually adjusting reward weights poses challenges and may cause over-optimization in certain metrics. To solve this, we propose Parrot, which addresses the issue through multi-objective optimization and introduces an effective multi-reward optimization strategy to approximate Pareto optimal. Utilizing batch-wise Pareto optimal selection, Parrot automatically identifies the optimal trade-off among different rewards. We use the novel multi-reward optimization algorithm to jointly optimize the T2I model and a prompt expansion network, resulting in significant improvement of image quality and also allow to control the trade-off of different rewards using a reward related prompt during inference. Furthermore, we introduce original prompt-centered guidance at inference time, ensuring fidelity to user input after prompt expansion. Extensive experiments and a user study validate the superiority of Parrot over several baselines across various quality criteria, including aesthetics, human preference, text-image alignment, and image sentiment.

arXiv information

Authors	Seung Hyun Lee, Yinxiao Li, Junjie Ke, Innfarn Yoo, Han Zhang, Jiahui Yu, Qifei Wang, Fei Deng, Glenn Entis, Junfeng He, Gang Li, Sangpil Kim, Irfan Essa, Feng Yang
Published	2024-07-15 17:19:18+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
