TextDiffuser: Diffusion Models as Text Painters

Summary

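Diffusion models are attracting increasing attention for their impressive generation abilities, but they currently struggle to render accurate and coherent text.
To address this problem, we introduce TextDiffuser, which focuses on generating images containing visually appealing text that is coherent with the background.
TextDiffuser consists of two stages: first, a Transformer model generates the layout of keywords extracted from the text prompt; then, a diffusion model generates images conditioned on the text prompt and the generated layout.
In addition, we contribute MARIO-10M, the first large-scale text image dataset with OCR annotations, containing 10 million image-text pairs annotated for text recognition, detection, and character-level segmentation.
We also collect the MARIO-Eval benchmark as a comprehensive tool for evaluating text rendering quality.
Through experiments and user studies, we show that TextDiffuser is flexible and controllable: it can create high-quality text images from text prompts alone or combined with text template images, and it can perform text inpainting to reconstruct incomplete images with text.
The code, model, and dataset will be available at https://aka.ms/textdiffuser.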
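The two-stage design can be illustrated with a minimal Python sketch. It is not the TextDiffuser implementation: the convention of marking keywords with double quotes, the `naive_layout` heuristic (a hypothetical stand-in for the Transformer layout model), the `ALPHABET` class vocabulary, and the mask encoding are all assumptions, and the diffusion stage itself is omitted. The sketch only builds the kind of keyword boxes and character-level mask that a layout-conditioned diffusion model could consume.

```python
import re
from dataclasses import dataclass

# Hypothetical character classes: 0 = background, 1..36 = known chars, 37 = other.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


@dataclass
class Box:
    keyword: str
    x0: int
    y0: int
    x1: int  # exclusive
    y1: int  # exclusive


def extract_keywords(prompt: str) -> list[str]:
    # Assumption: keywords to render are wrapped in double quotes in the prompt.
    return re.findall(r'"([^"]+)"', prompt)


def naive_layout(keywords: list[str], width: int = 64, height: int = 64,
                 char_w: int = 4, line_h: int = 8) -> list[Box]:
    # Hypothetical stand-in for stage 1 (the Transformer layout model):
    # stack keywords as centered lines, box width proportional to length.
    top = (height - line_h * len(keywords)) // 2
    boxes = []
    for i, kw in enumerate(keywords):
        w = min(width, char_w * len(kw))
        x0 = (width - w) // 2
        y0 = top + i * line_h
        boxes.append(Box(kw, x0, y0, x0 + w, y0 + line_h))
    return boxes


def char_class(c: str) -> int:
    # Map a character to its (hypothetical) segmentation class id.
    idx = ALPHABET.find(c.lower())
    return idx + 1 if idx >= 0 else len(ALPHABET) + 1


def layout_to_mask(boxes: list[Box], width: int = 64,
                   height: int = 64) -> list[list[int]]:
    # Rasterize each box into equal-width character cells, labelling every
    # pixel with the class of the character it belongs to.
    mask = [[0] * width for _ in range(height)]
    for box in boxes:
        cell_w = (box.x1 - box.x0) / len(box.keyword)
        for k, c in enumerate(box.keyword):
            cx0 = box.x0 + int(k * cell_w)
            cx1 = box.x0 + int((k + 1) * cell_w)
            for y in range(max(0, box.y0), min(height, box.y1)):
                for x in range(max(0, cx0), min(width, cx1)):
                    mask[y][x] = char_class(c)
    return mask


if __name__ == "__main__":
    prompt = 'A poster with the words "Hello" and "World"'
    boxes = naive_layout(extract_keywords(prompt))
    mask = layout_to_mask(boxes)
    # Stage 2 (not shown): a diffusion model conditioned on `prompt` and `mask`.
    for b in boxes:
        print(b)
```

Splitting the problem this way keeps the "where does the text go" decision explicit and inspectable before any pixels are generated, which is what makes the layout controllable or replaceable by a user-supplied template.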

Summary (original)

Diffusion models have gained increasing attention for their impressive generation abilities but currently struggle with rendering accurate and coherent text. To address this issue, we introduce TextDiffuser, focusing on generating images with visually appealing text that is coherent with backgrounds. TextDiffuser consists of two stages: first, a Transformer model generates the layout of keywords extracted from text prompts, and then diffusion models generate images conditioned on the text prompt and the generated layout. Additionally, we contribute the first large-scale text images dataset with OCR annotations, MARIO-10M, containing 10 million image-text pairs with text recognition, detection, and character-level segmentation annotations. We further collect the MARIO-Eval benchmark to serve as a comprehensive tool for evaluating text rendering quality. Through experiments and user studies, we show that TextDiffuser is flexible and controllable to create high-quality text images using text prompts alone or together with text template images, and conduct text inpainting to reconstruct incomplete images with text. The code, model, and dataset will be available at https://aka.ms/textdiffuser.

arXiv information

Authors	Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei
Published	2023-05-24 17:57:19+00:00

Source, services used

arxiv.jp, Google
