Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering

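Summary

Visual text rendering remains a fundamental challenge for current text-to-image generation models, and the core of the problem lies in deficiencies of the text encoder.
To render text accurately, we identify two key requirements for the text encoder: character awareness and alignment with glyphs.
Our solution is Glyph-ByT5, a series of customized text encoders obtained by fine-tuning the character-aware ByT5 encoder on a carefully curated dataset of paired glyphs and text.
We present an effective way to integrate Glyph-ByT5 with SDXL, yielding Glyph-SDXL, a model for design image generation.
This raises text rendering accuracy on our design image benchmark from below $20\%$ to nearly $90\%$.
Notably, Glyph-SDXL gains the ability to render text paragraphs, achieving high spelling accuracy for tens to hundreds of characters with automated multi-line layout.
Finally, by fine-tuning Glyph-SDXL on a small set of high-quality photorealistic images containing visual text, we show a substantial improvement in scene text rendering for open-domain real images.
These results are intended to encourage further exploration of customized text encoders for diverse and challenging tasks.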
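
The "character awareness" requirement can be illustrated with a minimal sketch of byte-level encoding in the style of ByT5, where every UTF-8 byte becomes its own token instead of being merged into subword units. This is not the authors' code: the function names `byte_level_ids` and `decode_ids` are hypothetical, and the offset of 3 is assumed to match the ids ByT5 reserves for its padding, end-of-sequence, and unknown tokens.

```python
# Illustrative sketch only; not taken from the Glyph-ByT5 implementation.
# Assumption: ids 0, 1, 2 are reserved for pad, eos, unk, so byte b maps to id b + 3.
SPECIAL_OFFSET = 3
EOS_ID = 1


def byte_level_ids(text: str) -> list[int]:
    """Encode text as one token id per UTF-8 byte, followed by an EOS id."""
    ids = [b + SPECIAL_OFFSET for b in text.encode("utf-8")]
    return ids + [EOS_ID]


def decode_ids(ids: list[int]) -> str:
    """Invert byte_level_ids, skipping any id outside the 256 byte ids."""
    payload = bytes(
        i - SPECIAL_OFFSET
        for i in ids
        if SPECIAL_OFFSET <= i < SPECIAL_OFFSET + 256
    )
    return payload.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    ids = byte_level_ids("Glyph")
    print(ids)              # one id per character for ASCII text, plus EOS
    print(decode_ids(ids))  # round-trips back to the input string
```

Because each letter keeps its own token, the encoder sees the exact spelling of the text to be rendered, which is the property the summary refers to as character awareness.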

Summary (original)

Visual text rendering poses a fundamental challenge for contemporary text-to-image generation models, with the core problem lying in text encoder deficiencies. To achieve accurate text rendering, we identify two crucial requirements for text encoders: character awareness and alignment with glyphs. Our solution involves crafting a series of customized text encoder, Glyph-ByT5, by fine-tuning the character-aware ByT5 encoder using a meticulously curated paired glyph-text dataset. We present an effective method for integrating Glyph-ByT5 with SDXL, resulting in the creation of the Glyph-SDXL model for design image generation. This significantly enhances text rendering accuracy, improving it from less than $20\%$ to nearly $90\%$ on our design image benchmark. Noteworthy is Glyph-SDXL’s newfound ability for text paragraph rendering, achieving high spelling accuracy for tens to hundreds of characters with automated multi-line layouts. Finally, through fine-tuning Glyph-SDXL with a small set of high-quality, photorealistic images featuring visual text, we showcase a substantial improvement in scene text rendering capabilities in open-domain real images. These compelling outcomes aim to encourage further exploration in designing customized text encoders for diverse and challenging tasks.

arXiv information

Authors	Zeyu Liu, Weicong Liang, Zhanhao Liang, Chong Luo, Ji Li, Gao Huang, Yuhui Yuan
Published	2024-07-12 16:39:31+00:00

Source, services used

arxiv.jp, Google
