TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder

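Abstract

Recent advances in text-to-image models have opened up promising research directions in personalized image generation, allowing users to create diverse images of a specific subject from natural language prompts.
However, existing methods often degrade in performance when given only a single reference image.
They tend to overfit to the input and produce highly similar outputs regardless of the text prompt.
This paper tackles the challenge of one-shot personalization by mitigating overfitting, which makes it possible to create images that can be controlled through text prompts.
Specifically, the authors propose a selective fine-tuning strategy that focuses on the text encoder.
They also introduce three key techniques to improve personalization performance: (1) augmentation tokens, which encourage feature disentanglement and alleviate overfitting; (2) a knowledge-preservation loss, which reduces language drift and promotes generalization across diverse prompts; and (3) SNR-weighted sampling for efficient training.
Extensive experiments show that the approach efficiently generates high-quality, diverse images from only a single reference image while significantly reducing memory and storage requirements.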

Abstract (original)

Recent breakthroughs in text-to-image models have opened up promising research avenues in personalized image generation, enabling users to create diverse images of a specific subject using natural language prompts. However, existing methods often suffer from performance degradation when given only a single reference image. They tend to overfit the input, producing highly similar outputs regardless of the text prompt. This paper addresses the challenge of one-shot personalization by mitigating overfitting, enabling the creation of controllable images through text prompts. Specifically, we propose a selective fine-tuning strategy that focuses on the text encoder. Furthermore, we introduce three key techniques to enhance personalization performance: (1) augmentation tokens to encourage feature disentanglement and alleviate overfitting, (2) a knowledge-preservation loss to reduce language drift and promote generalizability across diverse prompts, and (3) SNR-weighted sampling for efficient training. Extensive experiments demonstrate that our approach efficiently generates high-quality, diverse images using only a single reference image while significantly reducing memory and storage requirements.
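
The abstract names SNR-weighted sampling but does not give the weighting formula, so the following is only a minimal sketch of the general idea: draw diffusion training timesteps from a non-uniform distribution that depends on the signal-to-noise ratio (SNR). Everything specific here is an assumption for illustration and is not taken from the paper: the linear beta schedule with its default values, the weight `1 / (1 + SNR)` (which favors noisier timesteps), and the hypothetical function names.

```python
import random


def snr_per_timestep(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Return SNR(t) = alpha_bar_t / (1 - alpha_bar_t) for a linear beta schedule.

    The linear schedule and its default values are assumptions, not the paper's.
    """
    snrs = []
    alpha_bar = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        alpha_bar *= 1.0 - beta  # cumulative product of (1 - beta)
        snrs.append(alpha_bar / (1.0 - alpha_bar))
    return snrs


def timestep_probs(snrs):
    """Turn per-timestep SNR into a sampling distribution.

    Assumed weighting: w_t = 1 / (1 + SNR_t), so low-SNR (noisy) timesteps
    are drawn more often. The paper may use a different function of SNR.
    """
    weights = [1.0 / (1.0 + s) for s in snrs]
    total = sum(weights)
    return [w / total for w in weights]


def sample_timesteps(probs, batch_size, rng):
    """Draw `batch_size` timestep indices according to `probs`."""
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)


probs = timestep_probs(snr_per_timestep())
timesteps = sample_timesteps(probs, batch_size=8, rng=random.Random(0))
```

In a training loop, `timesteps` would replace the usual uniformly sampled timesteps when noising the latents; swapping in a different weight function only requires changing `timestep_probs`.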

arXiv information

Authors	NaHyeon Park, Kunhee Kim, Hyunjung Shim
Published	2024-09-12 17:47:51+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
