AnyStory: Towards Unified Single and Multiple Subject Personalization in Text-to-Image Generation

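Summary

Large-scale generative models have recently demonstrated strong text-to-image generation capabilities.
However, generating high-fidelity personalized images of specific subjects remains challenging, especially when multiple subjects are involved.
This paper proposes AnyStory, a unified approach to personalized subject generation.
AnyStory achieves high-fidelity personalization not only for a single subject but also for multiple subjects, without sacrificing subject fidelity.
Specifically, AnyStory models subject personalization in an "encode-then-route" manner.
In the encoding step, AnyStory uses a universal, powerful image encoder, ReferenceNet, together with a CLIP vision encoder to encode subject features with high fidelity.
In the routing step, AnyStory uses a decoupled, instance-aware subject router to perceive and predict the likely location of each subject in the latent space and to guide the injection of subject conditions.
Detailed experimental results show that the method performs well at retaining subject details, aligning with text descriptions, and personalizing multiple subjects.
The project page is at https://aigcdesigngroup.github.io/AnyStory/.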

Abstract (original)

Recently, large-scale generative models have demonstrated outstanding text-to-image generation capabilities. However, generating high-fidelity personalized images with specific subjects still presents challenges, especially in cases involving multiple subjects. In this paper, we propose AnyStory, a unified approach for personalized subject generation. AnyStory not only achieves high-fidelity personalization for single subjects, but also for multiple subjects, without sacrificing subject fidelity. Specifically, AnyStory models the subject personalization problem in an ‘encode-then-route’ manner. In the encoding step, AnyStory utilizes a universal and powerful image encoder, i.e., ReferenceNet, in conjunction with CLIP vision encoder to achieve high-fidelity encoding of subject features. In the routing step, AnyStory utilizes a decoupled instance-aware subject router to accurately perceive and predict the potential location of the corresponding subject in the latent space, and guide the injection of subject conditions. Detailed experimental results demonstrate the excellent performance of our method in retaining subject details, aligning text descriptions, and personalizing for multiple subjects. The project page is at https://aigcdesigngroup.github.io/AnyStory/ .
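
The "encode-then-route" idea can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not specify how the router scores positions, so the dot-product scoring, the extra background slot, the additive injection, and the names `route` and `inject` are all assumptions made here for illustration. The sketch only shows the general shape of the idea, namely that each latent position receives weights over the subjects, and those weights decide how much of each subject's encoded feature is injected at that position.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def route(latents, subject_queries, background_logit=0.0):
    """Hypothetical router: for each latent position, return weights over
    [background, subject_0, subject_1, ...] that sum to 1."""
    weights = []
    for z in latents:
        # Assumption: a subject's score at a position is a plain dot product.
        logits = [background_logit] + [dot(q, z) for q in subject_queries]
        weights.append(softmax(logits))
    return weights


def inject(latents, subject_features, weights):
    """Hypothetical injection: add each subject's feature to a latent
    position, scaled by that subject's routing weight at the position."""
    out = []
    for z, w in zip(latents, weights):
        mixed = list(z)
        for s, feat in enumerate(subject_features):
            for i in range(len(mixed)):
                mixed[i] += w[s + 1] * feat[i]  # index 0 is the background slot
        out.append(mixed)
    return out


# Three latent positions, two subjects (all values are made up).
latents = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
queries = [[1.0, 0.0], [0.0, 1.0]]
features = [[1.0, 1.0], [-1.0, -1.0]]

weights = route(latents, queries)
conditioned = inject(latents, features, weights)
```

In this toy setup the first position is routed mostly to the first subject, the second position mostly to the second subject, and the third position is split evenly between the background and both subjects.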

arXiv information

Authors	Junjie He, Yuxiang Tuo, Binghui Chen, Chongyang Zhong, Yifeng Geng, Liefeng Bo
Published	2025-01-16 12:28:39+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
