MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation

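Summary

We introduce Mixture-of-Attention (MoA), a new architecture for personalizing text-to-image diffusion models.
Inspired by the Mixture-of-Experts mechanism used in large language models (LLMs), MoA distributes the generation workload between two attention pathways: a personalized branch and a non-personalized prior branch.
MoA is designed to retain the original model's prior by keeping the attention layers in the prior branch fixed, while the personalized branch intervenes minimally in the generation process and learns to embed subjects in the layout and context generated by the prior branch.
A novel routing mechanism manages how the pixels in each layer are distributed across these branches, optimizing the blend of personalized and generic content creation.
Once trained, MoA facilitates the creation of high-quality personalized images featuring multiple subjects, with compositions and interactions as diverse as those generated by the original model.
Crucially, MoA sharpens the distinction between the model's pre-existing capability and the newly added personalized intervention, thereby offering more disentangled subject-context control than was previously attainable.
Project page: https://snap-research.github.io/mixture-of-attention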

Abstract (original)

We introduce a new architecture for personalization of text-to-image diffusion models, coined Mixture-of-Attention (MoA). Inspired by the Mixture-of-Experts mechanism utilized in large language models (LLMs), MoA distributes the generation workload between two attention pathways: a personalized branch and a non-personalized prior branch. MoA is designed to retain the original model’s prior by fixing its attention layers in the prior branch, while minimally intervening in the generation process with the personalized branch that learns to embed subjects in the layout and context generated by the prior branch. A novel routing mechanism manages the distribution of pixels in each layer across these branches to optimize the blend of personalized and generic content creation. Once trained, MoA facilitates the creation of high-quality, personalized images featuring multiple subjects with compositions and interactions as diverse as those generated by the original model. Crucially, MoA enhances the distinction between the model’s pre-existing capability and the newly augmented personalized intervention, thereby offering a more disentangled subject-context control that was previously unattainable. Project page: https://snap-research.github.io/mixture-of-attention
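
The routing idea in the abstract can be illustrated with a minimal sketch. It assumes a soft router that emits two logits per pixel and blends the two branch outputs with softmax weights; the abstract does not specify the router's form, so `mix_branches`, `router_logits`, and the toy feature vectors are hypothetical names and values chosen only for illustration, not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_branches(prior_out, personalized_out, router_logits):
    """Blend two attention-branch outputs pixel by pixel.

    prior_out, personalized_out: one feature vector (list of floats) per pixel.
    router_logits: one [prior_logit, personalized_logit] pair per pixel
                   (a hypothetical router output, assumed for this sketch).
    """
    mixed = []
    for p_feat, s_feat, logits in zip(prior_out, personalized_out, router_logits):
        w_prior, w_pers = softmax(logits)  # the two weights sum to 1
        mixed.append([w_prior * a + w_pers * b for a, b in zip(p_feat, s_feat)])
    return mixed


# Toy example: two pixels with 2-dimensional features.
prior_out = [[1.0, 0.0], [1.0, 0.0]]
personalized_out = [[0.0, 1.0], [0.0, 1.0]]
# Pixel 0 is routed almost entirely to the prior branch (e.g. background);
# pixel 1 is split evenly between the two branches.
router_logits = [[10.0, -10.0], [0.0, 0.0]]

mixed = mix_branches(prior_out, personalized_out, router_logits)
```

In this sketch a pixel whose router strongly favors the prior branch keeps the frozen model's output almost unchanged, which is one way to read the abstract's claim that the personalized branch intervenes only minimally.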

arXiv information

Authors	Kuan-Chieh Wang, Daniil Ostashev, Yuwei Fang, Sergey Tulyakov, Kfir Aberman
Published	2024-04-17 17:08:05+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
