MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation

Summary

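We introduce a new architecture for personalizing text-to-image diffusion models, called Mixture-of-Attention (MoA). Inspired by the Mixture-of-Experts mechanism used in large language models (LLMs), MoA distributes the generation workload between two attention pathways: a personalized branch and a non-personalized prior branch. MoA preserves the original model's prior by fixing the attention layers in the prior branch. It is designed to intervene only minimally in the generation process, through a personalized branch that learns to embed subjects in the layout and context generated by the prior branch. A new routing mechanism manages how the pixels in each layer are distributed between these branches, optimizing the blend of personalized and generic content creation. Once trained, MoA makes it easy to create high-quality personalized images featuring multiple subjects, with compositions and interactions as diverse as those generated by the original model. Importantly, by sharpening the distinction between the model's existing capabilities and the newly added personalized intervention, MoA offers more disentangled control over subject and context than was previously achievable. Project page: https://snap-research.github.io/mixture-of-attention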

Summary (original)

We introduce a new architecture for personalization of text-to-image diffusion models, coined Mixture-of-Attention (MoA). Inspired by the Mixture-of-Experts mechanism utilized in large language models (LLMs), MoA distributes the generation workload between two attention pathways: a personalized branch and a non-personalized prior branch. MoA is designed to retain the original model’s prior by fixing its attention layers in the prior branch, while minimally intervening in the generation process with the personalized branch that learns to embed subjects in the layout and context generated by the prior branch. A novel routing mechanism manages the distribution of pixels in each layer across these branches to optimize the blend of personalized and generic content creation. Once trained, MoA facilitates the creation of high-quality, personalized images featuring multiple subjects with compositions and interactions as diverse as those generated by the original model. Crucially, MoA enhances the distinction between the model’s pre-existing capability and the newly augmented personalized intervention, thereby offering a more disentangled subject-context control that was previously unattainable. Project page: https://snap-research.github.io/mixture-of-attention
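
The routing idea described in the abstract can be illustrated with a small NumPy sketch of a single layer. This is not the authors' implementation. The abstract does not say how the router is parameterized, so the sketch assumes a dense (soft) per-pixel router: one linear projection followed by a softmax over the two branches, rather than the top-k selection common in LLM Mixture-of-Experts layers. All names (moa_layer, w_router, ctx_pers, and so on) are hypothetical. The prior branch's weights are treated as frozen, and by assumption only the personalized branch and the router would be trained.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention.

    q: (n_pixels, d), k and v: (n_tokens, d) -> output (n_pixels, d).
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def moa_layer(x, ctx_prior, ctx_pers, prior_w, pers_w, w_router):
    """Hypothetical Mixture-of-Attention layer (sketch, not the paper's code).

    x:         (n_pixels, d) image features entering the layer
    ctx_prior: (n_tokens, d) context seen by the frozen prior branch
    ctx_pers:  (n_tokens, d) context seen by the personalized branch
               (assumed to carry subject information)
    prior_w:   dict of frozen projections "Wq", "Wk", "Wv", each (d, d)
    pers_w:    dict of trainable projections with the same keys and shapes
    w_router:  (d, 2) router weights giving one logit per branch per pixel
    Returns the mixed output (n_pixels, d) and the gates (n_pixels, 2).
    """
    def branch(w, ctx):
        return attention(x @ w["Wq"], ctx @ w["Wk"], ctx @ w["Wv"])

    out_prior = branch(prior_w, ctx_prior)  # original model's behaviour
    out_pers = branch(pers_w, ctx_pers)     # personalized intervention
    gates = softmax(x @ w_router, axis=-1)  # per-pixel weights; each row sums to 1
    mixed = gates[:, :1] * out_prior + gates[:, 1:] * out_pers
    return mixed, gates


# Usage with random toy tensors.
rng = np.random.default_rng(0)
d, n_pix, n_tok = 8, 16, 4
x = rng.standard_normal((n_pix, d))
ctx = rng.standard_normal((n_tok, d))
prior_w = {name: rng.standard_normal((d, d)) for name in ("Wq", "Wk", "Wv")}
w_router = rng.standard_normal((d, 2))

# Sanity check: if the personalized branch is an exact copy of the prior and
# sees the same context, the gated mix reduces to the prior output no matter
# what the router decides, because the two gates sum to 1 for every pixel.
pers_w = {name: w.copy() for name, w in prior_w.items()}
out, gates = moa_layer(x, ctx, ctx, prior_w, pers_w, w_router)
print(out.shape, gates.shape)  # (16, 8) (16, 2)
```

Because the gates form a per-pixel convex combination, a pixel whose gate strongly favors the prior branch receives essentially the original model's output. This is one way to read the abstract's claim that the personalized branch intervenes only minimally, but the paper itself should be consulted for the actual router design and training objective.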

arXiv information

Authors	Kuan-Chieh Wang, Daniil Ostashev, Yuwei Fang, Sergey Tulyakov, Kfir Aberman
Published	2024-05-06 16:29:15+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, DeepL
