Att-Adapter: A Robust and Precise Domain-Specific Multi-Attributes T2I Diffusion Adapter via Conditional Variational Autoencoder

Abstract

Text-to-Image (T2I) diffusion models have achieved remarkable performance in generating high-quality images. However, precisely controlling continuous attributes in a new domain (e.g., numeric values such as eye openness or car width) with text-only guidance remains a significant challenge, especially when several attributes must be controlled simultaneously. To address this, we introduce the Attribute (Att) Adapter, a novel plug-and-play module designed to enable fine-grained, multi-attribute control in pretrained diffusion models. Our approach learns a single control adapter from a set of sample images that can be unpaired and can contain multiple visual attributes. The Att-Adapter leverages a decoupled cross-attention module to naturally harmonize the multiple domain attributes with text conditioning. We further introduce a Conditional Variational Autoencoder (CVAE) into the Att-Adapter to mitigate overfitting, matching the diverse nature of the visual world. Evaluations on two public datasets show that Att-Adapter outperforms all LoRA-based baselines in controlling continuous attributes. Additionally, our method enables a broader control range and improves disentanglement across multiple attributes, surpassing StyleGAN-based techniques. Notably, Att-Adapter is flexible: it requires no paired synthetic data for training and scales easily to multiple attributes within a single model.
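The abstract does not spell out how the decoupled cross-attention combines attribute and text conditioning. The following is a minimal pure-Python sketch of the general idea as it is commonly formulated: the image queries attend to text tokens and to attribute tokens in two separate attention passes, and the two results are summed. The function names, the `attr_scale` weight, and the omission of learned key/value projections are all assumptions made for illustration, not details taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention over plain lists of vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        outputs.append(
            [
                sum(w * v[j] for w, v in zip(weights, values))
                for j in range(len(values[0]))
            ]
        )
    return outputs


def decoupled_cross_attention(queries, text_kv, attr_kv, attr_scale=1.0):
    """Sum of two independent attention passes (hypothetical helper).

    text_kv and attr_kv are (keys, values) pairs. In a real model each
    branch would have its own learned projections; here the keys and
    values are passed in directly to keep the sketch short.
    """
    text_out = attend(queries, *text_kv)
    attr_out = attend(queries, *attr_kv)
    return [
        [t + attr_scale * a for t, a in zip(t_row, a_row)]
        for t_row, a_row in zip(text_out, attr_out)
    ]
```

Setting `attr_scale` to 0 reduces the sketch to ordinary text-only cross-attention, which is one way to see why such a branch can be added to a pretrained model without altering its text pathway.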

arXiv information

Authors	Wonwoong Cho, Yan-Ying Chen, Matthew Klenk, David I. Inouye, Yanxia Zhang
Published	2025-04-01 13:42:51+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
