From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation

Summary

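Recent advances in controllable human image generation have enabled zero-shot generation using structural signals (e.g., pose, depth) or facial appearance.
However, generating human images conditioned on multiple parts of a person's appearance remains difficult.
To address this, we introduce Parts2Whole, a new framework designed to generate customized portraits from multiple reference images, including pose images and various aspects of human appearance.
To achieve this, we first develop a semantic-aware appearance encoder that retains the details of different human parts. The encoder processes each image according to its text label and produces a series of multi-scale feature maps instead of a single image token, preserving the image dimensions.
Second, our framework supports multi-image conditioned generation through a shared self-attention mechanism that operates across reference and target features during the diffusion process.
We enhance the vanilla attention mechanism by incorporating mask information from the reference human images, enabling precise selection of any part.
Extensive experiments demonstrate the superiority of our approach over existing alternatives, offering advanced capabilities for multi-part controllable human image customization.
See the project page at https://huanngzh.github.io/Parts2Whole/.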
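The appearance encoder described above returns several spatial feature maps per reference image rather than one pooled token. The NumPy sketch below only illustrates that shape difference; it is not the authors' encoder. The label-based channel scaling (a FiLM-style stand-in for text-label conditioning), the 2x average pooling used to build the scales, the feature sizes, and all function names are assumptions made for illustration.

```python
import numpy as np


def avg_pool2x(x: np.ndarray) -> np.ndarray:
    """Downsample an (H, W, C) map by 2x average pooling (H and W must be even)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


def label_conditioned_pyramid(feat: np.ndarray, label_emb: np.ndarray, num_scales: int = 3):
    """Return multi-scale (H_s, W_s, C) maps for one reference image.

    Hypothetical: channel-wise scaling by a label embedding stands in for the
    semantic (text-label) conditioning; average pooling stands in for the
    encoder's downsampling stages.
    """
    x = feat * (1.0 + label_emb)[None, None, :]  # assumed FiLM-style modulation
    maps = [x]
    for _ in range(num_scales - 1):
        x = avg_pool2x(x)
        maps.append(x)
    return maps


def single_token(feat: np.ndarray) -> np.ndarray:
    """Baseline: collapse the whole image into one C-dim token (spatial layout lost)."""
    return feat.mean(axis=(0, 1))


rng = np.random.default_rng(0)
feat = rng.normal(size=(32, 32, 8))    # features of one reference image (assumed sizes)
label_emb = rng.normal(size=8) * 0.1   # embedding of a label such as "face" (hypothetical)

pyramid = label_conditioned_pyramid(feat, label_emb)
print([m.shape for m in pyramid])      # [(32, 32, 8), (16, 16, 8), (8, 8, 8)]
print(single_token(feat).shape)        # (8,)
```

With these assumed sizes, keeping three scales yields 1,024 + 256 + 64 = 1,344 spatial tokens per reference image, while the single-token baseline keeps one, so the multi-scale form leaves far more part-level detail for later attention layers to draw on.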

Summary (original)

Recent advancements in controllable human image generation have led to zero-shot generation using structural signals (e.g., pose, depth) or facial appearance. Yet, generating human images conditioned on multiple parts of human appearance remains challenging. Addressing this, we introduce Parts2Whole, a novel framework designed for generating customized portraits from multiple reference images, including pose images and various aspects of human appearance. To achieve this, we first develop a semantic-aware appearance encoder to retain details of different human parts, which processes each image based on its textual label to a series of multi-scale feature maps rather than one image token, preserving the image dimension. Second, our framework supports multi-image conditioned generation through a shared self-attention mechanism that operates across reference and target features during the diffusion process. We enhance the vanilla attention mechanism by incorporating mask information from the reference human images, allowing for the precise selection of any part. Extensive experiments demonstrate the superiority of our approach over existing alternatives, offering advanced capabilities for multi-part controllable human image customization. See our project page at https://huanngzh.github.io/Parts2Whole/.
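The shared self-attention with reference masks can be sketched as follows, assuming single-head attention, projection weights shared between target and reference tokens, and binary per-token part masks. Queries come from the target features, while keys and values come from the target plus all reference features; reference tokens outside their mask are excluded by setting their scores to negative infinity. This is an illustrative NumPy sketch, not the Parts2Whole implementation, and the function name `masked_shared_attention` is hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def masked_shared_attention(target, references, ref_masks, w_q, w_k, w_v):
    """Single-head shared self-attention with part masks (illustrative sketch).

    target:     (N_t, C) flattened target (noisy latent) features
    references: list of (N_i, C) flattened reference features
    ref_masks:  list of (N_i,) bool arrays, True where the token lies inside the selected part
    w_q, w_k, w_v: (C, D) projections shared by target and reference tokens (assumption)
    """
    tokens = np.concatenate([target, *references], axis=0)
    # Target tokens are always visible; reference tokens only inside their part mask.
    visible = np.concatenate([np.ones(len(target), dtype=bool), *[m.astype(bool) for m in ref_masks]])
    q = target @ w_q                  # queries come from the target only
    k = tokens @ w_k                  # keys/values span target + references
    v = tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(visible[None, :], scores, -np.inf)  # drop tokens outside the masks
    attn = softmax(scores, axis=-1)
    return attn @ v, attn


rng = np.random.default_rng(0)
C, D, N = 8, 4, 16
target = rng.normal(size=(N, C))
face, top = rng.normal(size=(N, C)), rng.normal(size=(N, C))  # two reference images (hypothetical parts)
face_mask = np.zeros(N, dtype=bool)
face_mask[:4] = True
top_mask = np.zeros(N, dtype=bool)
top_mask[8:] = True
w_q, w_k, w_v = (rng.normal(size=(C, D)) for _ in range(3))

out, attn = masked_shared_attention(target, [face, top], [face_mask, top_mask], w_q, w_k, w_v)
print(out.shape)  # (16, 4)
```

Because every target token stays visible, each softmax row always has at least one finite score, so masking out reference tokens can never produce an undefined, fully masked row.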

arXiv information

Authors	Zehuan Huang, Hongxing Fan, Lipeng Wang, Lu Sheng
Published	2024-04-23 17:56:08+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
