Enhanced Controllability of Diffusion Models via Feature Disentanglement and Realism-Enhanced Sampling Methods

Summary

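Because diffusion models have shown promising performance, much effort has gone into improving their controllability.
However, how to train diffusion models to have disentangled latent spaces, and how to naturally incorporate the disentangled conditions during sampling, remain underexplored.
In this paper, we present FDiff, a training framework for feature disentanglement in diffusion models.
We further propose two sampling methods that increase the realism of the diffusion models and improve their controllability.
In brief, we train diffusion models conditioned on two latent features: a spatial content mask and a flattened style embedding.
We rely on the inductive bias of the denoising process of diffusion models to encode pose/layout information in the content feature and semantic/style information in the style feature.
For sampling, we first generalize Composable Diffusion Models (GCDM) by breaking the conditional independence assumption, which allows some dependence between the conditional inputs.
Second, we propose timestep-dependent weight scheduling for the content and style features to further improve performance.
We also observe better controllability for the proposed methods than for existing methods in image manipulation and image translation.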

Abstract (original)

As Diffusion Models have shown promising performance, considerable effort has been made to improve their controllability. However, how to train Diffusion Models to have disentangled latent spaces and how to naturally incorporate the disentangled conditions during the sampling process remain underexplored. In this paper, we present a training framework for feature disentanglement of Diffusion Models (FDiff). We further propose two sampling methods that can boost the realism of our Diffusion Models and also enhance their controllability. Concisely, we train Diffusion Models conditioned on two latent features: a spatial content mask and a flattened style embedding. We rely on the inductive bias of the denoising process of Diffusion Models to encode pose/layout information in the content feature and semantic/style information in the style feature. Regarding the sampling methods, we first generalize Composable Diffusion Models (GCDM) by breaking the conditional independence assumption to allow for some dependence between conditional inputs, which is shown to be effective for realistic generation in our experiments. Second, we propose timestep-dependent weight scheduling for content and style features to further improve performance. We also observe better controllability of our proposed methods compared to existing methods in image manipulation and image translation.
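
The abstract gives no formulas, so the following is only a rough sketch of how the two sampling ideas could fit together in a classifier-free-guidance style update. It assumes four noise predictions are available at each step (unconditional, content-only, style-only, and joint), mixes a joint term with independently composed terms through a hypothetical parameter `lam`, and uses a hypothetical sigmoid schedule for the content and style weights. All function names, parameters, and the direction of the schedule are illustrative assumptions and are not taken from the paper.

```python
import math


def sigmoid_weights(t, num_steps, sharpness=10.0):
    """Hypothetical timestep-dependent weights (an assumption, not the paper's schedule).

    t counts down from num_steps - 1 (most noisy) to 0 (clean).
    Content is weighted more at noisy steps, style more at late steps.
    """
    frac = t / max(num_steps - 1, 1)  # 1.0 = most noisy, 0.0 = clean
    w_content = 1.0 / (1.0 + math.exp(-sharpness * (frac - 0.5)))
    w_style = 1.0 - w_content
    return w_content, w_style


def combine_guidance(eps_uncond, eps_content, eps_style, eps_joint,
                     alpha, lam, w_content, w_style):
    """Blend a joint guidance term with independently composed terms.

    lam = 1.0 uses only the joint (content and style together) direction;
    lam = 0.0 uses only the sum of per-condition directions, which is the
    conditionally independent composition. Inputs are equal-length lists
    of floats standing in for noise-prediction tensors.
    """
    combined = []
    for e0, ec, es, ej in zip(eps_uncond, eps_content, eps_style, eps_joint):
        joint_dir = ej - e0
        indep_dir = w_content * (ec - e0) + w_style * (es - e0)
        combined.append(e0 + alpha * (lam * joint_dir + (1.0 - lam) * indep_dir))
    return combined


if __name__ == "__main__":
    num_steps = 50
    for t in (49, 25, 0):
        w_c, w_s = sigmoid_weights(t, num_steps)
        eps = combine_guidance([0.0], [1.0], [2.0], [3.0],
                               alpha=1.5, lam=0.7, w_content=w_c, w_style=w_s)
        print(t, round(w_c, 3), round(w_s, 3), eps)
```

In a real sampler the four predictions would come from the trained denoiser evaluated with the corresponding conditions dropped, and the lists would be tensors; the blending logic itself would be unchanged.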

arXiv information

Authors	Wonwoong Cho, Hareesh Ravi, Midhun Harikumar, Vinh Khuc, Krishna Kumar Singh, Jingwan Lu, David I. Inouye, Ajinkya Kale
Published	2025-04-01 13:50:35+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
