Towards Flexible, Scalable, and Adaptive Multi-Modal Conditioned Face Synthesis

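Summary

Recent advances in multi-modal conditioned face synthesis have made it possible to create visually striking, accurately aligned face images.
However, current methods still suffer from poor scalability, limited flexibility, and a one-size-fits-all approach to control strength that ignores the differing levels of conditional entropy (a measure of how unpredictable the data remains given some condition) across modalities.
To address these challenges, we introduce a novel uni-modal training approach using modal surrogates, combined with entropy-aware modal-adaptive modulation, to support a flexible and scalable multi-modal conditioned face synthesis network.
Uni-modal training with modal surrogates uses only uni-modal data. The modal surrogates decorate each condition with modal-specific characteristics and serve as linkers for inter-modal collaboration, so the network fully learns both each modality's control over the face synthesis process and inter-modal collaboration.
Entropy-aware modal-adaptive modulation fine-tunes the diffusion noise according to modal-specific characteristics and the given conditions, enabling well-informed steps along the denoising trajectory and ultimately yielding high-fidelity, high-quality synthesis results.
Our framework improves multi-modal face synthesis under a variety of conditions and surpasses current methods in image quality and fidelity, as demonstrated by thorough experimental results.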

Summary (original)

Recent progress in multi-modal conditioned face synthesis has enabled the creation of visually striking and accurately aligned facial images. Yet current methods still face issues with scalability, limited flexibility, and a one-size-fits-all approach to control strength that does not account for the differing levels of conditional entropy, a measure of unpredictability in data given some condition, across modalities. To address these challenges, we introduce a novel uni-modal training approach with modal surrogates, coupled with an entropy-aware modal-adaptive modulation, to support a flexible, scalable, and adaptive multi-modal conditioned face synthesis network. Our uni-modal training with modal surrogates leverages only uni-modal data. It uses modal surrogates to decorate each condition with modal-specific characteristics and to serve as linkers for inter-modal collaboration, and thereby fully learns each modality's control over the face synthesis process as well as inter-modal collaboration. The entropy-aware modal-adaptive modulation finely adjusts the diffusion noise according to modal-specific characteristics and the given conditions, enabling well-informed steps along the denoising trajectory and ultimately leading to synthesis results of high fidelity and quality. Our framework improves multi-modal face synthesis under various conditions, surpassing current methods in image quality and fidelity, as demonstrated by our thorough experimental results.
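
The abstract describes conditional entropy as a measure of how unpredictable the data remains once a condition is given. As a rough illustration only, and not the authors' method, the sketch below computes H(Y | X) in bits from a small joint probability table. The function name `conditional_entropy` and the two toy tables are hypothetical and chosen purely for this example: one condition fully determines the output, the other says nothing about it.

```python
import math


def conditional_entropy(joint):
    """Return H(Y | X) in bits, where joint[x][y] = P(X = x, Y = y)."""
    h = 0.0
    for row in joint:
        p_x = sum(row)  # marginal P(X = x)
        for p_xy in row:
            if p_xy > 0:
                # P(y | x) = P(x, y) / P(x)
                h -= p_xy * math.log2(p_xy / p_x)
    return h


# Hypothetical toy tables (assumptions for illustration only).
tight = [[0.5, 0.0], [0.0, 0.5]]      # the condition pins down the output
loose = [[0.25, 0.25], [0.25, 0.25]]  # the condition carries no information

print(conditional_entropy(tight))  # 0.0
print(conditional_entropy(loose))  # 1.0
```

A condition with low conditional entropy leaves little freedom in the output, while one with high conditional entropy leaves much more. That gap is the motivation the abstract gives for not applying the same control strength to every modality.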

arXiv information

Authors	Jingjing Ren, Cheng Xu, Haoyu Chen, Xinran Qin, Lei Zhu
Published	2024-03-21 16:58:06+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
