MagicComp: Training-free Dual-Phase Refinement for Compositional Video Generation

Summary

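Text-to-video (T2V) generation has made significant strides with diffusion models.
However, existing methods still struggle to accurately bind attributes, determine spatial relationships, and capture complex action interactions among multiple subjects.
To address these limitations, we propose MagicComp, a training-free method that enhances compositional T2V generation through dual-phase refinement.
Specifically, (1) in the conditioning stage, we introduce Semantic Anchor Disambiguation, which reinforces subject-specific semantics and resolves inter-subject ambiguity by progressively injecting the directional vectors of semantic anchors into the original text embedding.
(2) In the denoising stage, we propose Dynamic Layout Fusion Attention, which integrates grounding priors with model-adaptive spatial perception to flexibly bind subjects to their spatiotemporal regions through masked attention modulation.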
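The abstract describes Semantic Anchor Disambiguation only at a high level. The Python sketch below shows one plausible reading of step (1), under assumptions that are ours rather than the paper's: each subject token in the prompt embedding has a separately computed "anchor" embedding (for example, the subject phrase encoded on its own), the directional vector is the anchor minus the token's contextual embedding, and "progressive" injection means a weight that grows linearly with the step index. The helper names `anchor_directions` and `inject_anchors` and the `strength` parameter are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def anchor_directions(token_embs: List[Vector], anchors: Dict[int, Vector]) -> Dict[int, Vector]:
    """For each subject token index, the vector pointing from its contextual
    embedding toward its (hypothetical) standalone semantic-anchor embedding."""
    return {
        i: [a - t for a, t in zip(anchor, token_embs[i])]
        for i, anchor in anchors.items()
    }


def inject_anchors(
    token_embs: List[Vector],
    directions: Dict[int, Vector],
    step: int,
    total_steps: int,
    strength: float = 1.0,
) -> List[Vector]:
    """Add each direction to its subject token with a weight that grows
    linearly from 0 to `strength` over the schedule (an assumed schedule)."""
    weight = strength * step / total_steps
    out = [list(v) for v in token_embs]  # copy so the original embedding is untouched
    for i, d in directions.items():
        out[i] = [x + weight * dx for x, dx in zip(out[i], d)]
    return out


if __name__ == "__main__":
    tokens = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # toy prompt embedding, 3 tokens
    anchors = {1: [2.0, 0.0], 2: [0.0, 3.0]}        # tokens 1 and 2 are subjects
    dirs = anchor_directions(tokens, anchors)
    for step in (0, 1, 2):
        print(step, inject_anchors(tokens, dirs, step, total_steps=2))
```

Halfway through the two-step schedule in this toy example, token 1 moves from `[1.0, 0.0]` to `[1.5, 0.0]`, while token 0, which has no anchor, stays unchanged.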
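Furthermore, MagicComp is a model-agnostic, versatile approach that can be seamlessly integrated into existing T2V architectures.
Extensive experiments on T2V-CompBench and VBench show that MagicComp outperforms state-of-the-art methods, highlighting its potential for applications such as complex prompt-based and trajectory-controllable video generation.
Project page: https://hong-yu-zhang.github.io/MagicComp-Page/.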

Summary (original)

Text-to-video (T2V) generation has made significant strides with diffusion models. However, existing methods still struggle with accurately binding attributes, determining spatial relationships, and capturing complex action interactions between multiple subjects. To address these limitations, we propose MagicComp, a training-free method that enhances compositional T2V generation through dual-phase refinement. Specifically, (1) During the Conditioning Stage: We introduce the Semantic Anchor Disambiguation to reinforce subject-specific semantics and resolve inter-subject ambiguity by progressively injecting the directional vectors of semantic anchors into the original text embedding; (2) During the Denoising Stage: We propose Dynamic Layout Fusion Attention, which integrates grounding priors and model-adaptive spatial perception to flexibly bind subjects to their spatiotemporal regions through masked attention modulation. Furthermore, MagicComp is a model-agnostic and versatile approach, which can be seamlessly integrated into existing T2V architectures. Extensive experiments on T2V-CompBench and VBench demonstrate that MagicComp outperforms state-of-the-art methods, highlighting its potential for applications such as complex prompt-based and trajectory-controllable video generation. Project page: https://hong-yu-zhang.github.io/MagicComp-Page/.
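The following hypothetical sketch illustrates the general idea of masked attention modulation that Dynamic Layout Fusion Attention relies on. It assumes a cross-attention map whose queries are flattened video latent positions and whose keys are text tokens. It blends a binary layout prior (standing in for the grounding priors) with a soft map taken from the model's own attention (standing in for model-adaptive spatial perception) using a weight `lam`, then adds `+bias` to a subject's logits inside the fused region and `-bias` outside before the softmax. The fusion rule, `lam`, `bias`, and all function names are illustrative assumptions, not the authors' formulation.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over one attention row."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_region(prior: List[float], adaptive: List[float], lam: float) -> List[float]:
    """Blend a layout prior (1 inside the subject's box, 0 outside) with a soft
    map in [0, 1] derived from the model's own attention; lam weights the prior."""
    return [lam * p + (1.0 - lam) * a for p, a in zip(prior, adaptive)]


def modulated_attention(
    logits: List[List[float]],
    subject_keys: List[int],
    region: List[float],
    bias: float,
) -> List[List[float]]:
    """logits[q][k]: query q is a flattened latent position, key k a text token.
    Shift the subject's key logits by +bias inside the region and -bias outside,
    then normalize each query row with softmax."""
    out = []
    for q, row in enumerate(logits):
        shifted = list(row)
        for k in subject_keys:
            # region[q] = 1 -> +bias, region[q] = 0 -> -bias, 0.5 -> no change
            shifted[k] += bias * (2.0 * region[q] - 1.0)
        out.append(softmax(shifted))
    return out


if __name__ == "__main__":
    # Two latent positions, two text tokens (token 1 is the subject).
    region = fuse_region(prior=[1.0, 0.0], adaptive=[0.5, 0.5], lam=0.5)
    attn = modulated_attention([[0.0, 0.0], [0.0, 0.0]], [1], region, bias=2.0)
    print(region, attn)
```

Shifting the logits before the softmax, instead of masking weights afterwards, keeps each row a valid probability distribution, so tokens outside the subject are re-weighted rather than removed.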

arXiv information

Authors	Hongyu Zhang, Yufan Deng, Shenghai Yuan, Peng Jin, Zesen Cheng, Yian Zhao, Chang Liu, Jie Chen
Published	2025-03-18 17:02:14+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google