Training-Free Motion-Guided Video Generation with Enhanced Temporal Consistency Using Motion Consistency Loss


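This paper tackles the challenge of generating temporally consistent videos with motion guidance.
While many existing methods rely on additional control modules or inference-time fine-tuning, recent studies suggest that effective motion guidance can be achieved without modifying the model architecture or requiring additional training.
The proposed approach, a motion consistency loss whose latent-space gradient steers generation, improves temporal consistency across a range of motion control tasks while retaining the benefits of a training-free setup.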


In this paper, we address the challenge of generating temporally consistent videos with motion guidance. While many existing methods depend on additional control modules or inference-time fine-tuning, recent studies suggest that effective motion guidance is achievable without altering the model architecture or requiring extra training. Such approaches offer promising compatibility with various video generation foundation models. However, existing training-free methods often struggle to maintain temporal coherence across frames or to follow the guided motion accurately. In this work, we propose a simple yet effective solution that combines an initial-noise-based approach with a novel motion consistency loss, the latter being our key innovation. Specifically, we capture the inter-frame correlation patterns of intermediate features from a video diffusion model to represent the motion pattern of the reference video. We then design a motion consistency loss that encourages the generated video to exhibit similar correlation patterns, and we use the gradient of this loss in the latent space to guide the generation process toward precise motion control. This approach improves temporal consistency across various motion control tasks while preserving the benefits of a training-free setup. Extensive experiments show that our method sets a new standard for efficient, temporally coherent video generation.
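The core mechanism described above (inter-frame correlation of intermediate features, a loss that matches those correlations to a reference, and a gradient step on the latent) can be sketched in a few lines of NumPy. This is a toy illustration under explicit assumptions, not the authors' implementation: the feature extractor `features` is a hypothetical fixed tanh projection standing in for the denoiser's intermediate features; correlation is taken here as cosine similarity between every token of frame t and every token of frame t+1 (the paper's exact definition, layer choice, frame pairing, and guidance schedule are not given in this abstract); gradients come from finite differences instead of autograd; and the normalized step is our own choice. All function names are hypothetical, and the initial-noise component and the denoising loop are not modeled.

```python
import numpy as np

# ASSUMPTION: toy sizes and a hypothetical feature extractor. A real pipeline
# would hook an intermediate layer of the video diffusion model's denoiser.
RNG = np.random.default_rng(0)
T, N, C_LAT, C_FEAT = 4, 6, 3, 8  # frames, spatial tokens, latent ch, feature ch
W = RNG.standard_normal((C_LAT, C_FEAT))


def features(latent):
    """Map a latent (T, N, C_LAT) to stand-in features (T, N, C_FEAT)."""
    return np.tanh(latent @ W)


def inter_frame_correlation(feats, eps=1e-8):
    """Cosine similarity between all tokens of frame t and frame t+1 -> (T-1, N, N)."""
    unit = feats / (np.linalg.norm(feats, axis=-1, keepdims=True) + eps)
    return np.einsum("tic,tjc->tij", unit[:-1], unit[1:])


def motion_consistency_loss(latent, ref_corr):
    """Mean squared difference between generated and reference correlation patterns."""
    corr = inter_frame_correlation(features(latent))
    return float(np.mean((corr - ref_corr) ** 2))


def numerical_grad(fn, x, h=1e-5):
    """Central finite differences (a real implementation would use autograd)."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        orig = x[idx]
        x[idx] = orig + h
        f_plus = fn(x)
        x[idx] = orig - h
        f_minus = fn(x)
        x[idx] = orig  # restore the perturbed entry
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad


def guide(latent, ref_corr, steps=10, step_size=0.02):
    """Move the latent down the loss gradient; returns the latent and loss history."""
    z = latent.copy()
    losses = [motion_consistency_loss(z, ref_corr)]
    for _ in range(steps):
        g = numerical_grad(lambda x: motion_consistency_loss(x, ref_corr), z)
        # ASSUMPTION: normalized gradient step, chosen here for stability.
        z = z - step_size * g / (np.linalg.norm(g) + 1e-12)
        losses.append(motion_consistency_loss(z, ref_corr))
    return z, losses


# Usage: the reference latent stands in for an encoded reference video.
ref_latent = RNG.standard_normal((T, N, C_LAT))
ref_corr = inter_frame_correlation(features(ref_latent))
z0 = RNG.standard_normal((T, N, C_LAT))  # fresh noise for the generated video
z, losses = guide(z0, ref_corr)
print(f"loss before: {losses[0]:.4f}, after: {losses[-1]:.4f}")
```

Matching correlation patterns rather than raw features is what lets the guidance transfer motion without copying appearance: two videos with different content can still share the same token-to-token correspondence structure across frames.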


Authors: Xinyu Zhang, Zicheng Duan, Dong Gong, Lingqiao Liu
Published: 2025-01-13 18:53:08+00:00
arXiv: arxiv_id (pdf)

Provider / service used: Google

Category: cs.CV