Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise

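Summary

Generative modeling aims to transform random noise into structured outputs.
This work enhances video diffusion models by enabling motion control through structured latent noise sampling.
This is achieved solely by changing the data: training videos are pre-processed to yield structured noise.
As a result, the method is agnostic to diffusion model design and requires no changes to model architectures or training pipelines.
Specifically, the authors propose a novel noise warping algorithm, fast enough to run in real time, that replaces random temporal Gaussianity with correlated warped noise derived from optical flow fields while preserving spatial Gaussianity.
The algorithm's efficiency makes it possible to fine-tune modern video diffusion base models on warped noise with minimal overhead, providing a one-stop solution for a wide range of user-friendly motion control: local object motion control, global camera movement control, and motion transfer.
Harmonizing temporal coherence with spatial Gaussianity in the warped noise yields effective motion control while maintaining per-frame pixel quality.
Extensive experiments and user studies demonstrate the method's advantages, making it a robust and scalable approach for controlling motion in video diffusion models.
Video results are available on the project webpage: https://vgenai-netflix-eyeline-research.github.io/Go-with-the-Flow/;
source code and model checkpoints are available on GitHub: https://github.com/VGenAI-Netflix-Eyeline-Research/Go-with-the-Flow.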

Abstract (original)

Generative modeling aims to transform random noise into structured outputs. In this work, we enhance video diffusion models by allowing motion control via structured latent noise sampling. This is achieved by just a change in data: we pre-process training videos to yield structured noise. Consequently, our method is agnostic to diffusion model design, requiring no changes to model architectures or training pipelines. Specifically, we propose a novel noise warping algorithm, fast enough to run in real time, that replaces random temporal Gaussianity with correlated warped noise derived from optical flow fields, while preserving the spatial Gaussianity. The efficiency of our algorithm enables us to fine-tune modern video diffusion base models using warped noise with minimal overhead, and provide a one-stop solution for a wide range of user-friendly motion control: local object motion control, global camera movement control, and motion transfer. The harmonization between temporal coherence and spatial Gaussianity in our warped noise leads to effective motion control while maintaining per-frame pixel quality. Extensive experiments and user studies demonstrate the advantages of our method, making it a robust and scalable approach for controlling motion in video diffusion models. Video results are available on our webpage: https://vgenai-netflix-eyeline-research.github.io/Go-with-the-Flow/; source code and model checkpoints are available on GitHub: https://github.com/VGenAI-Netflix-Eyeline-Research/Go-with-the-Flow.
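The core idea of warped noise can be illustrated with a toy sketch. This is not the paper's algorithm, which handles dense optical flow fields and runs in real time; it assumes a single global integer shift per frame (a simple camera pan), and the function names `gaussian_frame`, `warp_noise`, and `warped_noise_sequence` are hypothetical. Each frame's noise is the previous frame's noise moved along the flow, with newly exposed pixels filled by fresh samples, so the noise is correlated in time while each frame stays i.i.d. Gaussian in space.

```python
import random


def gaussian_frame(height, width, rng):
    """Return one frame of i.i.d. N(0, 1) noise as a list of rows."""
    return [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]


def warp_noise(prev, dx, dy, rng):
    """Shift `prev` by an integer flow (dx, dy).

    Assumption: the flow is one global integer translation. Every output
    pixel then copies a distinct source pixel, so the frame stays spatially
    i.i.d. N(0, 1) while being correlated with `prev` in time.
    """
    height, width = len(prev), len(prev[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Backward lookup: which source pixel moved to (x, y)?
            sx, sy = x - dx, y - dy
            if 0 <= sx < width and 0 <= sy < height:
                row.append(prev[sy][sx])
            else:
                # Newly exposed region: draw a fresh Gaussian sample.
                row.append(rng.gauss(0.0, 1.0))
        out.append(row)
    return out


def warped_noise_sequence(num_frames, height, width, dx, dy, seed=0):
    """Build a sequence of noise frames that follow a constant pan."""
    rng = random.Random(seed)
    frames = [gaussian_frame(height, width, rng)]
    for _ in range(num_frames - 1):
        frames.append(warp_noise(frames[-1], dx, dy, rng))
    return frames


# Four 8x8 noise frames panning one pixel to the right per frame.
frames = warped_noise_sequence(num_frames=4, height=8, width=8, dx=1, dy=0)
```

With non-integer or spatially varying flow, several output pixels may pull from the same source pixel, and naive copying or interpolation would no longer keep each frame Gaussian; handling that case is the part the paper's algorithm addresses and this sketch does not.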

arXiv information

Authors	Ryan Burgert, Yuancheng Xu, Wenqi Xian, Oliver Pilarski, Pascal Clausen, Mingming He, Li Ma, Yitong Deng, Lingxiao Li, Mohsen Mousavi, Michael Ryoo, Paul Debevec, Ning Yu
Published	2025-01-14 18:59:10+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
