Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics

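Abstract

We present Puppet-Master, an interactive video generation model that serves as a motion prior for part-level dynamics.
At test time, given a single image and a sparse set of motion trajectories (i.e., drags), Puppet-Master synthesizes a video showing realistic part-level motion that is faithful to the given drag interactions.
This is achieved by fine-tuning a large-scale pre-trained video diffusion model, for which we propose a new conditioning architecture that injects the drag control effectively.
More importantly, we introduce the all-to-first attention mechanism, a drop-in replacement for the widely adopted spatial attention modules. It significantly improves generation quality by addressing the appearance and background issues of existing models.
Unlike other motion-conditioned video generators, which are trained on in-the-wild videos and mostly move an entire object, Puppet-Master learns from Objaverse-Animation-HQ, a new dataset of curated part-level motion clips.
We propose a strategy that automatically filters out sub-optimal animations and augments the synthetic renderings with meaningful motion trajectories.
Puppet-Master generalizes well to real images across a variety of categories and outperforms existing methods in a zero-shot manner on a real-world benchmark.
For more results, see the project page: vgg-puppetmaster.github.io.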

Abstract (original)

We present Puppet-Master, an interactive video generative model that can serve as a motion prior for part-level dynamics. At test time, given a single image and a sparse set of motion trajectories (i.e., drags), Puppet-Master can synthesize a video depicting realistic part-level motion faithful to the given drag interactions. This is achieved by fine-tuning a large-scale pre-trained video diffusion model, for which we propose a new conditioning architecture to inject the dragging control effectively. More importantly, we introduce the all-to-first attention mechanism, a drop-in replacement for the widely adopted spatial attention modules, which significantly improves generation quality by addressing the appearance and background issues in existing models. Unlike other motion-conditioned video generators that are trained on in-the-wild videos and mostly move an entire object, Puppet-Master is learned from Objaverse-Animation-HQ, a new dataset of curated part-level motion clips. We propose a strategy to automatically filter out sub-optimal animations and augment the synthetic renderings with meaningful motion trajectories. Puppet-Master generalizes well to real images across various categories and outperforms existing methods in a zero-shot manner on a real-world benchmark. See our project page for more results: vgg-puppetmaster.github.io.
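
The abstract names all-to-first attention as a drop-in replacement for spatial attention but does not spell out the computation. The sketch below is one plausible reading, not the authors' implementation: it assumes that every frame's queries attend to the keys and values of the first frame only, instead of to the frame's own tokens. The function names, the tensor layout (frames, tokens, channels), and the toy sizes are all hypothetical, and multi-head splitting and learned projections are left out.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def per_frame_attention(q, k, v):
    """Standard spatial self-attention: frame i attends to its own tokens.

    q, k, v have shape (N frames, S tokens, C channels).
    """
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = np.einsum("nsc,ntc->nst", q, k) * scale
    return np.einsum("nst,ntc->nsc", softmax(scores), v)


def all_to_first_attention(q, k, v):
    """Assumed variant: every frame's queries attend to frame 0's keys/values.

    Keys and values of frames 1..N-1 are ignored entirely.
    """
    scale = 1.0 / np.sqrt(q.shape[-1])
    k0, v0 = k[0], v[0]  # (S, C), taken from the first frame only
    scores = np.einsum("nsc,tc->nst", q, k0) * scale
    return np.einsum("nst,tc->nsc", softmax(scores), v0)


# Toy example with random tensors (sizes are arbitrary).
rng = np.random.default_rng(0)
N, S, C = 4, 6, 8
q, k, v = (rng.standard_normal((N, S, C)) for _ in range(3))

out_std = per_frame_attention(q, k, v)
out_a2f = all_to_first_attention(q, k, v)
```

Under this assumption the two variants agree on the first frame, since frame 0 attends to itself in both, and differ on all later frames. Because the input and output shapes are unchanged, the function can stand in wherever the per-frame version is called, which is what "drop-in replacement" suggests.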

arXiv information

Authors	Ruining Li, Chuanxia Zheng, Christian Rupprecht, Andrea Vedaldi
Published	2024-08-08 17:59:38+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
