Generative Video Propagation

Summary

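Large-scale video generation models have an inherent ability to model natural scenes realistically.
This paper demonstrates that a carefully designed generative video propagation framework can harness the generative power of such models to address a variety of video tasks in a unified way.
Specifically, our framework, GenProp, encodes the original video with a selective content encoder and propagates changes made to the first frame using an image-to-video generation model.
We propose a data generation scheme, based on instance-level video segmentation datasets, that covers multiple video tasks.
The model is trained with a mask prediction decoder head and a region-aware loss, which help the encoder preserve the original content while the generation model propagates the modified region.
This novel design opens up new possibilities. In editing scenarios, GenProp allows substantial changes to an object's shape.
For insertion, inserted objects can exhibit independent motion.
For removal, GenProp effectively removes effects such as shadows and reflections across the whole video.
For tracking, GenProp can track objects together with their associated effects.
Experimental results demonstrate the model's leading performance across various video tasks, and we further provide in-depth analyses of the proposed framework.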
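
The abstract names a region-aware loss but does not give its formula. As a rough illustration only, one plausible reading is a reconstruction error that weights pixels inside the modified region differently from pixels outside it. The sketch below assumes a weighted mean squared error over flattened pixel values; the function name `region_aware_mse` and the weights `w_edit` and `w_keep` are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def region_aware_mse(
    pred: Sequence[float],
    target: Sequence[float],
    mask: Sequence[int],
    w_edit: float = 2.0,  # hypothetical weight for the modified region
    w_keep: float = 1.0,  # hypothetical weight for the preserved region
) -> float:
    """Weighted MSE over flattened pixel values.

    mask[i] == 1 marks a pixel in the modified region, 0 a pixel whose
    original content should be preserved. This is an assumed form of a
    region-aware loss, not the formulation used by GenProp.
    """
    if not (len(pred) == len(target) == len(mask)):
        raise ValueError("pred, target and mask must have the same length")

    weighted_error = 0.0
    total_weight = 0.0
    for p, t, m in zip(pred, target, mask):
        w = w_edit if m else w_keep
        weighted_error += w * (p - t) ** 2
        total_weight += w

    # Normalise by the total weight so the loss stays on a per-pixel scale.
    return weighted_error / total_weight if total_weight else 0.0


if __name__ == "__main__":
    # One error inside the modified region, one outside it.
    print(region_aware_mse([0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 1.0], [1, 0, 0, 0]))
```

Raising `w_edit` relative to `w_keep` penalises errors in the propagated region more heavily; how the actual loss balances the two regions is not stated in the abstract.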

Abstract (original)

Large-scale video generation models have the inherent ability to realistically model natural scenes. In this paper, we demonstrate that through a careful design of a generative video propagation framework, various video tasks can be addressed in a unified way by leveraging the generative power of such models. Specifically, our framework, GenProp, encodes the original video with a selective content encoder and propagates the changes made to the first frame using an image-to-video generation model. We propose a data generation scheme to cover multiple video tasks based on instance-level video segmentation datasets. Our model is trained by incorporating a mask prediction decoder head and optimizing a region-aware loss to aid the encoder to preserve the original content while the generation model propagates the modified region. This novel design opens up new possibilities: In editing scenarios, GenProp allows substantial changes to an object’s shape; for insertion, the inserted objects can exhibit independent motion; for removal, GenProp effectively removes effects like shadows and reflections from the whole video; for tracking, GenProp is capable of tracking objects and their associated effects together. Experiment results demonstrate the leading performance of our model in various video tasks, and we further provide in-depth analyses of the proposed framework.

arXiv information

Authors	Shaoteng Liu, Tianyu Wang, Jui-Hsien Wang, Qing Liu, Zhifei Zhang, Joon-Young Lee, Yijun Li, Bei Yu, Zhe Lin, Soo Ye Kim, Jiaya Jia
Published	2024-12-27 17:42:29+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
