VEnhancer: Generative Space-Time Enhancement for Video Generation

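Summary

We present VEnhancer, a generative space-time enhancement framework that improves existing text-to-video results by adding detail in the spatial domain and synthesizing detailed motion in the temporal domain.
Given a generated low-quality video, our approach increases its spatial and temporal resolution simultaneously, at arbitrary spatial and temporal up-sampling scales, through a single unified video diffusion model.
VEnhancer also effectively removes spatial artifacts and temporal flickering from generated videos.
To achieve this, we train a video ControlNet on top of a pretrained video diffusion model and inject it into the diffusion model as a condition on low-frame-rate, low-resolution videos.
To train this video ControlNet effectively, we design space-time data augmentation and video-aware conditioning.
Thanks to these designs, VEnhancer is stable during training and can be trained end to end.
Extensive experiments show that VEnhancer surpasses existing state-of-the-art video super-resolution and space-time super-resolution methods at enhancing AI-generated videos.
Moreover, with VEnhancer, VideoCrafter-2, an existing open-source state-of-the-art text-to-video method, reaches first place on the video generation benchmark VBench.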
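
The summary says the ControlNet is conditioned on low-frame-rate, low-resolution videos. As a rough illustration of what such a condition could look like, the sketch below derives a low-resolution, low-frame-rate clip from a higher-quality one by frame skipping and average pooling. This is only an assumption made for illustration: the abstract does not specify the authors' space-time data augmentation, and the helper name `downsample_space_time` and its parameters `time_stride` and `space_factor` are hypothetical.

```python
from typing import List

Frame = List[List[float]]  # one grayscale frame, indexed [row][col]
Video = List[Frame]        # a clip, indexed [frame][row][col]


def downsample_space_time(video: Video, time_stride: int, space_factor: int) -> Video:
    """Keep every `time_stride`-th frame and average-pool each kept frame
    over non-overlapping `space_factor` x `space_factor` blocks.

    Hypothetical helper for illustration only; not the authors' pipeline.
    """
    if time_stride < 1 or space_factor < 1:
        raise ValueError("time_stride and space_factor must be >= 1")
    low: Video = []
    for frame in video[::time_stride]:
        height, width = len(frame), len(frame[0])
        pooled: Frame = []
        # Trailing rows/columns that do not fill a whole block are dropped.
        for r in range(0, height - space_factor + 1, space_factor):
            row: List[float] = []
            for c in range(0, width - space_factor + 1, space_factor):
                block = [frame[r + dr][c + dc]
                         for dr in range(space_factor)
                         for dc in range(space_factor)]
                row.append(sum(block) / len(block))
            pooled.append(row)
        low.append(pooled)
    return low


# Toy clip: 4 frames of 4x4 pixels, where every pixel of frame t equals t.
clip = [[[float(t)] * 4 for _ in range(4)] for t in range(4)]
low_clip = downsample_space_time(clip, time_stride=2, space_factor=2)
print(len(low_clip), len(low_clip[0]), len(low_clip[0][0]))
```

With a temporal stride of 2 and a spatial factor of 2, the toy 4-frame, 4x4 clip becomes a 2-frame, 2x2 clip. An enhancement model of the kind described would be asked to invert this sort of degradation at whichever scales are chosen.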

Summary (original)

We present VEnhancer, a generative space-time enhancement framework that improves the existing text-to-video results by adding more details in the spatial domain and synthetic detailed motion in the temporal domain. Given a generated low-quality video, our approach can increase its spatial and temporal resolution simultaneously with arbitrary up-sampling space and time scales through a unified video diffusion model. Furthermore, VEnhancer effectively removes generated spatial artifacts and temporal flickering of generated videos. To achieve this, based on a pretrained video diffusion model, we train a video ControlNet and inject it into the diffusion model as a condition on low frame-rate and low-resolution videos. To effectively train this video ControlNet, we design space-time data augmentation as well as video-aware conditioning. Benefiting from the above designs, VEnhancer remains stable during training and shares an elegant end-to-end training manner. Extensive experiments show that VEnhancer surpasses existing state-of-the-art video super-resolution and space-time super-resolution methods in enhancing AI-generated videos. Moreover, with VEnhancer, the existing open-source state-of-the-art text-to-video method, VideoCrafter-2, reaches first place on the video generation benchmark VBench.

arXiv information

Authors	Jingwen He, Tianfan Xue, Dongyang Liu, Xinqi Lin, Peng Gao, Dahua Lin, Yu Qiao, Wanli Ouyang, Ziwei Liu
Published	2024-07-10 13:46:08+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
