VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models

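Abstract

Text-to-image diffusion models (T2I) have demonstrated unprecedented capabilities in creating realistic and aesthetic images.
In contrast, text-to-video diffusion models (T2V) still lag far behind in frame quality and text alignment, because training videos are insufficient in both quality and quantity.
This paper introduces VideoElevator, a training-free, plug-and-play method that improves the performance of T2V by using the superior capabilities of T2I.
Unlike conventional T2V sampling (i.e., joint temporal and spatial modeling), VideoElevator explicitly decomposes each sampling step into temporal motion refining and spatial quality elevating.
Specifically, temporal motion refining uses an encapsulated T2V to enhance temporal consistency, and then inverts the result to the noise distribution required by the T2I.
Spatial quality elevating then uses an inflated T2I to directly predict a less noisy latent, adding more photo-realistic details.
The authors ran experiments on a wide range of prompts with various combinations of T2V and T2I models.
The results show that VideoElevator not only improves the performance of T2V baselines when paired with a foundational T2I, but also facilitates stylistic video synthesis when paired with a personalized T2I.
The code is available at https://github.com/YBYBZhang/VideoElevator.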

Abstract (original)

Text-to-image diffusion models (T2I) have demonstrated unprecedented capabilities in creating realistic and aesthetic images. On the contrary, text-to-video diffusion models (T2V) still lag far behind in frame quality and text alignment, owing to insufficient quality and quantity of training videos. In this paper, we introduce VideoElevator, a training-free and plug-and-play method, which elevates the performance of T2V using superior capabilities of T2I. Different from conventional T2V sampling (i.e., temporal and spatial modeling), VideoElevator explicitly decomposes each sampling step into temporal motion refining and spatial quality elevating. Specifically, temporal motion refining uses encapsulated T2V to enhance temporal consistency, followed by inverting to the noise distribution required by T2I. Then, spatial quality elevating harnesses inflated T2I to directly predict less noisy latent, adding more photo-realistic details. We have conducted experiments in extensive prompts under the combination of various T2V and T2I. The results show that VideoElevator not only improves the performance of T2V baselines with foundational T2I, but also facilitates stylistic video synthesis with personalized T2I. Our code is available at https://github.com/YBYBZhang/VideoElevator.
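
The per-step decomposition described in the abstract can be illustrated with a minimal Python sketch. It only shows the control flow (temporal motion refining, inversion to the T2I noise level, then spatial quality elevating); it is not the authors' implementation. The function names `elevated_sampling`, `smooth_over_time`, `identity_inversion`, and `shrink_noise` are hypothetical, and the three toy stand-ins replace the real T2V model, the inversion procedure, and the inflated T2I model, none of which are specified in the abstract.

```python
from typing import Callable, List, Sequence

# Toy video latent: one list of floats per frame (an assumption for illustration).
Latent = List[List[float]]
Step = Callable[[Latent, int], Latent]


def elevated_sampling(latent: Latent, timesteps: Sequence[int],
                      temporal_refine: Step, invert_for_t2i: Step,
                      spatial_elevate: Step) -> Latent:
    """Run one decomposed update per timestep, in the order the abstract gives."""
    for t in timesteps:
        refined = temporal_refine(latent, t)   # T2V: improve temporal consistency
        noisy = invert_for_t2i(refined, t)     # map to the noise level the T2I expects
        latent = spatial_elevate(noisy, t)     # T2I: predict a less noisy latent
    return latent


def smooth_over_time(latent: Latent, t: int) -> Latent:
    """Hypothetical stand-in for the T2V: average each frame with its neighbours."""
    n = len(latent)
    out = []
    for i in range(n):
        prev_f, next_f = latent[max(i - 1, 0)], latent[min(i + 1, n - 1)]
        out.append([(a + b + c) / 3 for a, b, c in zip(prev_f, latent[i], next_f)])
    return out


def identity_inversion(latent: Latent, t: int) -> Latent:
    """Hypothetical placeholder: a real inversion would change the noise level."""
    return [frame[:] for frame in latent]


def shrink_noise(latent: Latent, t: int) -> Latent:
    """Hypothetical stand-in for the T2I: halve every value as a toy 'denoise'."""
    return [[v * 0.5 for v in frame] for frame in latent]


if __name__ == "__main__":
    video = [[0.0], [3.0], [6.0]]  # three one-pixel frames
    print(elevated_sampling(video, [2, 1], smooth_over_time,
                            identity_inversion, shrink_noise))
```

Passing the three stages as callables keeps the loop independent of any particular T2V or T2I model, which mirrors the plug-and-play framing of the abstract.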

arXiv information

Authors	Yabo Zhang, Yuxiang Wei, Xianhui Lin, Zheng Hui, Peiran Ren, Xuansong Xie, Xiangyang Ji, Wangmeng Zuo
Published	2024-03-08 16:44:54+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
