Mobius: Text to Seamless Looping Video Generation via Latent Shift

Summary

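Mobius is a new method that generates seamlessly looping videos directly from text descriptions, without any user annotations, creating new visual material for multimedia presentations.
Our method repurposes a pre-trained video latent diffusion model to generate looping videos from text prompts without any training.
During inference, we first construct a latent cycle by connecting the starting and ending noise of the video.
Because the context of the video diffusion model can maintain temporal consistency, we perform multi-frame latent denoising while gradually shifting the first-frame latent to the end at each step.
As a result, the denoising context differs at every step while consistency is maintained throughout the inference process.
Moreover, the latent cycle in our method can be of any length.
This extends the latent-shift approach to generating seamless looping videos beyond the scope of the video diffusion model's context.
Unlike previous cinemagraph methods, the proposed method does not require an image to define appearance, a requirement that restricts the motion of the generated results.
Instead, our method can produce more dynamic motion and better visual quality.
We conduct multiple experiments and comparisons to verify the effectiveness of the proposed method and demonstrate it in a variety of scenarios.
All code will be made available.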
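
The latent-cycle and shifting idea described above can be illustrated with a small toy sketch. This is not the authors' implementation: `toy_denoise` is a hypothetical stand-in for one denoising step of a video diffusion model, and the names `latent_shift_loop`, `num_frames`, `context`, and `steps`, as well as the one-frame shift per step, are assumptions made for illustration only. The sketch shows only the indexing: latents live on a ring, each step processes windows of `context` frames, and the window origin moves by one frame per step so that the seam between the last and first frame falls inside a window at most steps.

```python
import numpy as np


def toy_denoise(window, step):
    """Hypothetical stand-in for one denoising step of a video model.

    It pulls each frame in the window halfway toward the window mean,
    so frames that share a context become more alike.
    """
    return 0.5 * window + 0.5 * window.mean(axis=0, keepdims=True)


def latent_shift_loop(num_frames=12, context=4, steps=6, dim=8, seed=0):
    """Denoise a ring of latents with a window origin that shifts each step."""
    # Assumption for this sketch: windows tile the ring without overlap.
    assert num_frames % context == 0
    rng = np.random.default_rng(seed)
    # Latent cycle: frame i is followed by frame (i + 1) % num_frames.
    cycle = rng.standard_normal((num_frames, dim))
    offsets = []
    for step in range(steps):
        offset = step % num_frames  # shift the origin by one frame per step
        offsets.append(offset)
        updated = np.empty_like(cycle)
        for k in range(0, num_frames, context):
            # Indices wrap around the ring, so a window can span the seam.
            idx = (offset + k + np.arange(context)) % num_frames
            updated[idx] = toy_denoise(cycle[idx], step)
        cycle = updated
    return cycle, offsets


if __name__ == "__main__":
    latents, offsets = latent_shift_loop()
    print(latents.shape, offsets)
```

Because the indices are taken modulo `num_frames`, the ring can be longer than the model context, which mirrors the claim that the latent cycle can have any length. In a real pipeline the stand-in would be replaced by the model's own denoising call and the final latents would be decoded to frames.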

Summary (original)

We present Mobius, a novel method to generate seamlessly looping videos from text descriptions directly without any user annotations, thereby creating new visual materials for the multi-media presentation. Our method repurposes the pre-trained video latent diffusion model for generating looping videos from text prompts without any training. During inference, we first construct a latent cycle by connecting the starting and ending noise of the videos. Given that the temporal consistency can be maintained by the context of the video diffusion model, we perform multi-frame latent denoising by gradually shifting the first-frame latent to the end in each step. As a result, the denoising context varies in each step while maintaining consistency throughout the inference process. Moreover, the latent cycle in our method can be of any length. This extends our latent-shifting approach to generate seamless looping videos beyond the scope of the video diffusion model’s context. Unlike previous cinemagraphs, the proposed method does not require an image as appearance, which will restrict the motions of the generated results. Instead, our method can produce more dynamic motion and better visual quality. We conduct multiple experiments and comparisons to verify the effectiveness of the proposed method, demonstrating its efficacy in different scenarios. All the code will be made available.

arXiv information

Authors	Xiuli Bi, Jianfei Yuan, Bo Liu, Yong Zhang, Xiaodong Cun, Chi-Man Pun, Bin Xiao
Published	2025-02-27 17:33:51+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
