Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices

Summary

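Text-to-image (T2I) diffusion models achieve state-of-the-art results in image synthesis and editing.
However, leveraging such pretrained models for video editing is considered a major challenge.
Many existing works try to enforce temporal consistency in the edited video through explicit correspondence mechanisms, either in pixel space or between deep features.
These methods, however, struggle with strong nonrigid motion.
In this paper, we introduce a fundamentally different approach, based on the observation that spatiotemporal slices of natural videos exhibit characteristics similar to those of natural images.
Thus, the same T2I diffusion model that is normally used only as a prior on video frames can also serve as a strong prior for enhancing temporal consistency when it is applied to spatiotemporal slices.
Building on this observation, we present Slicedit, a text-based video editing method that uses a pretrained T2I diffusion model to process both spatial and spatiotemporal slices.
Our method generates videos that retain the structure and motion of the original video while adhering to the target text.
Through extensive experiments, we demonstrate Slicedit's ability to edit a wide range of real-world videos and confirm its clear advantages over existing competing methods.
Webpage: https://matankleiner.github.io/slicedit/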

Abstract (original)

Text-to-image (T2I) diffusion models achieve state-of-the-art results in image synthesis and editing. However, leveraging such pretrained models for video editing is considered a major challenge. Many existing works attempt to enforce temporal consistency in the edited video through explicit correspondence mechanisms, either in pixel space or between deep features. These methods, however, struggle with strong nonrigid motion. In this paper, we introduce a fundamentally different approach, which is based on the observation that spatiotemporal slices of natural videos exhibit similar characteristics to natural images. Thus, the same T2I diffusion model that is normally used only as a prior on video frames, can also serve as a strong prior for enhancing temporal consistency by applying it on spatiotemporal slices. Based on this observation, we present Slicedit, a method for text-based video editing that utilizes a pretrained T2I diffusion model to process both spatial and spatiotemporal slices. Our method generates videos that retain the structure and motion of the original video while adhering to the target text. Through extensive experiments, we demonstrate Slicedit’s ability to edit a wide range of real-world videos, confirming its clear advantages compared to existing competing methods. Webpage: https://matankleiner.github.io/slicedit/
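
To make the term "spatiotemporal slice" concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a video stored as an array of shape (T, H, W, C); the function names and the synthetic video are hypothetical, and the abstract does not say which slice orientation Slicedit actually uses, so both are shown.

```python
import numpy as np


def spatial_slice(video: np.ndarray, t: int) -> np.ndarray:
    """Return frame t, an ordinary (H, W, C) image."""
    return video[t]


def xt_slice(video: np.ndarray, y: int) -> np.ndarray:
    """Fix image row y and stack it over time: a (T, W, C) slice."""
    return video[:, y]


def yt_slice(video: np.ndarray, x: int) -> np.ndarray:
    """Fix image column x and stack it over time: a (T, H, C) slice."""
    return video[:, :, x]


# Synthetic stand-in for a real clip (assumed layout: T, H, W, C).
rng = np.random.default_rng(0)
video = rng.random((8, 16, 24, 3))

frame = spatial_slice(video, 0)   # shape (16, 24, 3)
row_over_time = xt_slice(video, 5)   # shape (8, 24, 3)
col_over_time = yt_slice(video, 7)   # shape (8, 16, 3)
```

Each slice is a 2D array with color channels, so in principle it can be passed to any model that expects an image, which is the property the abstract relies on.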

arXiv information

Authors	Nathaniel Cohen, Vladimir Kulikov, Matan Kleiner, Inbar Huberman-Spiegelglas, Tomer Michaeli
Published	2024-05-20 17:55:56+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
