Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models

Summary

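Image customization has been studied extensively in text-to-image (T2I) diffusion models, leading to impressive results and applications.
With the emergence of text-to-video (T2V) diffusion models, its temporal counterpart, motion customization, has not yet been well investigated.
To address the challenge of one-shot motion customization, we propose Customize-A-Video, which models motion from a single reference video and adapts it to new subjects and scenes with both spatial and temporal variety.
It uses low-rank adaptation (LoRA) on the temporal attention layers to tailor a pre-trained T2V diffusion model to the specific motion in the reference video.
To disentangle spatial and temporal information during the training pipeline, we introduce the novel concept of appearance absorbers, which detach the original appearance from the single reference video before motion learning.
The proposed method extends easily, in a plug-and-play fashion, to various downstream tasks, including custom video generation and editing, video appearance customization, and the combination of multiple motions.
Our project page is at https://anonymous-314.github.io.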
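
As a rough illustration of the low-rank adaptation (LoRA) idea mentioned in the summary, the sketch below wraps a frozen weight matrix W with a trainable low-rank update, so the effective weight is W + (alpha / rank) * B A. This is a toy in pure Python and not the authors' implementation: the class name `LoRALinear`, the helper functions, and the `alpha` and `seed` parameters are all hypothetical, and choosing which temporal attention projections of a real T2V model to wrap is outside its scope.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[x * s for x in row] for row in a]


class LoRALinear:
    """Hypothetical toy layer: frozen weight W plus a low-rank update B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # pre-trained weight, kept frozen
        # A starts with small random values and B with zeros, so the
        # update B @ A is zero before any training step.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scaling = alpha / rank

    def effective_weight(self):
        """Return W + scaling * (B @ A), shape d_out x d_in."""
        return add(self.weight, scale(matmul(self.B, self.A), self.scaling))

    def forward(self, x):
        """Apply the adapted layer to x (n x d_in); returns n x d_out."""
        w = self.effective_weight()
        w_t = [list(col) for col in zip(*w)]  # transpose to d_in x d_out
        return matmul(x, w_t)


W = [[1.0, 2.0], [3.0, 4.0]]
layer = LoRALinear(W, rank=1)
before = layer.forward([[1.0, 1.0]])  # identical to the frozen layer

# Pretend training has updated the low-rank factors.
layer.B = [[1.0], [0.0]]
layer.A = [[0.5, 0.5]]
after = layer.forward([[1.0, 1.0]])
```

Only `A` and `B` would be trained in such a setup, which is why the adapter is small compared with the frozen weight it modifies.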

Summary (original)

Image customization has been extensively studied in text-to-image (T2I) diffusion models, leading to impressive outcomes and applications. With the emergence of text-to-video (T2V) diffusion models, its temporal counterpart, motion customization, has not yet been well investigated. To address the challenge of one-shot motion customization, we propose Customize-A-Video, which models the motion from a single reference video and adapts it to new subjects and scenes with both spatial and temporal varieties. It leverages low-rank adaptation (LoRA) on temporal attention layers to tailor the pre-trained T2V diffusion model for specific motion modeling from the reference videos. To disentangle the spatial and temporal information during the training pipeline, we introduce a novel concept of appearance absorbers that detach the original appearance from the single reference video prior to motion learning. Our proposed method can be easily extended to various downstream tasks, including custom video generation and editing, video appearance customization, and multiple motion combination, in a plug-and-play fashion. Our project page can be found at https://anonymous-314.github.io.

arXiv information

Authors	Yixuan Ren, Yang Zhou, Jimei Yang, Jing Shi, Difan Liu, Feng Liu, Mingi Kwon, Abhinav Shrivastava
Published	2024-02-22 18:38:48+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
