SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models

Summary

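Text-to-video (T2V) generation, i.e., producing a video from a given text prompt, has advanced significantly in recent years.
However, relying on text prompts alone often yields ambiguous frame composition because of spatial uncertainty.
The research community has therefore turned to dense structure signals, such as per-frame depth or edge sequences, to improve controllability, but collecting these signals increases the burden of inference.
This work presents SparseCtrl, which enables flexible structure control with temporally sparse signals and requires only one or a few inputs, as shown in Figure 1. It incorporates an additional condition encoder to process these sparse signals while leaving the pre-trained T2V model untouched.
The proposed approach is compatible with various modalities, including sketches, depth maps, and RGB images; it provides more practical control for video generation and supports applications such as storyboarding, depth rendering, keyframe animation, and interpolation.
Extensive experiments demonstrate that SparseCtrl generalizes to both original and personalized T2V generators.
Code and models will be made publicly available at https://guoyww.github.io/projects/SparseCtrl.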

Abstract (original)

The development of text-to-video (T2V), i.e., generating videos with a given text prompt, has been significantly advanced in recent years. However, relying solely on text prompts often results in ambiguous frame composition due to spatial uncertainty. The research community thus leverages the dense structure signals, e.g., per-frame depth/edge sequences, to enhance controllability, whose collection accordingly increases the burden of inference. In this work, we present SparseCtrl to enable flexible structure control with temporally sparse signals, requiring only one or a few inputs, as shown in Figure 1. It incorporates an additional condition encoder to process these sparse signals while leaving the pre-trained T2V model untouched. The proposed approach is compatible with various modalities, including sketches, depth maps, and RGB images, providing more practical control for video generation and promoting applications such as storyboarding, depth rendering, keyframe animation, and interpolation. Extensive experiments demonstrate the generalization of SparseCtrl on both original and personalized T2V generators. Codes and models will be publicly available at https://guoyww.github.io/projects/SparseCtrl .

arXiv information

Authors	Yuwei Guo, Ceyuan Yang, Anyi Rao, Maneesh Agrawala, Dahua Lin, Bo Dai
Published	2023-11-28 16:33:08+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
