Motion-Conditioned Diffusion Model for Controllable Video Synthesis

Summary

Title: Motion-Conditioned Diffusion Model for Controllable Video Synthesis

Summary:

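– Advances in diffusion models have greatly improved the quality and diversity of synthesized content.
– To harness the expressive power of diffusion models, researchers have explored various control mechanisms that let users intuitively guide the content synthesis process.
– Recent efforts have focused mainly on video synthesis, but effective methods for controlling and describing the desired content and motion have been lacking.
– To address this gap, the authors introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes, which let users specify the intended content and dynamics.
– To handle the ambiguity of sparse motion inputs and achieve better synthesis quality, MCDiff first uses a flow completion model to predict dense video motion from a semantic understanding of the video frame and the sparse motion control.
– The diffusion model then synthesizes high-quality future frames to form the output video.
– The authors show qualitatively and quantitatively that MCDiff achieves state-of-the-art visual quality in stroke-guided controllable video synthesis.
– Additional experiments on MPII Human Pose further demonstrate the model's capability for diverse content and motion synthesis.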
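
The two-stage flow described in the summary (sparse strokes to dense motion, then dense motion to the next frame) can be illustrated with a toy sketch. This is not the authors' implementation: both learned models are replaced by simple hand-written stand-ins, and every name here (`complete_flow`, `synthesize_next_frame`, `generate_video`, the `(y, x, dy, dx)` stroke format, the `sigma` parameter, and the frame-by-frame loop) is an assumption made for illustration only.

```python
import numpy as np


def complete_flow(strokes, height, width, sigma=8.0):
    """Hypothetical stand-in for the learned flow completion model.

    strokes: list of (y, x, dy, dx) sparse motion hints (assumed format).
    Returns a dense (height, width, 2) flow field in which each stroke's
    displacement fades out with a Gaussian falloff around its position.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    flow = np.zeros((height, width, 2))
    for y, x, dy, dx in strokes:
        falloff = np.exp(-((ys - y) ** 2 + (xs - x) ** 2) / (2.0 * sigma ** 2))
        flow[..., 0] += falloff * dy
        flow[..., 1] += falloff * dx
    return flow


def synthesize_next_frame(frame, flow):
    """Hypothetical stand-in for the diffusion model.

    Instead of denoising, this simply backward-warps the current frame
    with nearest-neighbor sampling: each output pixel copies the source
    pixel that the dense flow says moved into it.
    """
    height, width = frame.shape[:2]
    ys, xs = np.mgrid[0:height, 0:width]
    src_y = np.clip(ys - np.rint(flow[..., 0]).astype(int), 0, height - 1)
    src_x = np.clip(xs - np.rint(flow[..., 1]).astype(int), 0, width - 1)
    return frame[src_y, src_x]


def generate_video(start_frame, strokes_per_step):
    """Run the two stages once per step (assumed frame-by-frame loop)."""
    frames = [start_frame]
    for strokes in strokes_per_step:
        height, width = frames[-1].shape[:2]
        dense_flow = complete_flow(strokes, height, width)               # stage 1
        frames.append(synthesize_next_frame(frames[-1], dense_flow))     # stage 2
    return frames


# Usage: one bright pixel, one stroke asking it to move 3 pixels to the right.
start = np.zeros((16, 16))
start[8, 8] = 1.0
video = generate_video(start, [[(8, 8, 0, 3)]])
print(len(video), np.argwhere(video[1] == 1.0))
```

The split mirrors the motivation given above: a few strokes underdetermine the motion of the whole frame, so a separate step first turns them into a dense field, and only then is the next frame produced. In the real system both steps are learned models rather than the fixed rules used here.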

Abstract (original)

Recent advancements in diffusion models have greatly improved the quality and diversity of synthesized content. To harness the expressive power of diffusion models, researchers have explored various controllable mechanisms that allow users to intuitively guide the content synthesis process. Although the latest efforts have primarily focused on video synthesis, there has been a lack of effective methods for controlling and describing desired content and motion. In response to this gap, we introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes, which allow users to specify the intended content and dynamics for synthesis. To tackle the ambiguity of sparse motion inputs and achieve better synthesis quality, MCDiff first utilizes a flow completion model to predict the dense video motion based on the semantic understanding of the video frame and the sparse motion control. Then, the diffusion model synthesizes high-quality future frames to form the output video. We qualitatively and quantitatively show that MCDiff achieves state-of-the-art visual quality in stroke-guided controllable video synthesis. Additional experiments on MPII Human Pose further exhibit the capability of our model on diverse content and motion synthesis.

arXiv information

Authors	Tsai-Shien Chen,Chieh Hubert Lin,Hung-Yu Tseng,Tsung-Yi Lin,Ming-Hsuan Yang
Published	2023-04-27 17:59:32+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI
