PoseCrafter: One-Shot Personalized Video Synthesis Following Flexible Pose Control

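Abstract

This paper introduces PoseCrafter, a one-shot method for generating personalized videos that follow flexible pose control.
PoseCrafter is built on Stable Diffusion and ControlNet, and its inference process is carefully designed to generate high-quality videos without the corresponding ground-truth frames.
First, an appropriate reference frame is selected from the training video and inverted to initialize all latent variables for generation.
Next, the corresponding training pose is inserted into the target pose sequence, which improves faithfulness through a trained temporal attention module.
In addition, to alleviate the face and hand degradation caused by discrepancies between the poses in the training video and the inference poses, a simple latent edit is applied using an affine transformation matrix computed from facial and hand landmarks.
Extensive experiments on several datasets show that PoseCrafter outperforms baselines pre-trained on a vast collection of videos under 8 commonly used metrics.
PoseCrafter can also follow poses taken from different individuals or from artificial edits while retaining the human identity in an open-domain training video.
The project page is available at https://ml-gsai.github.io/PoseCrafter-demo/.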

Abstract (original)

In this paper, we introduce PoseCrafter, a one-shot method for personalized video generation following the control of flexible poses. Built upon Stable Diffusion and ControlNet, we carefully design an inference process to produce high-quality videos without the corresponding ground-truth frames. First, we select an appropriate reference frame from the training video and invert it to initialize all latent variables for generation. Then, we insert the corresponding training pose into the target pose sequences to enhance faithfulness through a trained temporal attention module. Furthermore, to alleviate the face and hand degradation resulting from discrepancies between poses of training videos and inference poses, we implement simple latent editing through an affine transformation matrix involving facial and hand landmarks. Extensive experiments on several datasets demonstrate that PoseCrafter achieves superior results to baselines pre-trained on a vast collection of videos under 8 commonly used metrics. Besides, PoseCrafter can follow poses from different individuals or artificial edits and simultaneously retain the human identity in an open-domain training video. Our project page is available at https://ml-gsai.github.io/PoseCrafter-demo/.
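The abstract mentions latent editing through an affine transformation matrix involving facial and hand landmarks, but it does not say how that matrix is obtained or how it is applied to the latents. As a rough illustration only, the sketch below shows one common way to estimate a 2x3 affine matrix from landmark correspondences by least squares. The function names `estimate_affine` and `apply_affine` and the landmark coordinates are hypothetical and are not taken from the paper.

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares 2x3 affine matrix that maps src landmarks onto dst landmarks.

    Hypothetical helper for illustration; not PoseCrafter's actual code.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    ones = np.ones((src.shape[0], 1))
    X = np.hstack([src, ones])                      # (N, 3) homogeneous coordinates
    sol, *_ = np.linalg.lstsq(X, dst, rcond=None)   # (3, 2) least-squares solution
    return sol.T                                    # (2, 3): [linear part | translation]

def apply_affine(M, points):
    """Apply a 2x3 affine matrix to an (N, 2) array of points."""
    points = np.asarray(points, dtype=float)
    return points @ M[:, :2].T + M[:, 2]

# Made-up landmarks standing in for face or hand keypoints in a training pose.
train_landmarks = np.array([[10.0, 10.0], [20.0, 10.0], [15.0, 20.0], [15.0, 14.0]])
# Made-up target landmarks: the same shape scaled by 1.5 and shifted by (4, -2).
target_landmarks = train_landmarks * 1.5 + np.array([4.0, -2.0])

M = estimate_affine(train_landmarks, target_landmarks)
mapped = apply_affine(M, train_landmarks)
```

In this toy case the correspondences are exactly affine, so `mapped` reproduces `target_landmarks` up to floating-point error. With real detected landmarks the fit would only be approximate, and warping a latent feature map with `M` would need an additional resampling step that is not shown here.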

arXiv information

Authors	Yong Zhong, Min Zhao, Zebin You, Xiaofeng Yu, Changwang Zhang, Chongxuan Li
Published	2024-05-24 14:46:34+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
