Human MotionFormer: Transferring Human Motions with Vision Transformers

Summary

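Human motion transfer aims to transfer motion from a dynamic target person to a static source person for motion synthesis.
To improve the quality of the transferred motion, it is essential to match the source person and the target motion accurately across both large and subtle motion changes.
This paper proposes Human MotionFormer, a hierarchical ViT framework that uses global and local perception to capture large and subtle motion matching, respectively.
It consists of two ViT encoders that extract features from the inputs (a target motion image and a source person image) and a ViT decoder with several cascaded blocks for feature matching and motion transfer.
In each block, the target motion feature serves as the Query and the source person feature as the Key and Value, and cross-attention maps are computed to perform global feature matching.
A convolutional layer is then applied after the global cross-attention computation to improve local perception.
This matching process is implemented in both the warping and generation branches to guide the motion transfer.
During training, a mutual learning loss is proposed to enable co-supervision between the warping and generation branches for better motion representations.
Experiments show that Human MotionFormer sets a new state of the art both qualitatively and quantitatively.
Project page: \url{https://github.com/KumapowerLIU/Human-MotionFormer}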
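
The matching step in each decoder block (target motion as Query, source person as Key and Value, followed by a convolution for local perception) can be illustrated with a small sketch. This is not the authors' implementation: the function names `cross_attention` and `local_conv1d`, the toy token values, and the smoothing kernel are all hypothetical. It also omits learned projections and multiple heads, and it uses a 1D convolution over the token axis as a stand-in for the convolutional layer that the paper applies to image features.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, key, value):
    """Scaled dot-product cross-attention.

    query: Nq x d list of target motion tokens (assumed layout).
    key, value: Nk x d lists of source person tokens (assumed layout).
    Returns (output, attention_map); every query attends to all keys,
    which is what makes the matching global.
    """
    scale = 1.0 / math.sqrt(len(query[0]))
    attn = []
    for q in query:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in key]
        attn.append(softmax(scores))
    channels = len(value[0])
    out = [
        [sum(w * v[c] for w, v in zip(weights, value)) for c in range(channels)]
        for weights in attn
    ]
    return out, attn


def local_conv1d(tokens, kernel):
    """Per-channel 1D convolution over the token axis with zero padding.

    Assumes an odd-length kernel. Each output token mixes only its
    immediate neighbours, a simplified stand-in for local perception.
    """
    n, d = len(tokens), len(tokens[0])
    half = len(kernel) // 2
    out = []
    for i in range(n):
        row = [0.0] * d
        for j, w in enumerate(kernel):
            src = i + j - half
            if 0 <= src < n:
                for c in range(d):
                    row[c] += w * tokens[src][c]
        out.append(row)
    return out


# Toy inputs (hypothetical values): 3 target motion tokens, 2 source tokens.
target_motion = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
source_person = [[1.0, 0.0], [0.0, 1.0]]

# Global matching: target motion queries the source person features.
matched, attn = cross_attention(target_motion, source_person, source_person)

# Local refinement applied after the global cross-attention.
refined = local_conv1d(matched, kernel=[0.25, 0.5, 0.25])
```

The attention map has one row per target motion token and one column per source token, so a row can be read as how strongly that motion location draws on each part of the source person. The convolution then blends each matched token only with its neighbours.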

Summary (original)

Human motion transfer aims to transfer motions from a target dynamic person to a source static one for motion synthesis. An accurate matching between the source person and the target motion in both large and subtle motion changes is vital for improving the transferred motion quality. In this paper, we propose Human MotionFormer, a hierarchical ViT framework that leverages global and local perceptions to capture large and subtle motion matching, respectively. It consists of two ViT encoders to extract input features (i.e., a target motion image and a source human image) and a ViT decoder with several cascaded blocks for feature matching and motion transfer. In each block, we set the target motion feature as Query and the source person as Key and Value, calculating the cross-attention maps to conduct a global feature matching. Further, we introduce a convolutional layer to improve the local perception after the global cross-attention computations. This matching process is implemented in both warping and generation branches to guide the motion transfer. During training, we propose a mutual learning loss to enable the co-supervision between warping and generation branches for better motion representations. Experiments show that our Human MotionFormer sets the new state-of-the-art performance both qualitatively and quantitatively. Project page: \url{https://github.com/KumapowerLIU/Human-MotionFormer}

arXiv information

Authors	Hongyu Liu, Xintong Han, ChengBin Jin, Huawei Wei, Zhe Lin, Faqiang Wang, Haoye Dong, Yibing Song, Jia Xu, Qifeng Chen
Published	2023-02-22 11:42:44+00:00

Source, services used

arxiv.jp, Google
