DDT: A Diffusion-Driven Transformer-based Framework for Human Mesh Recovery from a Video

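Summary

Human mesh recovery (HMR) provides rich human body information for a range of real-world applications, including gaming, human-computer interaction, and virtual reality.
Compared with single-image methods, video-based methods can exploit temporal information and incorporate human motion priors to further improve performance.
However, many-to-many approaches such as VIBE suffer from poor motion smoothness and temporal inconsistency.
Many-to-one approaches such as TCMR and MPS-Net rely on future frames, which makes them non-causal and time-inefficient during inference.
To address these challenges, the paper presents DDT, a novel Diffusion-Driven Transformer-based framework for video-based HMR.
DDT is designed to decode specific motion patterns from the input sequence, which enhances motion smoothness and temporal consistency.
As a many-to-many approach, DDT's decoder outputs the human mesh for every frame, making DDT more viable for real-world applications where time efficiency is crucial and a causal model is desired.
Extensive experiments on widely used datasets (Human3.6M, MPI-INF-3DHP, and 3DPW) demonstrate the effectiveness and efficiency of DDT.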
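The efficiency argument for many-to-many over many-to-one inference can be illustrated with a little arithmetic. The sketch below is not taken from the paper: the function names, the window length of 16, and the choice of the middle frame as the many-to-one target are all assumptions made for illustration. It only counts model calls and the number of future frames a mid-frame predictor must wait for.

```python
import math


def many_to_one_cost(num_frames: int, window: int) -> dict:
    """Cost of a sliding-window model that outputs one mesh per window.

    Assumption (hypothetical setup): the predicted frame is the middle
    frame of the window, at index window // 2.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return {
        # One forward pass is needed for every output frame.
        "forward_passes": num_frames,
        # Frames after the target inside the window must already be captured.
        "lookahead_frames": window - 1 - window // 2,
    }


def many_to_many_cost(num_frames: int, window: int) -> dict:
    """Cost of a model that outputs a mesh for every frame in its window.

    Assumption (hypothetical setup): the video is split into
    non-overlapping chunks of `window` frames.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return {
        # One forward pass covers a whole chunk of frames.
        "forward_passes": math.ceil(num_frames / window),
    }


if __name__ == "__main__":
    frames, window = 160, 16  # illustrative values only
    print("many-to-one :", many_to_one_cost(frames, window))
    print("many-to-many:", many_to_many_cost(frames, window))
```

Under these assumptions, the many-to-one model runs once per frame and cannot emit a result until several later frames have arrived, while the many-to-many model amortizes one pass over a whole chunk. Actual speedups depend on the model and on how windows overlap in practice.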

Abstract (original)

Human mesh recovery (HMR) provides rich human body information for various real-world applications such as gaming, human-computer interaction, and virtual reality. Compared to single image-based methods, video-based methods can utilize temporal information to further improve performance by incorporating human body motion priors. However, many-to-many approaches such as VIBE suffer from motion smoothness and temporal inconsistency. While many-to-one approaches such as TCMR and MPS-Net rely on the future frames, which is non-causal and time inefficient during inference. To address these challenges, a novel Diffusion-Driven Transformer-based framework (DDT) for video-based HMR is presented. DDT is designed to decode specific motion patterns from the input sequence, enhancing motion smoothness and temporal consistency. As a many-to-many approach, the decoder of our DDT outputs the human mesh of all the frames, making DDT more viable for real-world applications where time efficiency is crucial and a causal model is desired. Extensive experiments are conducted on the widely used datasets (Human3.6M, MPI-INF-3DHP, and 3DPW), which demonstrated the effectiveness and efficiency of our DDT.

arXiv information

Authors	Ce Zheng, Guo-Jun Qi, Chen Chen
Published	2023-03-23 16:15:18+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google