Spatio-temporal Tendency Reasoning for Human Body Pose and Shape Estimation from Videos

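Summary

In this paper, we present a spatio-temporal tendency reasoning (STR) network for recovering human body pose and shape from videos. Previous approaches have focused on how to extend 3D human datasets and on temporal-based learning to improve accuracy and temporal smoothness. In contrast, our STR aims to learn accurate and natural motion sequences in unconstrained environments through temporal and spatial tendencies, and to fully exploit the spatio-temporal features of existing video data. To this end, STR learns feature representations separately in the temporal and spatial dimensions, so that it can concentrate on a more robust representation of spatio-temporal features. Specifically, for efficient temporal modeling, we first propose a temporal tendency reasoning (TTR) module. TTR builds a hierarchical residual connection representation along the time dimension of a video sequence, which allows it to reason effectively about the tendencies of the temporal sequence and to preserve the effective propagation of human information. Meanwhile, to strengthen the spatial representation, we design a spatial tendency enhancing (STE) module, which further learns to excite spatially time-frequency domain sensitive features in the representation of human motion information. Finally, we introduce integration strategies to integrate and refine the spatio-temporal feature representations. Extensive experiments on large-scale publicly available datasets show that our STR remains competitive with the state of the art on three datasets. Our code is available at https://github.com/Changboyang/STR.git.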

Summary (original)

In this paper, we present a spatio-temporal tendency reasoning (STR) network for recovering human body pose and shape from videos. Previous approaches have focused on how to extend 3D human datasets and temporal-based learning to promote accuracy and temporal smoothing. Different from them, our STR aims to learn accurate and natural motion sequences in an unconstrained environment through temporal and spatial tendency and to fully excavate the spatio-temporal features of existing video data. To this end, our STR learns the representation of features in the temporal and spatial dimensions respectively, to concentrate on a more robust representation of spatio-temporal features. More specifically, for efficient temporal modeling, we first propose a temporal tendency reasoning (TTR) module. TTR constructs a time-dimensional hierarchical residual connection representation within a video sequence to effectively reason about temporal sequences’ tendencies and retain effective dissemination of human information. Meanwhile, for enhancing the spatial representation, we design a spatial tendency enhancing (STE) module that further learns to excite spatially time-frequency domain sensitive features in human motion information representations. Finally, we introduce integration strategies to integrate and refine the spatio-temporal feature representations. Extensive experimental findings on large-scale publicly available datasets reveal that our STR remains competitive with the state-of-the-art on three datasets. Our code is available at https://github.com/Changboyang/STR.git.

arXiv information

Authors	Boyang Zhang, SuPing Wu, Hu Cao, Kehua Ma, Pan Li, Lei Lin
Published	2022-10-07 16:09:07+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, DeepL
