HSTFormer: Hierarchical Spatial-Temporal Transformers for 3D Human Pose Estimation

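Abstract

Transformer-based approaches have been successfully applied to 3D human pose estimation (HPE) from 2D pose sequences and have achieved state-of-the-art (SOTA) performance.
However, current SOTA methods have difficulty modeling the spatial-temporal correlations of joints at different levels simultaneously.
This is due to the spatial-temporal complexity of poses.
Poses change at varying speeds over time, and different joints and body parts move differently in space.
Hence, a one-size-fits-all transformer is not adaptable and can hardly meet "in-the-wild" requirements.
To mitigate this issue, we propose Hierarchical Spatial-Temporal transFormers (HSTFormer), which gradually capture multi-level spatial-temporal correlations of joints, from local to global, for accurate 3D HPE.
HSTFormer consists of four transformer encoders (TEs) and a fusion module.
To the best of our knowledge, HSTFormer is the first work to study hierarchical TEs with multi-level fusion.
Extensive experiments on three datasets (Human3.6M, MPI-INF-3DHP, and HumanEva) show that HSTFormer achieves competitive and consistent performance on benchmarks of varying scale and difficulty.
Specifically, with a highly generalized, systematic approach, it surpasses recent SOTA methods on the challenging MPI-INF-3DHP dataset and the small-scale HumanEva dataset.
The code is available at https://github.com/qianxiaoye825/HSTFormer.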
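
As a rough illustration of what "spatial-temporal correlations" means for a 2D pose sequence, the toy sketch below applies self-attention along two different axes: across joints within one frame (spatial) and across frames for one joint (temporal), then fuses the two results. This is not the HSTFormer architecture. It has no learned weights, it uses two attention passes instead of four transformer encoders, and the averaging fusion is an assumption made only for illustration. All function names (`self_attention`, `spatial_attention`, `temporal_attention`, `fuse`) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity projections (no learned weights)."""
    dim = len(tokens[0])
    out = []
    for query in tokens:
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * value[d] for w, value in zip(weights, tokens)) for d in range(dim)]
        )
    return out


def spatial_attention(seq):
    """Attend across the joints of each frame; seq[t][j] is the (x, y) of joint j in frame t."""
    return [self_attention(frame) for frame in seq]


def temporal_attention(seq):
    """Attend across frames separately for each joint."""
    num_frames, num_joints = len(seq), len(seq[0])
    per_joint = [
        self_attention([seq[t][j] for t in range(num_frames)])
        for j in range(num_joints)
    ]
    # Transpose back to the [frame][joint] layout.
    return [[per_joint[j][t] for j in range(num_joints)] for t in range(num_frames)]


def fuse(a, b):
    """Average two [frame][joint][coord] feature sets (an assumed, simplistic fusion)."""
    return [
        [[(x + y) / 2 for x, y in zip(ja, jb)] for ja, jb in zip(fa, fb)]
        for fa, fb in zip(a, b)
    ]


# Two frames, three joints, 2D coordinates (made-up values).
seq = [
    [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]],
    [[0.1, 0.0], [1.1, 0.0], [1.2, 1.0]],
]
fused = fuse(spatial_attention(seq), temporal_attention(seq))
print(len(fused), len(fused[0]), len(fused[0][0]))  # frames, joints, coords
```

The output keeps the input layout of frames by joints by coordinates, so in a real model a regression head could map such fused features to 3D joint positions. A learned implementation would replace the identity projections with trained query, key, and value matrices.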

Abstract (original)

Transformer-based approaches have been successfully proposed for 3D human pose estimation (HPE) from 2D pose sequence and achieved state-of-the-art (SOTA) performance. However, current SOTAs have difficulties in modeling spatial-temporal correlations of joints at different levels simultaneously. This is due to the poses’ spatial-temporal complexity. Poses move at various speeds temporarily with various joints and body-parts movement spatially. Hence, a cookie-cutter transformer is non-adaptable and can hardly meet the ‘in-the-wild’ requirement. To mitigate this issue, we propose Hierarchical Spatial-Temporal transFormers (HSTFormer) to capture multi-level joints’ spatial-temporal correlations from local to global gradually for accurate 3D HPE. HSTFormer consists of four transformer encoders (TEs) and a fusion module. To the best of our knowledge, HSTFormer is the first to study hierarchical TEs with multi-level fusion. Extensive experiments on three datasets (i.e., Human3.6M, MPI-INF-3DHP, and HumanEva) demonstrate that HSTFormer achieves competitive and consistent performance on benchmarks with various scales and difficulties. Specifically, it surpasses recent SOTAs on the challenging MPI-INF-3DHP dataset and small-scale HumanEva dataset, with a highly generalized systematic approach. The code is available at: https://github.com/qianxiaoye825/HSTFormer.

arXiv information

Authors	Xiaoye Qian, Youbao Tang, Ning Zhang, Mei Han, Jing Xiao, Ming-Chun Huang, Ruei-Sung Lin
Published	2023-01-18 05:54:02+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
