Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers

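Summary

3D human pose estimation captures human joint points in three-dimensional space while preserving depth information and physical structure.
This is essential for applications that require precise pose information, such as human-computer interaction, scene understanding, and rehabilitation training.
Because data collection is difficult, the mainstream datasets for 3D human pose estimation consist mainly of multi-view video data collected in laboratory environments, which contains rich spatial-temporal correlation information beyond the content of the image frames.
Given that the self-attention mechanism of transformers can capture spatial-temporal correlations from multi-view video datasets, we propose a multi-stage framework for 3D sequence-to-sequence (seq2seq) human pose detection.
First, the spatial module represents human pose features from intra-image content, while the frame-image relation module extracts temporal relationships and 3D spatial positional relationship features between the multi-view images.
Second, a self-attention mechanism is adopted to eliminate interference from regions outside the human body and to reduce computing resources.
Our method is evaluated on Human3.6M, a popular 3D human pose detection dataset.
The experimental results show that our approach achieves state-of-the-art performance on this dataset.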

Summary (original)

3D human pose estimation captures the human joint points in three-dimensional space while keeping the depth information and physical structure. That is essential for applications that require precise pose information, such as human-computer interaction, scene understanding, and rehabilitation training. Due to the challenges in data collection, mainstream datasets of 3D human pose estimation are primarily composed of multi-view video data collected in laboratory environments, which contains rich spatial-temporal correlation information besides the image frame content. Given the remarkable self-attention mechanism of transformers, capable of capturing the spatial-temporal correlation from multi-view video datasets, we propose a multi-stage framework for 3D sequence-to-sequence (seq2seq) human pose detection. Firstly, the spatial module represents the human pose feature by intra-image content, while the frame-image relation module extracts temporal relationships and 3D spatial positional relationship features between the multi-perspective images. Secondly, the self-attention mechanism is adopted to eliminate the interference from non-human body parts and reduce computing resources. Our method is evaluated on Human3.6M, a popular 3D human pose detection dataset. Experimental results demonstrate that our approach achieves state-of-the-art performance on this dataset.
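
As a rough illustration of how self-attention can relate features across both camera views and time steps, the following is a minimal sketch in plain Python. It is not the authors' architecture: the abstract does not specify the layers, so this uses identity projections (queries, keys, and values are the tokens themselves), a single head, and no positional encoding. The function names `self_attention` and `flatten_views_and_frames` and the toy feature values are assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def flatten_views_and_frames(features):
    """Turn features[view][frame] (each a feature vector) into one flat token list.

    Hypothetical helper: flattening lets every token attend to every other
    token, regardless of which view or frame it came from.
    """
    return [vec for view in features for vec in view]


def self_attention(tokens):
    """Scaled dot-product self-attention with Q = K = V = tokens.

    A real transformer would apply learned projections first; they are
    omitted here to keep the sketch short.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, weights = [], []
    for query in tokens:
        # Similarity of this token to every token (including itself).
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        row = softmax(scores)
        # Weighted average of all token vectors.
        mixed = [sum(row[j] * tokens[j][c] for j in range(len(tokens)))
                 for c in range(dim)]
        weights.append(row)
        outputs.append(mixed)
    return outputs, weights


# Toy input: 2 views x 2 frames, 4-dimensional features (made-up values).
features = [
    [[1.0, 0.0, 0.5, 0.0], [0.9, 0.1, 0.5, 0.0]],  # view 0, frames 0 and 1
    [[0.0, 1.0, 0.0, 0.5], [0.1, 0.9, 0.0, 0.5]],  # view 1, frames 0 and 1
]
tokens = flatten_views_and_frames(features)
outputs, weights = self_attention(tokens)
```

Each row of `weights` sums to 1 and shows how strongly one (view, frame) token draws on the others. In this toy input, tokens from the same view are more similar, so they receive larger weights.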

arXiv information

Authors	Jianbin Jiao,Xina Cheng,Weijie Chen,Xiaoting Yin,Hao Shi,Kailun Yang
Published	2024-01-30 03:00:25+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

