3D Human Pose Perception from Egocentric Stereo Videos

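Summary

Head-mounted devices are becoming more compact, but they provide egocentric views in which the device user is heavily self-occluded.
As a result, existing methods often fail to accurately estimate complex 3D poses from egocentric views.
In this work, we propose a new transformer-based framework that improves egocentric stereo 3D human pose estimation by leveraging the scene information and temporal context of egocentric stereo videos.
Specifically, we use 1) depth features from a 3D scene reconstruction module that takes uniformly sampled windows of egocentric stereo frames, and 2) human joint queries enhanced by temporal features of the video inputs.
Our method accurately estimates human poses even in challenging scenarios such as crouching and sitting.
We also introduce two new benchmark datasets, UnrealEgo2 and UnrealEgo-RW (RealWorld).
The proposed datasets offer a much larger number of egocentric stereo views and a wider variety of human motions than existing datasets, allowing comprehensive evaluation of existing and upcoming methods.
Our extensive experiments show that the proposed approach significantly outperforms previous methods.
We will release UnrealEgo2, UnrealEgo-RW, and the trained models on our project page.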
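
To make the phrase "uniformly sampled windows" of stereo frames more concrete, here is a minimal Python sketch of one way such a window could be chosen. The function name `sample_window`, its parameters, the choice of a trailing window that ends at the current frame, and the clamping at the start of the video are all assumptions made for illustration; the abstract does not specify any of them, and this is not the authors' implementation.

```python
from typing import List


def sample_window(current: int, window_size: int, stride: int) -> List[int]:
    """Return `window_size` frame indices ending at `current`, spaced `stride` apart.

    Hypothetical helper: indices that would fall before the start of the
    video are clamped to 0, so the first frame is repeated as padding.
    """
    if window_size < 1 or stride < 1:
        raise ValueError("window_size and stride must be positive")
    if current < 0:
        raise ValueError("current must be non-negative")
    # Largest offset first, so the returned indices are in time order.
    offsets = range(window_size - 1, -1, -1)
    return [max(current - k * stride, 0) for k in offsets]


if __name__ == "__main__":
    # Four frames, three frames apart, ending at frame 10.
    print(sample_window(current=10, window_size=4, stride=3))
```

Each returned index would correspond to one stereo pair (left and right image) passed to the downstream module. A larger stride covers a longer stretch of the video with the same number of frames, at the cost of coarser temporal resolution.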

Summary (original)

While head-mounted devices are becoming more compact, they provide egocentric views with significant self-occlusions of the device user. Hence, existing methods often fail to accurately estimate complex 3D poses from egocentric views. In this work, we propose a new transformer-based framework to improve egocentric stereo 3D human pose estimation, which leverages the scene information and temporal context of egocentric stereo videos. Specifically, we utilize 1) depth features from our 3D scene reconstruction module with uniformly sampled windows of egocentric stereo frames, and 2) human joint queries enhanced by temporal features of the video inputs. Our method is able to accurately estimate human poses even in challenging scenarios, such as crouching and sitting. Furthermore, we introduce two new benchmark datasets, i.e., UnrealEgo2 and UnrealEgo-RW (RealWorld). The proposed datasets offer a much larger number of egocentric stereo views with a wider variety of human motions than the existing datasets, allowing comprehensive evaluation of existing and upcoming methods. Our extensive experiments show that the proposed approach significantly outperforms previous methods. We will release UnrealEgo2, UnrealEgo-RW, and trained models on our project page.

arXiv information

Authors	Hiroyasu Akada, Jian Wang, Vladislav Golyanik, Christian Theobalt
Published	2024-05-15 15:58:11+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
