Reconstructing Close Human Interactions from Multiple Views

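Summary

This paper addresses the challenging task of reconstructing the poses of multiple people engaged in close interactions, as captured by multiple calibrated cameras.
The difficulty stems from three sources: noisy or false 2D keypoint detections caused by inter-person occlusion, heavy ambiguity in associating keypoints with individuals because of the close interactions, and a scarcity of training data, since collecting and annotating motion data in crowded scenes is resource-intensive.
We introduce a novel system to address these challenges.
Our system integrates a learning-based pose estimation component with corresponding training and inference strategies.
The pose estimation component takes multi-view 2D keypoint heatmaps as input and reconstructs the pose of each individual using a 3D conditional volumetric network.
Because the network does not need images as input, we can use the known camera parameters of the test scenes, together with a large quantity of existing motion capture data, to synthesize large-scale training data that mimics the real data distribution in the test scenes.
Extensive experiments demonstrate that our approach significantly surpasses previous approaches in pose accuracy and generalizes across various camera setups and population sizes.
The code is available on the project page: https://github.com/zju3dv/CloseMoCap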
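
The summary notes that the network consumes 2D keypoint heatmaps rather than images, which is what allows training data to be synthesized from known camera parameters and existing motion capture data. The following is a minimal sketch of that idea, not the authors' implementation: it projects one 3D joint through a pinhole camera and renders it as a Gaussian heatmap. The function names, the camera values, the heatmap size, and the choice of an unnormalised Gaussian are all assumptions made for illustration.

```python
import math


def project_point(point_3d, K, R, t):
    """Project a 3D world point to pixel coordinates (pinhole model, no skew)."""
    # Camera-frame coordinates: X_cam = R * X_world + t
    cam = [sum(R[i][j] * point_3d[j] for j in range(3)) + t[i] for i in range(3)]
    if cam[2] <= 0:
        raise ValueError("point is behind the camera")
    # Perspective division followed by the intrinsic parameters
    u = K[0][0] * cam[0] / cam[2] + K[0][2]
    v = K[1][1] * cam[1] / cam[2] + K[1][2]
    return u, v


def gaussian_heatmap(u, v, width, height, sigma=2.0):
    """Render one keypoint as an unnormalised 2D Gaussian, indexed heatmap[row][col]."""
    return [
        [
            math.exp(-((x - u) ** 2 + (y - v) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


# Hypothetical camera: 100 px focal length, principal point at (32, 24),
# identity rotation and zero translation (camera at the world origin).
K = [[100.0, 0.0, 32.0], [0.0, 100.0, 24.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]

# Hypothetical mocap joint, 2 m in front of the camera.
joint = [0.1, -0.05, 2.0]

u, v = project_point(joint, K, R, t)
heatmap = gaussian_heatmap(u, v, width=64, height=48)
```

Repeating this for every joint, person, and calibrated view would give a stack of synthetic multi-view heatmaps. A realistic pipeline would presumably also need to imitate detector noise and occlusion, which this sketch does not attempt.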

Summary (original)

This paper addresses the challenging task of reconstructing the poses of multiple individuals engaged in close interactions, captured by multiple calibrated cameras. The difficulty arises from the noisy or false 2D keypoint detections due to inter-person occlusion, the heavy ambiguity in associating keypoints to individuals due to the close interactions, and the scarcity of training data as collecting and annotating motion data in crowded scenes is resource-intensive. We introduce a novel system to address these challenges. Our system integrates a learning-based pose estimation component and its corresponding training and inference strategies. The pose estimation component takes multi-view 2D keypoint heatmaps as input and reconstructs the pose of each individual using a 3D conditional volumetric network. As the network doesn’t need images as input, we can leverage known camera parameters from test scenes and a large quantity of existing motion capture data to synthesize massive training data that mimics the real data distribution in test scenes. Extensive experiments demonstrate that our approach significantly surpasses previous approaches in terms of pose accuracy and is generalizable across various camera setups and population sizes. The code is available on our project page: https://github.com/zju3dv/CloseMoCap.

arXiv information

Authors	Qing Shuai, Zhiyuan Yu, Zhize Zhou, Lixin Fan, Haijun Yang, Can Yang, Xiaowei Zhou
Published	2024-01-29 14:08:02+00:00

Source, services used

arxiv.jp, Google
