MoCap-to-Visual Domain Adaptation for Efficient Human Mesh Estimation from 2D Keypoints

Summary

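This paper presents Key2Mesh, a model that takes a set of 2D human pose keypoints as input and estimates the corresponding body mesh.
Because this process involves no visual data (i.e., RGB images), the model can be trained on large-scale motion capture (MoCap) datasets, which overcomes the scarcity of image datasets with 3D labels.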
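The summary does not say how 2D inputs are derived from MoCap data. The following is a minimal sketch, assuming a simple pinhole camera. Each MoCap skeleton is placed in front of a virtual camera, and its 3D joints are projected to pixel coordinates. Every 2D keypoint set then stays paired with the 3D body label that came with the MoCap frame. The function names, the fixed focal length, and the principal point are hypothetical illustration choices, not Key2Mesh's actual data pipeline.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def to_camera(joints: Sequence[Point3D], translation: Point3D) -> List[Point3D]:
    """Shift MoCap joints into a hypothetical virtual camera's frame (no rotation, for brevity)."""
    tx, ty, tz = translation
    return [(x + tx, y + ty, z + tz) for x, y, z in joints]


def project_pinhole(joints_cam: Sequence[Point3D], focal: float, center: Point2D) -> List[Point2D]:
    """Project camera-space 3D joints to 2D pixel coordinates with an assumed pinhole model."""
    cx, cy = center
    keypoints = []
    for x, y, z in joints_cam:
        if z <= 0:
            raise ValueError("every joint must lie in front of the camera (z > 0)")
        keypoints.append((focal * x / z + cx, focal * y / z + cy))
    return keypoints


def make_training_pair(joints: Sequence[Point3D], body_label, translation: Point3D,
                       focal: float = 1000.0, center: Point2D = (500.0, 500.0)):
    """Return (2D keypoints, 3D label): an input/target pair a 2D-to-mesh model could train on."""
    keypoints = project_pinhole(to_camera(joints, translation), focal, center)
    return keypoints, body_label
```

For example, `make_training_pair([(0.0, 0.0, 0.0), (1.0, -0.5, 0.0)], label, (0.0, 0.0, 5.0))` yields the keypoints `[(500.0, 500.0), (700.0, 400.0)]` together with `label` unchanged. A fuller version would likely also sample camera rotations and intrinsics so the model sees varied viewpoints.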
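To apply the model to RGB images, we first run an off-the-shelf 2D pose estimator to obtain 2D keypoints and then feed these keypoints to Key2Mesh.
To improve the model's performance on RGB images, we apply an adversarial domain adaptation (DA) method that bridges the gap between the MoCap and visual domains.
Importantly, our DA method requires no 3D labels for the visual data, so the model can be adapted to a target set without costly annotation.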
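The summary names an adversarial DA method but does not give its exact objective. The following is a minimal sketch of a generic GAN/ADDA-style version, assuming a domain discriminator that outputs one logit per feature vector. The discriminator is trained to label MoCap-domain features 1 and visual-domain features 0. The feature extractor is trained on visual inputs with the inverted label, so their features become hard to tell apart from MoCap ones. Neither loss uses a 3D label for the visual data, which is the property the summary highlights. The function names and this particular loss formulation are illustrative assumptions, not the paper's formulation.

```python
import math
from typing import Sequence


def bce_with_logits(logit: float, label: float) -> float:
    """Numerically stable binary cross-entropy computed on a raw discriminator logit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def discriminator_loss(mocap_logits: Sequence[float], visual_logits: Sequence[float]) -> float:
    """Discriminator objective: MoCap (source) features -> 1, visual (target) features -> 0."""
    return (_mean([bce_with_logits(z, 1.0) for z in mocap_logits])
            + _mean([bce_with_logits(z, 0.0) for z in visual_logits]))


def adversarial_loss(visual_logits: Sequence[float]) -> float:
    """Feature-extractor objective on visual inputs: inverted label, i.e. 'look like MoCap'.

    Only domain labels appear here; no 3D annotation of the visual data is needed.
    """
    return _mean([bce_with_logits(z, 1.0) for z in visual_logits])
```

In a training loop (not shown), one would typically alternate between minimizing `discriminator_loss` for the discriminator and `adversarial_loss` for the feature extractor, while any 3D supervision loss is computed only on labeled MoCap samples.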
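We evaluate Key2Mesh on the task of estimating 3D human meshes from 2D keypoints when no paired RGB images and mesh labels are available.
Our results on the widely used H3.6M and 3DPW datasets show that Key2Mesh establishes a new state of the art, outperforming other models in PA-MPJPE on both datasets and in MPJPE and PVE on 3DPW.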
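The summary reports results in MPJPE, PA-MPJPE, and PVE without defining them. The following is a minimal sketch of the standard definitions, assuming NumPy and (N, 3) arrays of predicted and ground-truth positions. MPJPE averages the per-joint Euclidean distance. PA-MPJPE first aligns the prediction to the ground truth with the least-squares similarity transform (scale, rotation, translation) and then computes MPJPE; here the alignment uses the Umeyama closed-form Procrustes solution. PVE applies the same averaging to mesh vertices instead of joints. Benchmark-specific details are not taken from the paper and are omitted here. These include which joints are used and the root-joint alignment often applied before MPJPE. Results are usually reported in millimetres.

```python
import numpy as np


def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean per-joint position error: mean Euclidean distance between matching (N, 3) points."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Map pred onto gt with the least-squares similarity transform (Umeyama closed form)."""
    mu_pred, mu_gt = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_pred, gt - mu_gt
    # SVD of the 3x3 cross-covariance yields the optimal rotation.
    u, sing, vt = np.linalg.svd(p.T @ g)
    v = vt.T
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = np.ones(3)
    if np.linalg.det(v @ u.T) < 0:
        d[-1] = -1.0
    rot = v @ np.diag(d) @ u.T
    scale = float((sing * d).sum() / (p ** 2).sum())
    return scale * p @ rot.T + mu_gt


def pa_mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """MPJPE after Procrustes alignment, so global scale, rotation, and translation are ignored."""
    return mpjpe(procrustes_align(pred, gt), gt)


def pve(pred_vertices: np.ndarray, gt_vertices: np.ndarray) -> float:
    """Per-vertex error: the same mean Euclidean distance, computed over mesh vertices."""
    return mpjpe(pred_vertices, gt_vertices)
```

Because PA-MPJPE discounts global rotation, scale, and translation, it isolates errors in the articulated pose itself. This is why a model can lead on PA-MPJPE without also leading on MPJPE.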
Thanks to its simple architecture, our model runs at least 12x faster than LGD, the previous state-of-the-art model.
Additional qualitative samples and code are available on the project website (https://key2mesh.github.io/).

Summary (original)

This paper presents Key2Mesh, a model that takes a set of 2D human pose keypoints as input and estimates the corresponding body mesh. Since this process does not involve any visual (i.e. RGB image) data, the model can be trained on large-scale motion capture (MoCap) datasets, thereby overcoming the scarcity of image datasets with 3D labels. To enable the model’s application on RGB images, we first run an off-the-shelf 2D pose estimator to obtain the 2D keypoints, and then feed these 2D keypoints to Key2Mesh. To improve the performance of our model on RGB images, we apply an adversarial domain adaptation (DA) method to bridge the gap between the MoCap and visual domains. Crucially, our DA method does not require 3D labels for visual data, which enables adaptation to target sets without the need for costly labels. We evaluate Key2Mesh for the task of estimating 3D human meshes from 2D keypoints, in the absence of RGB and mesh label pairs. Our results on widely used H3.6M and 3DPW datasets show that Key2Mesh sets the new state-of-the-art by outperforming other models in PA-MPJPE for both datasets, and in MPJPE and PVE for the 3DPW dataset. Thanks to our model’s simple architecture, it operates at least 12x faster than the prior state-of-the-art model, LGD. Additional qualitative samples and code are available on the project website: https://key2mesh.github.io/.

arXiv information

Authors	Bedirhan Uguz, Ozhan Suat, Batuhan Karagoz, Emre Akbas
Published	2024-04-10 15:34:10+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
