Benchmarking 3D Human Pose Estimation Models under Occlusions

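Summary

Human Pose Estimation (HPE) detects and localizes keypoints of the human body in visual data.
In 3D HPE, occlusions, where parts of the body are not visible in the image, are a major obstacle to accurate pose reconstruction.
This paper presents a benchmark of the robustness of 3D HPE models under realistic occlusion conditions. The conditions include combinations of occluded keypoints that are commonly observed in real-world scenarios.
Using BlendMimic3D, a synthetic dataset with ground-truth 2D/3D annotations and occlusion labels, the authors evaluate nine state-of-the-art 2D-to-3D HPE models spanning convolutional, transformer-based, graph-based, and diffusion-based architectures.
All models were originally trained on Human3.6M and are tested here without retraining, in order to assess how well they generalize.
The authors introduce a protocol that simulates occlusion by adding noise to 2D keypoints based on the behavior of real detectors, and they carry out both global and per-joint sensitivity analyses.
The results show that every model degrades notably under occlusion, and that diffusion-based models underperform despite their stochastic nature.
In addition, the per-joint occlusion analysis identifies a consistent vulnerability in distal joints (e.g., wrists, feet) across models.
Overall, the work highlights critical limitations of current 3D HPE models in handling occlusions and offers insights for improving real-world robustness.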

Summary (original)

Human Pose Estimation (HPE) involves detecting and localizing keypoints on the human body from visual data. In 3D HPE, occlusions, where parts of the body are not visible in the image, pose a significant challenge for accurate pose reconstruction. This paper presents a benchmark on the robustness of 3D HPE models under realistic occlusion conditions, involving combinations of occluded keypoints commonly observed in real-world scenarios. We evaluate nine state-of-the-art 2D-to-3D HPE models, spanning convolutional, transformer-based, graph-based, and diffusion-based architectures, using the BlendMimic3D dataset, a synthetic dataset with ground-truth 2D/3D annotations and occlusion labels. All models were originally trained on Human3.6M and tested here without retraining to assess their generalization. We introduce a protocol that simulates occlusion by adding noise into 2D keypoints based on real detector behavior, and conduct both global and per-joint sensitivity analyses. Our findings reveal that all models exhibit notable performance degradation under occlusion, with diffusion-based models underperforming despite their stochastic nature. Additionally, a per-joint occlusion analysis identifies consistent vulnerability in distal joints (e.g., wrists, feet) across models. Overall, this work highlights critical limitations of current 3D HPE models in handling occlusions, and provides insights for improving real-world robustness.
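
The abstract describes simulating occlusion by adding noise to 2D keypoints and then measuring the effect globally and per joint. The following is only a minimal sketch of that idea, not the authors' protocol: the Gaussian noise model, the `sigma` value, and the function names `perturb_occluded`, `per_joint_error`, and `mpjpe` are assumptions made for illustration. The paper derives its noise from real detector behavior, and those details are not given in the abstract. The mean per-joint position error used here is a common 3D HPE metric, but the abstract does not state which metric the benchmark reports.

```python
import math
import random


def perturb_occluded(keypoints_2d, occluded, sigma=10.0, seed=0):
    """Return a copy of keypoints_2d with noise added to the occluded joints.

    keypoints_2d: list of (x, y) tuples, one per joint.
    occluded: set of joint indices treated as occluded.
    sigma: standard deviation of the noise in pixels (hypothetical value;
           a Gaussian model is an assumption, not the paper's noise model).
    """
    rng = random.Random(seed)
    noisy = []
    for j, (x, y) in enumerate(keypoints_2d):
        if j in occluded:
            x += rng.gauss(0.0, sigma)
            y += rng.gauss(0.0, sigma)
        noisy.append((x, y))
    return noisy


def per_joint_error(pred_3d, gt_3d):
    """Euclidean distance between each predicted and ground-truth 3D joint."""
    return [math.dist(p, g) for p, g in zip(pred_3d, gt_3d)]


def mpjpe(pred_3d, gt_3d):
    """Mean per-joint position error: the average of the per-joint distances."""
    errors = per_joint_error(pred_3d, gt_3d)
    return sum(errors) / len(errors)
```

In a sensitivity analysis along these lines, one would pass the perturbed 2D keypoints to a 2D-to-3D lifting model, compare its output with the ground-truth 3D pose using `mpjpe` for the global view, and inspect `per_joint_error` to see which joints suffer most when a given set of keypoints is occluded.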

arXiv information

Authors	Filipa Lino, Carlos Santiago, Manuel Marques
Published	2025-06-02 16:24:00+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
