Rethinking Self-Supervised Visual Representation Learning in Pre-training for 3D Human Pose and Shape Estimation

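Summary

Recently, several self-supervised representation learning (SSL) methods have outperformed ImageNet classification pre-training on vision tasks such as object detection.
However, their effect on 3D human body pose and shape estimation (3DHPSE) is open to question, because the target of 3DHPSE is fixed to a single class, the human, and the task has an inherent gap with SSL.
We empirically study and analyze the effects of SSL and compare it with other pre-training alternatives for 3DHPSE.
The alternatives are 2D annotation-based pre-training and synthetic data pre-training, which share SSL's motivation of reducing labeling cost.
They have been widely used as a source of weak supervision or for fine-tuning, but have received little attention as a pre-training source.
SSL methods underperform conventional ImageNet classification pre-training on multiple 3DHPSE benchmarks by 7.7% on average.
In contrast, despite using far less pre-training data, 2D annotation-based pre-training improves accuracy on all benchmarks and converges faster during fine-tuning.
Our observations challenge the naive application of current SSL pre-training to 3DHPSE and renew attention to the value of other data types for pre-training.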

Abstract (original)

Recently, a few self-supervised representation learning (SSL) methods have outperformed the ImageNet classification pre-training for vision tasks such as object detection. However, its effects on 3D human body pose and shape estimation (3DHPSE) are open to question, whose target is fixed to a unique class, the human, and has an inherent task gap with SSL. We empirically study and analyze the effects of SSL and further compare it with other pre-training alternatives for 3DHPSE. The alternatives are 2D annotation-based pre-training and synthetic data pre-training, which share the motivation of SSL that aims to reduce the labeling cost. They have been widely utilized as a source of weak-supervision or fine-tuning, but have not been remarked as a pre-training source. SSL methods underperform the conventional ImageNet classification pre-training on multiple 3DHPSE benchmarks by 7.7% on average. In contrast, despite a much less amount of pre-training data, the 2D annotation-based pre-training improves accuracy on all benchmarks and shows faster convergence during fine-tuning. Our observations challenge the naive application of the current SSL pre-training to 3DHPSE and relight the value of other data types in the pre-training aspect.

arXiv information

Authors	Hongsuk Choi, Hyeongjin Nam, Taeryung Lee, Gyeongsik Moon, Kyoung Mu Lee
Published	2023-03-09 16:17:52+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
