Towards High Performance One-Stage Human Pose Estimation

Summary

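A top-down human pose estimation method that is both accurate and efficient is an appealing goal.
Mask RCNN greatly improves efficiency by performing person detection and pose estimation in a single framework, because the two tasks share the features provided by the backbone.
However, its accuracy falls short of traditional two-stage methods.
This paper aims to substantially improve the human pose estimation results of Mask RCNN while preserving its efficiency.
Specifically, it improves the whole pose estimation process, which consists of feature extraction and keypoint detection.
The feature extraction stage is designed to capture sufficient and valuable pose information.
A Global Context Module is then introduced into the keypoint detection branch to enlarge the receptive field, which is crucial for successful human pose estimation.
On the COCO val2017 set, the model with a ResNet-50 backbone achieves an AP of 68.1, which is 2.6 higher than Mask RCNN (AP of 65.5).
Compared with the classic two-stage top-down method SimpleBaseline, the model greatly narrows the performance gap (68.1 AP vs. 68.9 AP) while running much faster at inference (77 ms vs. 168 ms), demonstrating the effectiveness of the proposed method.
Code is available at https://github.com/lingl_space/maskrcnn_keypoint_refined.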
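
To make the receptive-field argument concrete, here is a minimal sketch of one generic way to let every location of a feature map see the whole region: pool the map globally and add the pooled value back to each position. The abstract does not describe the internals of the Global Context Module, so the helper name `add_global_context` and the pooling-plus-broadcast design are assumptions for illustration only, not the authors' implementation.

```python
from typing import List

FeatureMap = List[List[float]]  # a single channel, H x W


def add_global_context(feature_map: FeatureMap) -> FeatureMap:
    """Add the global mean of the map to every location (hypothetical helper)."""
    height = len(feature_map)
    width = len(feature_map[0])
    # Global average pooling: one scalar summarising the whole map.
    global_mean = sum(sum(row) for row in feature_map) / (height * width)
    # Broadcast the scalar back, so each output depends on every input location.
    return [[value + global_mean for value in row] for row in feature_map]


if __name__ == "__main__":
    # A single strong activation in the bottom-right corner.
    fmap = [[0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0],
            [0.0, 0.0, 9.0]]
    out = add_global_context(fmap)
    # The top-left output is now influenced by the far-corner activation.
    print(out[0][0], out[2][2])
```

A plain 3x3 convolution at the top-left position would never see the bottom-right activation in this map, whereas the pooled term reaches it in a single step. That is the sense in which a global context branch enlarges the receptive field.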

Abstract (original)

Making top-down human pose estimation method present both good performance and high efficiency is appealing. Mask RCNN can largely improve the efficiency by conducting person detection and pose estimation in a single framework, as the features provided by the backbone are able to be shared by the two tasks. However, the performance is not as good as traditional two-stage methods. In this paper, we aim to largely advance the human pose estimation results of Mask-RCNN and still keep the efficiency. Specifically, we make improvements on the whole process of pose estimation, which contains feature extraction and keypoint detection. The part of feature extraction is ensured to get enough and valuable information of pose. Then, we introduce a Global Context Module into the keypoints detection branch to enlarge the receptive field, as it is crucial to successful human pose estimation. On the COCO val2017 set, our model using the ResNet-50 backbone achieves an AP of 68.1, which is 2.6 higher than Mask RCNN (AP of 65.5). Compared to the classic two-stage top-down method SimpleBaseline, our model largely narrows the performance gap (68.1 AP vs. 68.9 AP) with a much faster inference speed (77 ms vs. 168 ms), demonstrating the effectiveness of the proposed method. Code is available at: https://github.com/lingl_space/maskrcnn_keypoint_refined.
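
The comparisons quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the figures reported above (COCO val2017, ResNet-50 backbone). The dictionary and function names are illustrative assumptions, and the speedup ratio is derived here, not stated in the abstract.

```python
# Figures reported in the abstract (COCO val2017, ResNet-50 backbone).
AP = {"Mask RCNN": 65.5, "proposed": 68.1, "SimpleBaseline": 68.9}
LATENCY_MS = {"proposed": 77.0, "SimpleBaseline": 168.0}


def ap_gap(higher: str, lower: str) -> float:
    """AP difference between two methods, rounded to one decimal place."""
    return round(AP[higher] - AP[lower], 1)


def speedup(fast: str, slow: str) -> float:
    """How many times faster `fast` is than `slow`, from per-image latency."""
    return LATENCY_MS[slow] / LATENCY_MS[fast]


if __name__ == "__main__":
    print("Gain over Mask RCNN:", ap_gap("proposed", "Mask RCNN"))
    print("Remaining gap to SimpleBaseline:", ap_gap("SimpleBaseline", "proposed"))
    print("Speedup over SimpleBaseline: %.2fx" % speedup("proposed", "SimpleBaseline"))
```

On these numbers the proposed model trails SimpleBaseline by 0.8 AP while taking a little under half the inference time.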

arXiv information

Authors	Ling Li, Lin Zhao, Linhao Xu, Jie Xu
Published	2023-01-12 07:02:17+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
