AggPose: Deep Aggregation Vision Transformer for Infant Pose Estimation

Abstract

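Assessing the movement and posture of newborns allows experienced pediatricians to predict neurodevelopmental disorders and enables early intervention for related conditions. However, most state-of-the-art AI approaches to human pose estimation focus on adults, and public benchmarks for infant pose estimation are lacking. To fill this gap, we propose an infant pose dataset and the Deep Aggregation Vision Transformer (AggPose) for human pose estimation. AggPose introduces a fast-trained, fully transformer-based framework that does not use convolution operations to extract features in the early stages. It generalizes Transformer + MLP to high-resolution deep layer aggregation within feature maps, enabling information fusion between different vision levels. We pre-train AggPose on the COCO pose dataset and apply it to our newly released large-scale infant pose estimation dataset. The results show that AggPose effectively learns multi-scale features across resolutions and significantly improves infant pose estimation performance. On the infant pose estimation dataset, AggPose outperforms the hybrid models HRFormer and TokenPose. In addition, AggPose outperforms HRFormer by 0.8 AP on average on COCO val pose estimation. Our code is available at github.com/SZAR-LAB/AggPose.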

Abstract (original)

Movement and pose assessment of newborns lets experienced pediatricians predict neurodevelopmental disorders, allowing early intervention for related diseases. However, most of the newest AI approaches for human pose estimation methods focus on adults, lacking publicly benchmark for infant pose estimation. In this paper, we fill this gap by proposing infant pose dataset and Deep Aggregation Vision Transformer for human pose estimation, which introduces a fast trained full transformer framework without using convolution operations to extract features in the early stages. It generalizes Transformer + MLP to high-resolution deep layer aggregation within feature maps, thus enabling information fusion between different vision levels. We pre-train AggPose on COCO pose dataset and apply it on our newly released large-scale infant pose estimation dataset. The results show that AggPose could effectively learn the multi-scale features among different resolutions and significantly improve the performance of infant pose estimation. We show that AggPose outperforms hybrid model HRFormer and TokenPose in the infant pose estimation dataset. Moreover, our AggPose outperforms HRFormer by 0.8 AP on COCO val pose estimation on average. Our code is available at github.com/SZAR-LAB/AggPose.

arXiv information

Authors	Xu Cao, Xiaoye Li, Liya Ma, Yi Huang, Xuan Feng, Zening Chen, Hongwu Zeng, Jianguo Cao
Published	2022-08-10 03:05:58+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, DeepL
