Automatic infant 2D pose estimation from videos: comparing seven deep neural network methods

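Summary

Automatic, markerless estimation of infant posture and movement from ordinary videos holds great potential for movement studies "in the wild": it can advance the understanding of motor development and greatly increase the chances of early diagnosis of disorders.
Advances in deep learning and machine learning have led to rapid development of human pose estimation methods in computer vision.
However, these methods are trained on datasets featuring adults in a variety of contexts.
This work tests and compares seven popular methods (AlphaPose, DeepLabCut/DeeperCut, Detectron2, HRNet, MediaPipe/BlazePose, OpenPose, and ViTPose) on videos of infants in the supine position.
Surprisingly, all methods except DeepLabCut and MediaPipe perform competitively without additional fine-tuning, with ViTPose performing best.
In addition to standard performance metrics (object keypoint similarity, average precision, and recall), we introduce errors expressed in the neck-mid-hip ratio, and we also study missed and redundant detections and the reliability of each method's internal confidence ratings, all of which are relevant for downstream tasks.
Among the networks with competitive performance, only AlphaPose could run close to real time (27 fps) on our machine.
We provide documented Docker containers or instructions for all the methods we used, our analysis scripts, and processed data at https://hub.docker.com/u/humanoidsctu and https://osf.io/x465b/.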
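
The two kinds of metric named in the summary can be illustrated with a minimal sketch. It assumes the COCO-style definition of object keypoint similarity (OKS) and reads the neck-mid-hip error as the Euclidean keypoint error divided by the ground-truth neck to mid-hip distance. That reading is an assumption, since the abstract does not spell out the formula. The function names, keypoint indices, and per-keypoint constants below are hypothetical and are not taken from the authors' analysis scripts.

```python
import math


def oks(pred, gt, visible, area, k):
    """COCO-style object keypoint similarity for a single person.

    pred, gt: lists of (x, y) keypoints in pixels.
    visible:  list of flags; only labelled (truthy) keypoints are scored.
    area:     object area in pixels squared (the scale term s**2).
    k:        per-keypoint constants (hypothetical values in this sketch).
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, ki in zip(pred, gt, visible, k):
        if not v:
            continue
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        total += math.exp(-d2 / (2.0 * area * ki ** 2))
        count += 1
    return total / count if count else 0.0


def torso_normalized_errors(pred, gt, neck_idx, mid_hip_idx):
    """Per-keypoint error divided by the ground-truth neck to mid-hip distance.

    Assumed interpretation of "errors expressed in the neck-mid-hip ratio".
    """
    torso = math.dist(gt[neck_idx], gt[mid_hip_idx])
    if torso == 0:
        raise ValueError("neck and mid-hip coincide; cannot normalize")
    return [math.dist(p, g) / torso for p, g in zip(pred, gt)]


# Toy example: keypoint 0 is the neck, keypoint 1 the mid-hip, keypoint 2 a wrist.
gt = [(0.0, 0.0), (0.0, 10.0), (5.0, 5.0)]
pred = [(0.0, 0.0), (0.0, 10.0), (5.0, 7.0)]

score = oks(pred, gt, visible=[1, 1, 1], area=100.0, k=[0.1, 0.1, 0.1])
errors = torso_normalized_errors(pred, gt, neck_idx=0, mid_hip_idx=1)
```

Normalizing by a body-length reference such as the torso makes errors comparable across infants of different sizes and across camera distances, which a raw pixel error is not.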

Abstract (original)

Automatic markerless estimation of infant posture and motion from ordinary videos carries great potential for movement studies ‘in the wild’, facilitating understanding of motor development and massively increasing the chances of early diagnosis of disorders. There is rapid development of human pose estimation methods in computer vision thanks to advances in deep learning and machine learning. However, these methods are trained on datasets featuring adults in different contexts. This work tests and compares seven popular methods (AlphaPose, DeepLabCut/DeeperCut, Detectron2, HRNet, MediaPipe/BlazePose, OpenPose, and ViTPose) on videos of infants in supine position. Surprisingly, all methods except DeepLabCut and MediaPipe have competitive performance without additional finetuning, with ViTPose performing best. Next to standard performance metrics (object keypoint similarity, average precision and recall), we introduce errors expressed in the neck-mid-hip ratio and additionally study missed and redundant detections and the reliability of the internal confidence ratings of the different methods, which are relevant for downstream tasks. Among the networks with competitive performance, only AlphaPose could run close to real time (27 fps) on our machine. We provide documented Docker containers or instructions for all the methods we used, our analysis scripts, and processed data at https://hub.docker.com/u/humanoidsctu and https://osf.io/x465b/.

arXiv information

Authors	Filipe Gama, Matej Misar, Lukas Navara, Sergiu T. Popescu, Matej Hoffmann
Published	2024-06-27 14:59:18+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
