View-Invariant Skeleton-based Action Recognition via Global-Local Contrastive Learning

Summary

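Skeleton-based human action recognition has attracted growing interest recently because it is less sensitive to appearance changes and more skeleton data has become accessible.
However, even 3D skeletons captured in practice remain sensitive to viewpoint and direction, because different body joints are occluded and joint localization is error-prone.
This view variance in skeleton data can significantly affect action recognition performance.
To address this issue, the paper proposes a new view-invariant representation learning approach for skeleton-based human action recognition that requires no manual action labeling.
Specifically, it leverages multi-view skeleton data captured simultaneously for the same person during network training by maximizing the mutual information between representations extracted from different views, and proposes a global-local contrastive loss to model multi-scale co-occurrence relationships in both the spatial and temporal domains.
Extensive experiments show that the proposed method is robust to view differences in the input skeleton data and significantly improves the performance of unsupervised skeleton-based human action recognition methods, achieving new state-of-the-art accuracy on two challenging multi-view benchmarks, PKUMMD and NTU RGB+D.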

Abstract (original)

Skeleton-based human action recognition has been drawing more interest recently due to its low sensitivity to appearance changes and the accessibility of more skeleton data. However, even the 3D skeletons captured in practice are still sensitive to the viewpoint and direction given the occlusion of different human-body joints and the errors in human joint localization. Such view variance of skeleton data may significantly affect the performance of action recognition. To address this issue, we propose in this paper a new view-invariant representation learning approach, without any manual action labeling, for skeleton-based human action recognition. Specifically, we leverage the multi-view skeleton data simultaneously taken for the same person in the network training, by maximizing the mutual information between the representations extracted from different views, and then propose a global-local contrastive loss to model the multi-scale co-occurrence relationships in both spatial and temporal domains. Extensive experimental results show that the proposed method is robust to the view difference of the input skeleton data and significantly boosts the performance of unsupervised skeleton-based human action methods, resulting in new state-of-the-art accuracies on two challenging multi-view benchmarks of PKUMMD and NTU RGB+D.
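
The abstract does not spell out the form of the loss, so the following is only a rough sketch of the general idea of cross-view contrastive learning, not the authors' global-local loss. It assumes a generic InfoNCE objective, which is commonly used as a lower bound on mutual information: embeddings of the same sample seen from two views form the positive pair, and the other samples in the batch act as negatives. The function names, the temperature value, and the toy embeddings are all illustrative assumptions.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def info_nce(view_a, view_b, temperature=0.1):
    """Generic cross-view InfoNCE loss (an assumption, not the paper's exact loss).

    view_a[i] and view_b[i] are embeddings of the same sample captured from
    two different viewpoints (the positive pair); every other row of view_b
    serves as a negative for view_a[i].
    """
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    total = 0.0
    for i, anchor in enumerate(a):
        # Temperature-scaled cosine similarity to every embedding in the other view.
        logits = [sum(x * y for x, y in zip(anchor, other)) / temperature
                  for other in b]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matching index i as the correct class.
        total += log_denominator - logits[i]
    return total / len(a)


# Toy embeddings: two samples, two views each (hypothetical values).
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]])
print(aligned < shuffled)
```

When the two views of each sample agree, the loss is close to zero; when the pairing is scrambled, it grows. Minimizing it therefore pushes an encoder toward representations that stay consistent across viewpoints. A global-local variant would presumably apply a comparable objective at several spatial and temporal scales, but those details are not given in the abstract.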

arXiv information

Authors	Cunling Bian, Wei Feng, Fanbo Meng, Song Wang
Published	2022-09-23 15:00:57+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
