Spatial Hierarchy and Temporal Attention Guided Cross Masking for Self-supervised Skeleton-based Action Recognition

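Abstract

In self-supervised skeleton-based action recognition, the mask reconstruction paradigm is attracting interest because effective masking improves model refinement and robustness.
However, previous work has relied mainly on a single masking criterion, which causes the model to overfit to specific features and overlook other useful information.
This paper introduces a hierarchy and attention guided cross-masking framework (HA-CM) that applies masking to skeleton sequences from both spatial and temporal perspectives.
Specifically, in spatial graphs, joint hierarchy serves as the masking criterion: hyperbolic space is used to maintain the distinctions between joints and to preserve the hierarchical structure of high-dimensional skeletons.
In temporal flows, traditional distance metrics are replaced with the global attention of joints for masking, which addresses the convergence of distances in high-dimensional space and the lack of a global perspective.
In addition, a cross-contrast loss based on the cross-masking framework is incorporated into the loss function to strengthen the model's learning of instance-level features.
HA-CM shows efficiency and universality on three public large-scale datasets: NTU-60, NTU-120, and PKU-MMD.
The source code for HA-CM is available at https://github.com/YinxPeng/HA-CM-main.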

Abstract (original)

In self-supervised skeleton-based action recognition, the mask reconstruction paradigm is gaining interest in enhancing model refinement and robustness through effective masking. However, previous works primarily relied on a single masking criterion, resulting in the model overfitting specific features and overlooking other effective information. In this paper, we introduce a hierarchy and attention guided cross-masking framework (HA-CM) that applies masking to skeleton sequences from both spatial and temporal perspectives. Specifically, in spatial graphs, we utilize hyperbolic space to maintain joint distinctions and effectively preserve the hierarchical structure of high-dimensional skeletons, employing joint hierarchy as the masking criterion. In temporal flows, we substitute traditional distance metrics with the global attention of joints for masking, addressing the convergence of distances in high-dimensional space and the lack of a global perspective. Additionally, we incorporate cross-contrast loss based on the cross-masking framework into the loss function to enhance the model’s learning of instance-level features. HA-CM shows efficiency and universality on three public large-scale datasets, NTU-60, NTU-120, and PKU-MMD. The source code of our HA-CM is available at https://github.com/YinxPeng/HA-CM-main.
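The abstract names two masking criteria without giving formulas, so the following is only a minimal NumPy sketch of how such criteria could look, not the authors' implementation. It assumes that joint embeddings live in the unit Poincaré ball, that distance from the origin stands in for depth in the joint hierarchy, and that the highest-scoring joints or tokens are the ones masked. The function names `poincare_distance`, `hierarchy_mask`, and `attention_mask`, and the `mask_ratio` parameter, are hypothetical.

```python
import numpy as np


def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between points inside the unit Poincare ball."""
    sq_diff = np.sum((u - v) ** 2, axis=-1)
    denom = (1.0 - np.sum(u ** 2, axis=-1)) * (1.0 - np.sum(v ** 2, axis=-1))
    return np.arccosh(1.0 + 2.0 * sq_diff / np.maximum(denom, eps))


def _top_k_mask(scores, mask_ratio):
    """Boolean mask that is True for the k highest-scoring entries."""
    k = int(round(mask_ratio * len(scores)))
    mask = np.zeros(len(scores), dtype=bool)
    mask[np.argsort(-scores)[:k]] = True
    return mask


def hierarchy_mask(joint_emb, mask_ratio=0.5):
    """Spatial criterion (assumption): mask the joints whose hyperbolic
    embeddings lie farthest from the origin, used here as a depth proxy."""
    depth = poincare_distance(joint_emb, np.zeros_like(joint_emb))
    return _top_k_mask(depth, mask_ratio)


def attention_mask(attn, mask_ratio=0.5):
    """Temporal criterion (assumption): attn is an (N, N) attention matrix
    whose rows sum to 1; mask the tokens that receive the most attention."""
    received = attn.mean(axis=0)  # average attention each token receives
    return _top_k_mask(received, mask_ratio)


if __name__ == "__main__":
    emb = np.array([[0.0, 0.0], [0.1, 0.0], [0.5, 0.0], [0.9, 0.0]])
    print(hierarchy_mask(emb, mask_ratio=0.5))
    attn = np.array([[0.1, 0.9], [0.2, 0.8]])
    print(attention_mask(attn, mask_ratio=0.5))
```

Whether the most or the least salient joints should be masked is a design choice the abstract does not settle; flipping the sign of the score passed to `_top_k_mask` gives the opposite policy.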

arXiv information

Authors	Xinpeng Yin, Wenming Cao
Published	2024-09-26 15:28:25+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
