Elevating Skeleton-Based Action Recognition with Efficient Multi-Modality Self-Supervision

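Summary

Self-supervised representation learning for human action recognition has developed rapidly in recent years.
Most existing studies are based on skeleton data and use a multi-modality setup.
These studies overlook the performance differences among modalities, which lets erroneous knowledge propagate between them, and they use only three basic modalities (joints, bones, and motion) without exploring additional ones.
In this work, we first propose an Implicit Knowledge Exchange Module (IKEM) that alleviates the propagation of erroneous knowledge between low-performance modalities.
Next, we propose three new modalities to enrich the complementary information between modalities.
Finally, to stay efficient when introducing new modalities, we propose a new teacher-student framework, named relational cross-modality knowledge distillation, that distills knowledge from the secondary modalities into the mandatory modalities while taking into account the relationships constrained by anchors, positives, and negatives.
The experimental results demonstrate the effectiveness of our approach and enable efficient use of skeleton-based multi-modality data.
The source code will be made publicly available at https://github.com/desehuileng0o0/IKEM.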
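
For readers unfamiliar with the three basic modalities, the sketch below shows how bone and motion streams are commonly derived from joint coordinates in skeleton-based recognition: a bone is a joint minus its parent joint, and motion is the frame-to-frame displacement. This is an illustration under assumptions, not the paper's preprocessing: the three-joint chain, the `PARENTS` table, and the zero-padding of the last motion frame are all hypothetical choices made here.

```python
from typing import List, Tuple

Vec = Tuple[float, float, float]
Frame = List[Vec]  # one 3D position per joint


def sub(a: Vec, b: Vec) -> Vec:
    """Component-wise difference a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def bones(seq: List[Frame], parents: List[int]) -> List[Frame]:
    """Bone modality: each joint minus its parent joint (the root maps to zero)."""
    return [[sub(f[j], f[parents[j]]) for j in range(len(f))] for f in seq]


def motion(seq: List[Frame]) -> List[Frame]:
    """Motion modality: displacement to the next frame; last frame is zero-padded."""
    out = [[sub(nxt[j], cur[j]) for j in range(len(cur))]
           for cur, nxt in zip(seq, seq[1:])]
    out.append([(0.0, 0.0, 0.0)] * len(seq[-1]))
    return out


# Hypothetical 3-joint chain: joint 0 is the root and is its own parent.
PARENTS = [0, 0, 1]
joints = [
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0)],  # frame 0
    [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 2.5, 0.0)],  # frame 1
]
bone_seq = bones(joints, PARENTS)
motion_seq = motion(joints)
```

Both derived streams keep the shape of the joint stream, which is why the same encoder architecture can usually be reused for each modality.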

Summary (original)

Self-supervised representation learning for human action recognition has developed rapidly in recent years. Most of the existing works are based on skeleton data while using a multi-modality setup. These works overlooked the differences in performance among modalities, which led to the propagation of erroneous knowledge between modalities while only three fundamental modalities, i.e., joints, bones, and motions are used, hence no additional modalities are explored. In this work, we first propose an Implicit Knowledge Exchange Module (IKEM) which alleviates the propagation of erroneous knowledge between low-performance modalities. Then, we further propose three new modalities to enrich the complementary information between modalities. Finally, to maintain efficiency when introducing new modalities, we propose a novel teacher-student framework to distill the knowledge from the secondary modalities into the mandatory modalities considering the relationship constrained by anchors, positives, and negatives, named relational cross-modality knowledge distillation. The experimental results demonstrate the effectiveness of our approach, unlocking the efficient use of skeleton-based multi-modality data. Source code will be made publicly available at https://github.com/desehuileng0o0/IKEM.
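
The abstract does not give the loss used for relational cross-modality knowledge distillation, so the following is only one plausible reading, written as a minimal sketch. It assumes the "relationship" is a softmax over cosine similarities between an anchor and a candidate set (one positive plus negatives), and that the student is trained to match the teacher's distribution through a KL divergence. The function names, the temperature `tau`, and the toy 2D embeddings are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def relation(anchor: Sequence[float],
             candidates: List[Sequence[float]],
             tau: float) -> List[float]:
    """Softmax over anchor-to-candidate similarities (positive and negatives)."""
    logits = [cosine(anchor, c) / tau for c in candidates]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]


def relational_kd_loss(t_anchor, t_cands, s_anchor, s_cands, tau=0.5) -> float:
    """KL(teacher relation || student relation); assumed form, not the paper's."""
    p = relation(t_anchor, t_cands, tau)
    q = relation(s_anchor, s_cands, tau)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Toy embeddings: candidate 0 is the positive, the rest are negatives.
teacher_anchor = (1.0, 0.0)
teacher_cands = [(0.9, 0.1), (0.0, 1.0), (-1.0, 0.0)]
student_anchor = (1.0, 0.0)
student_cands = [(0.0, 1.0), (0.9, 0.1), (-1.0, 0.0)]  # student ranks a negative first

loss_same = relational_kd_loss(teacher_anchor, teacher_cands,
                               teacher_anchor, teacher_cands)
loss_diff = relational_kd_loss(teacher_anchor, teacher_cands,
                               student_anchor, student_cands)
```

In this reading, the loss is zero when the student reproduces the teacher's similarity structure and grows as the student's ranking of positives and negatives diverges from it. At inference time only the student, which sees the mandatory modalities, would need to run.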

arXiv information

Authors	Yiping Wei, Kunyu Peng, Alina Roitberg, Jiaming Zhang, Junwei Zheng, Ruiping Liu, Yufan Chen, Kailun Yang, Rainer Stiefelhagen
Published	2024-01-11 02:44:27+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
