CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation

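Summary

In 3D action recognition, skeleton modalities carry rich complementary information.
Even so, how to model and exploit that information remains an open problem for self-supervised 3D action representation learning.
This work formulates cross-modal interaction as a bidirectional knowledge distillation problem.
Unlike classic distillation approaches, which transfer the knowledge of a fixed, pre-trained teacher to a student, here the knowledge is continuously updated and distilled in both directions between modalities.
To this end, the authors propose a new Cross-modal Mutual Distillation (CMD) framework with the following designs.
On the one hand, a neighboring similarity distribution is introduced to model the knowledge learned in each modality; this relational information fits naturally into contrastive frameworks.
On the other hand, asymmetrical configurations are used for teacher and student to stabilize the distillation process and to transfer high-confidence information between modalities.
By derivation, the authors show that the cross-modal positive mining used in previous works can be regarded as a degenerate version of CMD.
Extensive experiments are carried out on the NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD II datasets.
The approach outperforms existing self-supervised methods and sets a series of new records.
The code is available at https://github.com/maoyunyao/CMD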

Abstract (original)

In 3D action recognition, there exists rich complementary information between skeleton modalities. Nevertheless, how to model and utilize this information remains a challenging problem for self-supervised 3D action representation learning. In this work, we formulate the cross-modal interaction as a bidirectional knowledge distillation problem. Different from classic distillation solutions that transfer the knowledge of a fixed and pre-trained teacher to the student, in this work, the knowledge is continuously updated and bidirectionally distilled between modalities. To this end, we propose a new Cross-modal Mutual Distillation (CMD) framework with the following designs. On the one hand, the neighboring similarity distribution is introduced to model the knowledge learned in each modality, where the relational information is naturally suitable for the contrastive frameworks. On the other hand, asymmetrical configurations are used for teacher and student to stabilize the distillation process and to transfer high-confidence information between modalities. By derivation, we find that the cross-modal positive mining in previous works can be regarded as a degenerated version of our CMD. We perform extensive experiments on NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD II datasets. Our approach outperforms existing self-supervised methods and sets a series of new records. The code is available at: https://github.com/maoyunyao/CMD
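The abstract describes two ingredients: a neighboring similarity distribution per modality, and an asymmetric teacher/student setup for distilling in both directions. The following is a minimal sketch of that idea in plain Python, not the authors' implementation. Everything concrete in it is an assumption made for illustration: the function names, the use of cosine similarity, the use of a lower softmax temperature as the teacher's "asymmetrical configuration", the KL divergence as the distillation loss, and the toy embeddings. A real training loop would also stop gradients through the teacher side, which this autograd-free sketch does not model.

```python
import math

def softmax(scores, temperature):
    """Temperature-scaled softmax over a list of similarity scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def neighbor_distribution(embedding, anchors, temperature):
    """Similarity distribution of one embedding over a set of anchor embeddings.

    Assumption: the anchors stand in for neighbors of the same samples,
    embedded in the same modality as `embedding`.
    """
    return softmax([cosine(embedding, a) for a in anchors], temperature)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q); terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def mutual_distillation_loss(z_a, anchors_a, z_b, anchors_b,
                             tau_teacher=0.05, tau_student=0.1):
    """Bidirectional distillation between modalities a and b.

    Each modality serves as teacher for the other. The teacher uses a lower
    temperature (an assumed choice), so its distribution is sharper.
    """
    teacher_a = neighbor_distribution(z_a, anchors_a, tau_teacher)
    teacher_b = neighbor_distribution(z_b, anchors_b, tau_teacher)
    student_a = neighbor_distribution(z_a, anchors_a, tau_student)
    student_b = neighbor_distribution(z_b, anchors_b, tau_student)
    # a teaches b, and b teaches a
    return kl_divergence(teacher_a, student_b) + kl_divergence(teacher_b, student_a)

# Toy data (hypothetical): one sample and three anchors in two skeleton modalities.
z_joint = [0.9, 0.1]
anchors_joint = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
z_motion = [0.6, 0.4]
anchors_motion = [[0.8, 0.2], [0.1, 0.9], [0.5, 0.5]]

loss = mutual_distillation_loss(z_joint, anchors_joint, z_motion, anchors_motion)
print("mutual distillation loss:", loss)
```

In this sketch, pushing `tau_teacher` toward zero makes the teacher distribution nearly one-hot on its nearest anchor, so each KL term reduces to a cross-entropy against that single neighbor. That is one way to read the abstract's remark that cross-modal positive mining is a degenerate case of CMD, though the exact derivation is in the paper rather than here.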

arXiv information

Authors	Yunyao Mao, Wengang Zhou, Zhenbo Lu, Jiajun Deng, Houqiang Li
Published	2023-05-25 14:19:43+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
