RelaMiX: Exploring Few-Shot Adaptation in Video-based Action Recognition

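Summary

Domain adaptation is essential for activity recognition, because models must remain accurate and robust across different environments, sensor types, and data sources.
Unsupervised domain adaptation methods have been studied extensively, but they require large-scale unlabeled data from the target domain.
This work addresses Few-Shot Domain Adaptation for video-based Activity Recognition (FSDA-AR), which uses a very small number of labeled target videos to achieve effective adaptation.
The setting is attractive for applications because only a few examples per class, or even a single one, have to be recorded and labeled in the target domain, which often contains activities that are rare yet crucial to recognize.
The authors construct FSDA-AR benchmarks from five established datasets covering diverse domain types: UCF101, HMDB51, EPIC-KITCHEN, Sims4Action, and ToyotaSmartHome.
Their results show that FSDA-AR performs comparably to unsupervised domain adaptation while using significantly fewer target domain samples, which are labeled rather than unlabeled.
They further propose a new approach, RelaMiX, to make better use of the few labeled target domain samples as knowledge guidance.
RelaMiX combines a temporal relational attention network with relation dropout and a cross-domain information alignment mechanism.
It also integrates a mechanism that mixes features in a latent space using the few-shot target domain samples.
The proposed RelaMiX solution achieves state-of-the-art performance on all datasets in the FSDA-AR benchmark.
To encourage future research on few-shot domain adaptation for video-based activity recognition, the benchmarks and source code are publicly available at https://github.com/KPeng9510/RelaMiX.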
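
The abstract does not specify how RelaMiX mixes features in the latent space. As a rough illustration of the general idea only, the sketch below applies a generic mixup-style convex combination between source features and few-shot target features. The function names, the same-class pairing rule, and the uniformly sampled mixing weight are all assumptions made for this example and are not taken from the paper or its repository.

```python
import random
from typing import Dict, List, Sequence, Tuple

Feature = List[float]


def mix_features(source_feat: Sequence[float],
                 target_feat: Sequence[float],
                 lam: float) -> Feature:
    """Convex combination lam * source + (1 - lam) * target (hypothetical helper)."""
    if len(source_feat) != len(target_feat):
        raise ValueError("feature vectors must have the same length")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * s + (1.0 - lam) * t for s, t in zip(source_feat, target_feat)]


def mix_with_few_shot_targets(source_feats: Sequence[Sequence[float]],
                              source_labels: Sequence[int],
                              target_feats: Sequence[Sequence[float]],
                              target_labels: Sequence[int],
                              rng: random.Random) -> List[Tuple[Feature, int]]:
    """Mix each source feature with a random few-shot target feature of the same class.

    Assumption for this sketch: pairs share a class label, so the label of the
    mixed feature is unchanged. Source features whose class has no target
    example are passed through unmodified.
    """
    # Group the few labeled target features by class.
    targets_by_class: Dict[int, List[Sequence[float]]] = {}
    for feat, label in zip(target_feats, target_labels):
        targets_by_class.setdefault(label, []).append(feat)

    mixed: List[Tuple[Feature, int]] = []
    for feat, label in zip(source_feats, source_labels):
        candidates = targets_by_class.get(label)
        if not candidates:
            mixed.append((list(feat), label))
            continue
        lam = rng.random()  # mixing weight, sampled uniformly from [0, 1)
        mixed.append((mix_features(feat, rng.choice(candidates), lam), label))
    return mixed
```

In a training loop, mixed features of this kind would typically be fed to the classifier alongside the original source features; whether RelaMiX does this, and with which sampling distribution, is documented in the linked repository rather than in the abstract.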

Abstract (original)

Domain adaptation is essential for activity recognition to ensure accurate and robust performance across diverse environments, sensor types, and data sources. Unsupervised domain adaptation methods have been extensively studied, yet, they require large-scale unlabeled data from the target domain. In this work, we address Few-Shot Domain Adaptation for video-based Activity Recognition (FSDA-AR), which leverages a very small amount of labeled target videos to achieve effective adaptation. This setting is attractive and promising for applications, as it requires recording and labeling only a few, or even a single example per class in the target domain, which often includes activities that are rare yet crucial to recognize. We construct FSDA-AR benchmarks using five established datasets considering diverse domain types: UCF101, HMDB51, EPIC-KITCHEN, Sims4Action, and ToyotaSmartHome. Our results demonstrate that FSDA-AR performs comparably to unsupervised domain adaptation with significantly fewer (yet labeled) target domain samples. We further propose a novel approach, RelaMiX, to better leverage the few labeled target domain samples as knowledge guidance. RelaMiX encompasses a temporal relational attention network with relation dropout, alongside a cross-domain information alignment mechanism. Furthermore, it integrates a mechanism for mixing features within a latent space by using the few-shot target domain samples. The proposed RelaMiX solution achieves state-of-the-art performance on all datasets within the FSDA-AR benchmark. To encourage future research of few-shot domain adaptation for video-based activity recognition, our benchmarks and source code are made publicly available at https://github.com/KPeng9510/RelaMiX.

arXiv information

Authors	Kunyu Peng, Di Wen, David Schneider, Jiaming Zhang, Kailun Yang, M. Saquib Sarfraz, Rainer Stiefelhagen, Alina Roitberg
Published	2023-10-28 12:00:08+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
