C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition

Summary

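Compositional actions consist of dynamic (verb) and static (object) concepts.
Humans can easily recognize unseen compositions using concepts they have already learned.
For machines, solving this problem requires a model to recognize unseen actions composed of previously observed verbs and objects, which calls for so-called compositional generalization ability.
To facilitate this research, we propose a new Zero-Shot Compositional Action Recognition (ZS-CAR) task.
To evaluate the task, we build a new benchmark, Something-composition (Sth-com), based on the widely used Something-Something V2 dataset.
We also propose a new Component-to-Composition (C2C) learning method to solve the ZS-CAR task.
C2C includes an independent component learning module and a composition inference module.
Finally, we devise an enhanced training strategy that addresses the variation of components between seen and unseen compositions and handles the subtle balance between learning seen and unseen actions.
Experimental results show that the proposed framework significantly outperforms existing compositional generalization methods and sets a new state of the art.
The new Sth-com benchmark and code are available at https://github.com/RongchangLi/ZSCAR_C2C.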
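
The defining constraint of the ZS-CAR setting is this: an unseen composition pairs a verb and an object that were each observed during training, but never together. The sketch below checks that property for a proposed seen/unseen split. It is a minimal illustration that assumes compositions are stored as (verb, object) string pairs. The function name and the toy labels are hypothetical and do not come from Sth-com or the C2C code.

```python
def zero_shot_split_is_valid(seen, unseen):
    """Return True if every unseen (verb, object) pair is genuinely new
    but is built only from verbs and objects that occur in the seen set.
    Hypothetical helper for illustration, not part of the C2C codebase."""
    seen_set = set(seen)
    seen_verbs = {verb for verb, _ in seen}
    seen_objects = {obj for _, obj in seen}
    for verb, obj in unseen:
        if (verb, obj) in seen_set:
            return False  # pair was already observed, so it is not unseen
        if verb not in seen_verbs or obj not in seen_objects:
            return False  # uses a component never observed in training
    return True


# Hypothetical toy vocabulary (not taken from Sth-com).
seen = [("push", "cup"), ("lift", "book"), ("push", "book")]
unseen = [("lift", "cup")]

print(zero_shot_split_is_valid(seen, unseen))             # True
print(zero_shot_split_is_valid(seen, [("open", "cup")]))  # False: "open" never seen
```

Here ("lift", "cup") qualifies as a zero-shot composition because both "lift" and "cup" appear in training pairs. By contrast, ("open", "cup") would require recognizing a verb the model has never seen, which is outside the compositional setting.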

Summary (original)

Compositional actions consist of dynamic (verbs) and static (objects) concepts. Humans can easily recognize unseen compositions using the learned concepts. For machines, solving such a problem requires a model to recognize unseen actions composed of previously observed verbs and objects, thus requiring, so-called, compositional generalization ability. To facilitate this research, we propose a novel Zero-Shot Compositional Action Recognition (ZS-CAR) task. For evaluating the task, we construct a new benchmark, Something-composition (Sth-com), based on the widely used Something-Something V2 dataset. We also propose a novel Component-to-Composition (C2C) learning method to solve the new ZS-CAR task. C2C includes an independent component learning module and a composition inference module. Last, we devise an enhanced training strategy to address the challenges of component variation between seen and unseen compositions and to handle the subtle balance between learning seen and unseen actions. The experimental results demonstrate that the proposed framework significantly surpasses the existing compositional generalization methods and sets a new state-of-the-art. The new Sth-com benchmark and code are available at https://github.com/RongchangLi/ZSCAR_C2C.
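
The abstract does not say how the composition inference module combines the independently learned components. A common baseline in compositional zero-shot learning treats verb and object predictions as independent and scores each candidate pair by the sum of their log-probabilities. That baseline is sketched here purely as an assumption. The `unseen_bias` parameter is a hypothetical inference-time calibration term. It illustrates the seen/unseen balance the abstract mentions; it is not the paper's enhanced training strategy.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a dict mapping label -> logit."""
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def predict_composition(verb_logits, object_logits, candidates, unseen, unseen_bias=0.0):
    """Score each candidate (verb, object) pair as log p(verb) + log p(object),
    add `unseen_bias` to pairs in `unseen`, and return the best-scoring pair.
    Hypothetical baseline for illustration, not the C2C inference module."""
    p_verb = softmax(verb_logits)
    p_obj = softmax(object_logits)

    def score(pair):
        verb, obj = pair
        s = math.log(p_verb[verb]) + math.log(p_obj[obj])
        return s + (unseen_bias if pair in unseen else 0.0)

    return max(candidates, key=score)


# Hypothetical component logits for one video clip.
verb_logits = {"push": 2.0, "lift": 1.8}
object_logits = {"cup": 1.0, "book": 0.9}
candidates = [("push", "cup"), ("lift", "book"), ("push", "book"), ("lift", "cup")]
unseen = {("lift", "cup")}

print(predict_composition(verb_logits, object_logits, candidates, unseen))
# ('push', 'cup')
print(predict_composition(verb_logits, object_logits, candidates, unseen, unseen_bias=1.0))
# ('lift', 'cup')
```

Without a bias, the seen pair with the highest component scores wins. A positive `unseen_bias` shifts predictions toward unseen compositions. Tuning such a term trades accuracy on seen actions against accuracy on unseen ones.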

arXiv information

Authors	Rongchang Li, Zhenhua Feng, Tianyang Xu, Linze Li, Xiao-Jun Wu, Muhammad Awais, Sara Atito, Josef Kittler
Published	2024-07-08 16:49:01+00:00

Source, services used

arxiv.jp, Google