TA2N: Two-Stage Action Alignment Network for Few-shot Action Recognition

Summary

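Few-shot action recognition aims to recognize novel action classes (query) from only a few labeled samples (support).
Most current approaches follow the metric learning paradigm, which learns to compare the similarity between videos.
Recent work has observed that measuring this similarity directly is not ideal: different action instances can have distinct temporal distributions, which causes severe misalignment between query and support videos.
This paper approaches the problem from two distinct aspects: action duration misalignment and action evolution misalignment.
The authors address the two in sequence with a Two-stage Action Alignment Network (TA2N).
The first stage locates the action by learning a temporal affine transform that warps each video feature to its action duration while discarding action-irrelevant features (e.g. background).
The second stage then coordinates the query feature to match the spatio-temporal action evolution of the support by performing temporal rearrangement and spatial offset prediction.
Extensive experiments on benchmark datasets show that the proposed method has the potential to achieve state-of-the-art performance for few-shot action recognition. The code for this project is available at https://github.com/R00Kie-Liu/TA2N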
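
To make the first-stage idea more concrete, here is a minimal sketch of a temporal affine warp over per-frame features. It is not the authors' implementation: the function name `temporal_affine_warp` and the fixed `scale` and `shift` values are assumptions for illustration. In the paper's setting these parameters would be learned rather than set by hand, and the abstract does not specify the exact sampling scheme. The sketch uses linear interpolation over normalized time in [-1, 1].

```python
def temporal_affine_warp(features, scale, shift):
    """Resample per-frame features along time using t' = scale * t + shift.

    features: list of T frame feature vectors (each a list of floats).
    scale, shift: hypothetical affine parameters in normalized time [-1, 1];
    in a learned model they would be predicted, here they are given directly.
    """
    T = len(features)
    if T == 1:
        return [list(features[0])]
    warped = []
    for i in range(T):
        t = -1.0 + 2.0 * i / (T - 1)       # output time in [-1, 1]
        src = scale * t + shift            # source time to sample from
        pos = (src + 1.0) * (T - 1) / 2.0  # convert to a fractional frame index
        pos = min(max(pos, 0.0), T - 1)    # clamp to the valid frame range
        lo = int(pos)
        hi = min(lo + 1, T - 1)
        w = pos - lo                       # linear interpolation weight
        warped.append([(1 - w) * a + w * b
                       for a, b in zip(features[lo], features[hi])])
    return warped


# Five frames with 1-D features; scale=0.5 zooms into the middle half of the clip.
frames = [[0.0], [1.0], [2.0], [3.0], [4.0]]
print(temporal_affine_warp(frames, scale=0.5, shift=0.0))
# → [[1.0], [1.5], [2.0], [2.5], [3.0]]
```

With `scale` below 1 the output covers a shorter span of the input, so frames outside that span (for example, background at the start and end of the clip) no longer contribute, which is the intuition behind warping a video to its action duration.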

Abstract (original)

Few-shot action recognition aims to recognize novel action classes (query) using just a few samples (support). The majority of current approaches follow the metric learning paradigm, which learns to compare the similarity between videos. Recently, it has been observed that directly measuring this similarity is not ideal since different action instances may show distinctive temporal distribution, resulting in severe misalignment issues across query and support videos. In this paper, we arrest this problem from two distinct aspects: action duration misalignment and action evolution misalignment. We address them sequentially through a Two-stage Action Alignment Network (TA2N). The first stage locates the action by learning a temporal affine transform, which warps each video feature to its action duration while dismissing the action-irrelevant feature (e.g. background). Next, the second stage coordinates query feature to match the spatial-temporal action evolution of support by performing temporally rearrange and spatially offset prediction. Extensive experiments on benchmark datasets show the potential of the proposed method in achieving state-of-the-art performance for few-shot action recognition. The code of this project can be found at https://github.com/R00Kie-Liu/TA2N

arXiv information

Authors	Shuyuan Li, Huabin Liu, Rui Qian, Yuxi Li, John See, Mengjuan Fei, Xiaoyuan Yu, Weiyao Lin
Published	2022-12-22 08:40:02+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
