Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration

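Summary

Training an effective video action recognition model poses significant computational challenges, especially under limited resource budgets.
Current methods mainly aim either to reduce model size or to leverage pre-trained models, which limits their adaptability to different backbone architectures.
This paper investigates the problem of over-sampled frames, which is common across many approaches yet has received relatively little attention.
Although using fewer frames is a potential solution, it often causes a substantial drop in performance.
To address this, we propose a new method that restores the intermediate features between two sparsely sampled, adjacent video frames.
This feature restoration technique adds a negligible computational cost compared with resource-intensive image encoders such as ViT.
To evaluate the effectiveness of our method, we conduct extensive experiments on four public datasets: Kinetics-400, ActivityNet, UCF-101, and HMDB-51.
Integrating our method improves the efficiency of three commonly used baselines by more than 50%, with a drop in recognition accuracy of only 0.5%.
Furthermore, our method surprisingly also helps improve the models' generalization ability in zero-shot settings.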
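The efficiency trade-off described above can be made concrete with a rough cost model. This is a minimal sketch under our own assumptions, not figures from the paper: `uniform_indices` and `relative_cost` are hypothetical helpers, encoding one frame is assumed to cost `c_enc`, restoring one intermediate feature is assumed to cost `c_res`, and the sparse model is assumed to encode half of the frames and restore one feature between each adjacent sampled pair.

```python
def uniform_indices(num_frames: int, num_samples: int) -> list[int]:
    """Pick num_samples evenly spaced frame indices (segment centres)."""
    step = num_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


def relative_cost(dense_frames: int, c_enc: float, c_res: float) -> float:
    """Cost of 'sample half, restore in between' relative to dense encoding.

    Hypothetical assumptions, for illustration only:
      * every encoded frame costs c_enc (e.g. one image-encoder forward pass);
      * the sparse model encodes dense_frames // 2 frames;
      * one feature is restored between each adjacent sampled pair,
        and each restoration costs c_res.
    """
    sparse = dense_frames // 2
    restored = sparse - 1
    dense_cost = dense_frames * c_enc
    sparse_cost = sparse * c_enc + restored * c_res
    return sparse_cost / dense_cost


if __name__ == "__main__":
    print(uniform_indices(64, 8))        # indices of the sampled frames
    print(relative_cost(16, 1.0, 0.01))  # about 0.504
```

For example, with 16 dense frames and `c_res` set to 1% of `c_enc`, the sparse pipeline costs about 50.4% of dense encoding, so almost all of the saving comes from skipping encoder passes. The real ratio depends on the actual encoder and restoration module, which this sketch does not model.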

Summary (original)

Training an effective video action recognition model poses significant computational challenges, particularly under limited resource budgets. Current methods primarily aim to either reduce model size or utilize pre-trained models, limiting their adaptability to various backbone architectures. This paper investigates the issue of over-sampled frames, a prevalent problem in many approaches yet it has received relatively little attention. Despite the use of fewer frames being a potential solution, this approach often results in a substantial decline in performance. To address this issue, we propose a novel method to restore the intermediate features for two sparsely sampled and adjacent video frames. This feature restoration technique brings a negligible increase in computational requirements compared to resource-intensive image encoders, such as ViT. To evaluate the effectiveness of our method, we conduct extensive experiments on four public datasets, including Kinetics-400, ActivityNet, UCF-101, and HMDB-51. With the integration of our method, the efficiency of three commonly used baselines has been improved by over 50%, with a mere 0.5% reduction in recognition accuracy. In addition, our method also surprisingly helps improve the generalization ability of the models under zero-shot settings.
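The abstract does not say how the intermediate features are restored, so the sketch below only illustrates the input/output contract under our own assumptions. `FeatureRestorer` and `interleave` are hypothetical names, and the residual MLP over linear interpolation is a stand-in we chose, not the authors' architecture. Given the features of two adjacent sampled frames, it predicts a feature for the skipped frame between them, so a downstream temporal model still receives a dense feature sequence.

```python
import numpy as np


class FeatureRestorer:
    """Hypothetical lightweight module predicting the feature of an unsampled
    frame that lies between two sampled neighbours.

    Stand-in design (not the paper's): linear interpolation plus a small
    residual MLP correction computed from both neighbour features.
    """

    def __init__(self, dim: int, hidden: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Placeholder random weights; a real module would be trained so its
        # output matches the encoder's feature for the skipped frame.
        self.w1 = rng.normal(0.0, 0.02, size=(2 * dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.02, size=(hidden, dim))
        self.b2 = np.zeros(dim)

    def num_params(self) -> int:
        return sum(p.size for p in (self.w1, self.b1, self.w2, self.b2))

    def __call__(self, f_a: np.ndarray, f_b: np.ndarray) -> np.ndarray:
        # f_a, f_b: (..., dim) features of two adjacent sampled frames.
        base = 0.5 * (f_a + f_b)                   # interpolation prior
        x = np.concatenate([f_a, f_b], axis=-1)    # (..., 2 * dim)
        h = np.maximum(x @ self.w1 + self.b1, 0.0)  # ReLU hidden layer
        return base + h @ self.w2 + self.b2         # residual correction


def interleave(sampled: np.ndarray, restorer: FeatureRestorer) -> np.ndarray:
    """Insert one restored feature between each pair of sampled features."""
    out = []
    for t in range(len(sampled) - 1):
        out.append(sampled[t])
        out.append(restorer(sampled[t], sampled[t + 1]))
    out.append(sampled[-1])
    return np.stack(out)


if __name__ == "__main__":
    feats = np.random.default_rng(1).normal(size=(8, 768))  # 8 sampled frames
    restorer = FeatureRestorer(dim=768, hidden=256)
    full = interleave(feats, restorer)
    print(full.shape, restorer.num_params())
```

With `dim=768` (the ViT-Base feature width) and `hidden=256`, this stand-in has 590,848 parameters, far fewer than a ViT-Base image encoder (about 86M). That gap is the kind of asymmetry the abstract relies on when it calls the restoration cost negligible.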

arXiv information

Authors	Harry Cheng, Yangyang Guo, Liqiang Nie, Zhiyong Cheng, Mohan Kankanhalli
Published	2023-07-27 13:52:42+00:00
arXiv site	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
