Video RWKV: Video Action Recognition Based RWKV

Summary

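To address the high computational cost and the difficulty of modeling long-range dependencies in existing video understanding methods such as CNNs and Transformers, this work introduces RWKV to the video domain in a novel way.
We propose the LSTM CrossRWKV (LCR) framework, designed for spatiotemporal representation learning in video understanding tasks.
Specifically, the linear-complexity LCR incorporates a novel Cross RWKV gate that facilitates interaction between the edge information of the current frame and past features, sharpening the focus on the subject through edge features and globally aggregating inter-frame features over time.
LCR stores long-term memory for video processing through an enhanced LSTM recurrent execution mechanism.
By leveraging the Cross RWKV gate and recurrent execution, LCR captures both spatial and temporal features.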
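In addition, the edge information serves as the forget gate of the LSTM, guiding long-term memory management. A tube masking strategy reduces redundant information in the video and mitigates overfitting. These advantages enable LSTM CrossRWKV to set a new benchmark in video understanding, offering a scalable and efficient solution for comprehensive video analysis.
All code and models are publicly available.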
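
The tube masking idea mentioned above can be illustrated with a small sketch. This is not the authors' implementation: the function name `tube_mask`, the patch count, and the 0.75 mask ratio are assumptions chosen for illustration only. The sketch shows the general technique, in which one spatial mask is sampled and reused for every frame, so each masked patch position forms a "tube" through time.

```python
import random


def tube_mask(num_frames, num_patches, mask_ratio, seed=None):
    """Return a num_frames x num_patches grid of booleans (True = masked).

    A single spatial mask is sampled once and repeated for every frame,
    so the same patch positions are hidden along the whole clip.
    All names and defaults here are illustrative assumptions.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    masked_positions = set(rng.sample(range(num_patches), num_masked))
    spatial_mask = [p in masked_positions for p in range(num_patches)]
    # Copy the same spatial mask into each frame to form temporal tubes.
    return [list(spatial_mask) for _ in range(num_frames)]


# Hypothetical clip: 8 frames, 14 x 14 = 196 patches per frame.
mask = tube_mask(num_frames=8, num_patches=196, mask_ratio=0.75, seed=0)
visible_per_frame = [row.count(False) for row in mask]
```

Because the mask is identical in every frame, a model cannot recover a hidden patch simply by copying it from a neighbouring frame, which is the usual motivation for masking tubes rather than independent patches.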

Abstract (original)

To address the challenges of high computational costs and long-distance dependencies in existing video understanding methods, such as CNNs and Transformers, this work introduces RWKV to the video domain in a novel way. We propose a LSTM CrossRWKV (LCR) framework, designed for spatiotemporal representation learning to tackle the video understanding task. Specifically, the proposed linear complexity LCR incorporates a novel Cross RWKV gate to facilitate interaction between current frame edge information and past features, enhancing the focus on the subject through edge features and globally aggregating inter-frame features over time. LCR stores long-term memory for video processing through an enhanced LSTM recurrent execution mechanism. By leveraging the Cross RWKV gate and recurrent execution, LCR effectively captures both spatial and temporal features. Additionally, the edge information serves as a forgetting gate for LSTM, guiding long-term memory management. Tube masking strategy reduces redundant information in food and reduces overfitting. These advantages enable LSTM CrossRWKV to set a new benchmark in video understanding, offering a scalable and efficient solution for comprehensive video analysis. All code and models are publicly available.
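
The abstract states that edge information acts as the LSTM forget gate but gives no equations. As a rough illustration only, the sketch below assumes the standard LSTM cell update c_t = f_t * c_{t-1} + i_t * g_t and computes f_t from edge features alone through an elementwise weight and bias. The function `edge_gated_cell_update` and its parameters `w_edge` and `b_edge` are hypothetical and are not taken from the paper or its code.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def edge_gated_cell_update(prev_cell, candidate, input_gate,
                           edge_feat, w_edge, b_edge):
    """One elementwise cell-state update with an edge-driven forget gate.

    Assumption: f_t = sigmoid(w_edge * edge_feat + b_edge), then
    c_t = f_t * c_{t-1} + i_t * g_t. A real model would use learned
    matrices rather than per-element scalars.
    """
    new_cell = []
    for c_prev, g, i, e, w, b in zip(prev_cell, candidate, input_gate,
                                     edge_feat, w_edge, b_edge):
        f = sigmoid(w * e + b)  # forget gate derived from edge features
        new_cell.append(f * c_prev + i * g)
    return new_cell


# Strong edge response keeps memory; weak (negative) response erases it.
new_cell = edge_gated_cell_update(
    prev_cell=[1.0, 1.0],
    candidate=[0.0, 0.0],
    input_gate=[0.0, 0.0],
    edge_feat=[10.0, -10.0],
    w_edge=[1.0, 1.0],
    b_edge=[0.0, 0.0],
)
```

In this toy setting the first memory slot is retained almost entirely and the second is almost fully forgotten, which matches the intuition that edge features decide what long-term memory to keep.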

arXiv information

Authors	Zhuowen Yin, Chengru Li, Xingbo Dong
Published	2024-11-08 15:30:10+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
