Adaptive Perception Transformer for Temporal Action Localization

Summary

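Temporal action localization aims to predict the boundaries and category of each action instance in long, untrimmed videos.
Most previous methods based on anchors or proposals ignore the interaction between global and local context across the entire video sequence.
Moreover, their multi-stage designs cannot produce action boundaries and categories directly.
To address these issues, this paper proposes an end-to-end model called the Adaptive Perception transformer (AdaPerFormer for short).
Specifically, AdaPerFormer explores a dual-branch attention mechanism.
One branch handles global perception attention: it models the entire video sequence and aggregates globally relevant context.
The other branch focuses on a local convolutional shift, aggregating intra-frame and inter-frame information through a bidirectional shift operation.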
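The two branches can be illustrated with a minimal, dependency-free sketch. It is not the paper's implementation. It rests on several assumptions:

- The global branch is plain scaled dot-product self-attention with no learned projections.
- The local branch is a Temporal Shift Module style bidirectional shift followed by a pointwise channel mix. One slice of channels is taken from the previous frame and another from the next frame, which gives inter-frame mixing. The channel mix supplies the intra-frame part.
- The two branches are fused by summation.

The names `global_attention`, `bidirectional_shift`, `pointwise_mix`, `dual_branch_block` and the parameter `fold_div` are hypothetical.

```python
import math
from typing import List

Seq = List[List[float]]  # T frames, each a list of C channel values


def global_attention(x: Seq) -> Seq:
    """Global branch (assumed form): every frame attends to every frame.

    Queries, keys and values are the raw features (no learned projections),
    so this is plain scaled dot-product self-attention.
    """
    c = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(c) for k in x]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum over ALL frames: globally aggregated context.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(c)])
    return out


def bidirectional_shift(x: Seq, fold_div: int = 4) -> Seq:
    """Local branch, step 1: TSM-style bidirectional temporal shift (assumed).

    Channels [0, fold) take their values from the previous frame and channels
    [fold, 2*fold) from the next frame; other channels stay in place.
    Positions with no neighbour are zero-padded. fold_div is hypothetical.
    """
    t, c = len(x), len(x[0])
    fold = c // fold_div
    out = [row[:] for row in x]  # copy so the input is not modified
    for i in range(t):
        for j in range(fold):
            out[i][j] = x[i - 1][j] if i > 0 else 0.0
        for j in range(fold, 2 * fold):
            out[i][j] = x[i + 1][j] if i < t - 1 else 0.0
    return out


def pointwise_mix(x: Seq, w: List[List[float]]) -> Seq:
    """Local branch, step 2: per-frame channel mixing with weights w[out][in]."""
    return [[sum(wi * xi for wi, xi in zip(row, frame)) for row in w] for frame in x]


def dual_branch_block(x: Seq, w: List[List[float]]) -> Seq:
    """Fuse both branches by element-wise sum (the fusion rule is an assumption)."""
    g = global_attention(x)
    loc = pointwise_mix(bidirectional_shift(x), w)
    return [[a + b for a, b in zip(gr, lr)] for gr, lr in zip(g, loc)]


if __name__ == "__main__":
    x = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]
    print(bidirectional_shift(x))
    # [[0.0, 6.0, 3.0, 4.0], [1.0, 10.0, 7.0, 8.0], [5.0, 0.0, 11.0, 12.0]]
```

The shift itself has no parameters, so in this sketch the only learned part of the local branch is the channel mix `w`.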
Because the model is end-to-end, it produces the boundaries and categories of video actions without extra steps.
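The abstract does not describe AdaPerFormer's prediction head. The following sketch therefore rests on an explicit assumption borrowed from anchor-free detectors: at every time step the model emits class probabilities and two distances, one to the action's start and one to its end. Under that assumption, boundaries and categories can be read off in a single pass with no separate proposal stage. The function `decode_segments` and its parameters `stride` and `score_thresh` are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[float, float, int, float]  # (start, end, class_id, score)


def decode_segments(
    cls_probs: List[List[float]],
    offsets: List[Tuple[float, float]],
    stride: float = 1.0,
    score_thresh: float = 0.5,
) -> List[Segment]:
    """Turn per-time-step predictions into action segments in one pass.

    Assumption (hypothetical head): at step t the model outputs class
    probabilities cls_probs[t] and distances offsets[t] = (to_start, to_end),
    measured in time steps; stride converts steps to seconds.
    """
    segments = []
    for t, (probs, (d_s, d_e)) in enumerate(zip(cls_probs, offsets)):
        score = max(probs)
        if score < score_thresh:
            continue  # treat low-confidence steps as background
        cls = probs.index(score)
        center = t * stride
        segments.append((center - d_s * stride, center + d_e * stride, cls, score))
    segments.sort(key=lambda s: s[3], reverse=True)  # most confident first
    return segments


if __name__ == "__main__":
    print(decode_segments([[0.9, 0.1], [0.2, 0.3], [0.1, 0.8]],
                          [(0, 2), (1, 1), (2, 0.5)]))
    # [(0.0, 2.0, 0, 0.9), (0.0, 2.5, 1, 0.8)]
```

This sketch deliberately leaves out suppression of overlapping duplicate segments.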
Extensive experiments, together with ablation studies, are provided to demonstrate the effectiveness of the design.
The method achieves competitive performance on the THUMOS14 and ActivityNet-1.3 datasets.

Summary (original)

Temporal action localization aims to predict the boundary and category of each action instance in untrimmed long videos. Most previous methods based on anchors or proposals neglect the global-local context interaction in entire video sequences. Besides, their multi-stage designs cannot generate action boundaries and categories straightforwardly. To address the above issues, this paper proposes an end-to-end model, called Adaptive Perception transformer (AdaPerFormer for short). Specifically, AdaPerFormer explores a dual-branch attention mechanism. One branch takes care of the global perception attention, which can model entire video sequences and aggregate global relevant contexts. The other branch concentrates on the local convolutional shift to aggregate intra-frame and inter-frame information through our bidirectional shift operation. The end-to-end nature produces the boundaries and categories of video actions without extra steps. Extensive experiments together with ablation studies are provided to reveal the effectiveness of our design. Our method obtains competitive performance on the THUMOS14 and ActivityNet-1.3 datasets.

arXiv information

Authors	Yizheng Ouyang, Tianjin Zhang, Weibo Gu, Hongfa Wang
Published	2022-09-15 13:30:20+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google