Weakly-Supervised Temporal Action Localization by Inferring Salient Snippet-Feature

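Summary

Weakly-supervised temporal action localization aims to locate action regions in untrimmed videos and identify their action categories at the same time, using only video-level labels as supervision.
Pseudo label generation is a promising strategy for this challenging problem, but current methods ignore the natural temporal structure of video, which can provide rich information to assist the generation process.
This paper proposes a new weakly-supervised temporal action localization method based on inferring salient snippet-features.
First, a saliency inference module is designed that exploits the variation relationship between temporally neighboring snippets to discover salient snippet-features, which reflect significant dynamic changes in the video.
Second, a boundary refinement module is introduced that enhances the salient snippet-features through an information interaction unit.
Then, a discrimination enhancement module is introduced to strengthen the discriminative nature of the snippet-features.
Finally, the refined snippet-features are used to generate high-fidelity pseudo labels, which can supervise the training of the action localization network.
Extensive experiments on two publicly available datasets, THUMOS14 and ActivityNet v1.3, demonstrate that the proposed method achieves significant improvements over state-of-the-art methods.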
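
To make the idea of "variation between temporally neighboring snippets" concrete, here is a minimal sketch. It is not the authors' saliency inference module: the summary does not say how variation is measured or how salient snippets are selected, so this sketch assumes Euclidean distance to the immediate neighbors and a simple top-k selection. The function names `variation_scores` and `salient_indices` are hypothetical.

```python
import math


def variation_scores(features):
    """Score each snippet by its mean distance to its temporal neighbors.

    `features` is a list of equal-length feature vectors, one per snippet,
    in temporal order. Euclidean distance is an assumption of this sketch.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    num_snippets = len(features)
    scores = []
    for t in range(num_snippets):
        # Previous and next snippet, where they exist.
        neighbors = [features[i] for i in (t - 1, t + 1) if 0 <= i < num_snippets]
        if not neighbors:
            scores.append(0.0)  # a single-snippet video has no variation
        else:
            total = sum(dist(features[t], n) for n in neighbors)
            scores.append(total / len(neighbors))
    return scores


def salient_indices(features, k):
    """Return the indices of the k highest-variation snippets, in temporal order."""
    scores = variation_scores(features)
    ranked = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)
    return sorted(ranked[:k])


# Toy example: the features change abruptly between snippets 1 and 2.
toy = [[0, 0], [0, 0], [3, 4], [3, 4], [3, 4]]
print(salient_indices(toy, 2))
```

In this toy input the two snippets on either side of the abrupt change get the highest scores, which is the kind of "significant dynamic change" the summary refers to.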

Summary (original)

Weakly-supervised temporal action localization aims to locate action regions and identify action categories in untrimmed videos simultaneously by taking only video-level labels as the supervision. Pseudo label generation is a promising strategy to solve the challenging problem, but the current methods ignore the natural temporal structure of the video that can provide rich information to assist such a generation process. In this paper, we propose a novel weakly-supervised temporal action localization method by inferring salient snippet-feature. First, we design a saliency inference module that exploits the variation relationship between temporal neighbor snippets to discover salient snippet-features, which can reflect the significant dynamic change in the video. Secondly, we introduce a boundary refinement module that enhances salient snippet-features through the information interaction unit. Then, a discrimination enhancement module is introduced to enhance the discriminative nature of snippet-features. Finally, we adopt the refined snippet-features to produce high-fidelity pseudo labels, which could be used to supervise the training of the action localization network. Extensive experiments on two publicly available datasets, i.e., THUMOS14 and ActivityNet v1.3, demonstrate our proposed method achieves significant improvements compared to the state-of-the-art methods.

arXiv information

Authors	Wulian Yun, Mengshi Qi, Chuanming Wang, Huadong Ma
Published	2023-12-20 14:08:37+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
