Can Shuffling Video Benefit Temporal Bias Problem: A Novel Training Framework for Temporal Grounding

Summary

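Temporal grounding aims to locate the moment in an untrimmed video that semantically corresponds to a given sentence query.
However, recent work has found that existing methods suffer from a severe temporal bias problem.
Rather than inferring the location of the target moment from visual-textual semantic alignment, these methods over-rely on the temporal biases of queries in the training set.
To address this, the paper proposes a novel training framework for grounding models that uses shuffled videos to counter the temporal bias problem without sacrificing grounding accuracy.
Our framework introduces two auxiliary tasks, cross-modal matching and temporal order discrimination, to support the training of the grounding model.
The cross-modal matching task exploits the content consistency between shuffled and original videos to force the grounding model to mine visual content that semantically matches the query.
The temporal order discrimination task exploits the difference in temporal order between shuffled and original videos to strengthen the model's understanding of long-term temporal context.
Extensive experiments on Charades-STA and ActivityNet Captions demonstrate that the method reduces reliance on temporal biases and improves the model's ability to generalize across different temporal distributions.
Code is available at https://github.com/haojc/ShufflingVideosForTSG.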
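
As a rough illustration of the two auxiliary tasks, the sketch below builds training targets for an original video and a shuffled copy of it. It is not taken from the authors' repository: clip-level shuffling, the per-clip match labels, the binary order label, and all names (`make_shuffled_sample`, `ORIGINAL_ORDER`, `SHUFFLED_ORDER`) are assumptions made for illustration, and the framework's actual shuffling strategy and losses may differ.

```python
import random
from typing import List, Tuple

# Hypothetical labels for the temporal order discrimination task.
ORIGINAL_ORDER = 1
SHUFFLED_ORDER = 0


def make_shuffled_sample(
    num_clips: int, moment: Tuple[int, int], rng: random.Random
) -> Tuple[List[int], List[int], List[int]]:
    """Return (order, match_original, match_shuffled) for one video-query pair.

    `moment` is an inclusive (start, end) range of clip indices; this layout
    is an assumption of the sketch, not something the abstract specifies.
    """
    if num_clips < 2:
        raise ValueError("need at least two clips to change the order")
    start, end = moment
    identity = list(range(num_clips))
    order = identity[:]
    while order == identity:  # resample so the shuffled view really differs
        rng.shuffle(order)
    # Cross-modal matching target: 1 if the clip lies inside the queried moment.
    match_original = [1 if start <= i <= end else 0 for i in identity]
    # Each clip keeps its label after shuffling, because its content is unchanged.
    match_shuffled = [match_original[i] for i in order]
    return order, match_original, match_shuffled


rng = random.Random(0)
order, match_orig, match_shuf = make_shuffled_sample(8, (2, 4), rng)
# Two training views of the same video-query pair:
# (clip order, per-clip match targets, order discrimination target).
samples = [
    (list(range(8)), match_orig, ORIGINAL_ORDER),
    (order, match_shuf, SHUFFLED_ORDER),
]
```

In this sketch the match targets depend only on which clips are present, not where they sit, so a model cannot satisfy them by memorizing typical moment positions. The order target, by contrast, can only be predicted from temporal context.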

Abstract (original)

Temporal grounding aims to locate a target video moment that semantically corresponds to the given sentence query in an untrimmed video. However, recent works find that existing methods suffer a severe temporal bias problem. These methods do not reason the target moment locations based on the visual-textual semantic alignment but over-rely on the temporal biases of queries in training sets. To this end, this paper proposes a novel training framework for grounding models to use shuffled videos to address temporal bias problem without losing grounding accuracy. Our framework introduces two auxiliary tasks, cross-modal matching and temporal order discrimination, to promote the grounding model training. The cross-modal matching task leverages the content consistency between shuffled and original videos to force the grounding model to mine visual contents to semantically match queries. The temporal order discrimination task leverages the difference in temporal order to strengthen the understanding of long-term temporal contexts. Extensive experiments on Charades-STA and ActivityNet Captions demonstrate the effectiveness of our method for mitigating the reliance on temporal biases and strengthening the model’s generalization ability against the different temporal distributions. Code is available at https://github.com/haojc/ShufflingVideosForTSG.

arXiv information

Authors	Jiachang Hao, Haifeng Sun, Pengfei Ren, Jingyu Wang, Qi Qi, Jianxin Liao
Published	2022-07-29 14:11:48+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
