Diversified Augmentation with Domain Adaptation for Debiased Video Temporal Grounding

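Summary

Temporal sentence grounding in videos (TSGV) is hampered by public TSGV datasets that carry strong temporal biases, caused by the uneven temporal distribution of target moments.
Existing methods generate augmented videos in which the target moment is forced to appear at varying temporal locations.
However, because video lengths vary little within a given dataset, changing only the temporal location generalizes poorly to videos of different lengths.
In this paper, we propose a new training framework that combines diversified data augmentation with a domain discriminator.
The augmentation generates videos with varied lengths and target-moment locations to diversify the temporal distribution.
Augmented videos, however, inevitably have a distinct feature distribution, which can introduce noise.
To address this, we design a domain adaptation auxiliary task that reduces the feature discrepancy between original and augmented videos.
To promote debiased training, we also encourage the model to produce different predictions for videos that share the same text query but differ in moment location.
Experiments on the Charades-CD and ActivityNet-CD datasets demonstrate the method's effectiveness and generalization ability across multiple grounding architectures, achieving state-of-the-art results.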

Abstract (original)

Temporal sentence grounding in videos (TSGV) faces challenges due to public TSGV datasets containing significant temporal biases, which are attributed to the uneven temporal distributions of target moments. Existing methods generate augmented videos, where target moments are forced to have varying temporal locations. However, since the video lengths of the given datasets have small variations, only changing the temporal locations results in poor generalization ability in videos with varying lengths. In this paper, we propose a novel training framework complemented by diversified data augmentation and a domain discriminator. The data augmentation generates videos with various lengths and target moment locations to diversify temporal distributions. However, augmented videos inevitably exhibit distinct feature distributions which may introduce noise. To address this, we design a domain adaptation auxiliary task to diminish feature discrepancies between original and augmented videos. We also encourage the model to produce distinct predictions for videos with the same text queries but different moment locations to promote debiased training. Experiments on Charades-CD and ActivityNet-CD datasets demonstrate the effectiveness and generalization abilities of our method in multiple grounding structures, achieving state-of-the-art results.
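
The abstract does not spell out how the augmented videos are built, so the following is only a minimal sketch of one way to vary both video length and target-moment location at the feature level. The function name `augment_video`, the `filler_pool` of background clip features, and the `max_pad` parameter are hypothetical and are not taken from the paper. The sketch trims a random amount of background around the target moment, then pads each side with a random number of filler clips, and returns the moment's new indices.

```python
import random


def augment_video(features, start, end, filler_pool, max_pad=8, rng=None):
    """Return (new_features, new_start, new_end) for one augmented video.

    features:    list of per-clip feature vectors for the original video
    start, end:  inclusive clip indices of the target moment
    filler_pool: non-empty list of clip features used as padding
                 (hypothetical; e.g. clips sampled from other videos)
    max_pad:     maximum number of filler clips added on each side
    """
    rng = rng or random.Random()
    if not 0 <= start <= end < len(features):
        raise ValueError("target moment must lie inside the video")

    # Randomly trim background on both sides; the moment itself is kept intact.
    keep_from = rng.randint(0, start)
    keep_to = rng.randint(end, len(features) - 1)

    # Randomly pad both sides, which changes length and moment position.
    n_before = rng.randint(0, max_pad)
    n_after = rng.randint(0, max_pad)
    before = [rng.choice(filler_pool) for _ in range(n_before)]
    after = [rng.choice(filler_pool) for _ in range(n_after)]

    new_features = before + features[keep_from:keep_to + 1] + after
    # Shift the moment indices by the padding added and the clips trimmed.
    new_start = n_before + (start - keep_from)
    new_end = n_before + (end - keep_from)
    return new_features, new_start, new_end
```

Calling the function repeatedly with different random seeds on the same video and query yields samples whose lengths and normalized moment positions (`new_start / len(new_features)`) both vary, which is the kind of diversity the abstract describes.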

arXiv information

Authors	Junlong Ren,Gangjian Zhang,Haifeng Sun,Hao Wang
Published	2025-01-14 14:40:35+00:00

Source, services used

arxiv.jp, Google
