Contrast-Unity for Partially-Supervised Temporal Sentence Grounding

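Summary

Temporal sentence grounding aims to detect the timestamps of an event described by a natural language query in a given untrimmed video.
The existing fully-supervised setting achieves strong results but requires expensive annotation.
The weakly-supervised setting uses cheap labels, but its performance is poor.
To pursue high performance at lower annotation cost, this paper introduces an intermediate partially-supervised setting, in which only a short clip is available during training.
To make full use of the partial labels, the authors design a contrast-unity framework with a two-stage goal of implicit-explicit progressive grounding.
In the implicit stage, event-query representations are aligned at fine granularity using comprehensive quadruple contrastive learning: event-query gather, event-background separation, intra-cluster compactness and inter-cluster separability.
The resulting high-quality representations then yield acceptable grounding pseudo-labels.
In the explicit stage, to optimize the grounding objective directly, a fully-supervised model is trained on the obtained pseudo-labels for grounding refinement and denoising.
Extensive experiments and thorough ablations on Charades-STA and ActivityNet Captions demonstrate the significance of partial supervision and the superior performance of the method.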

Summary (original)

Temporal sentence grounding aims to detect event timestamps described by the natural language query from given untrimmed videos. The existing fully-supervised setting achieves great results but requires expensive annotation, while the weakly-supervised setting adopts cheap labels but performs poorly. To pursue high performance with lower annotation cost, this paper introduces an intermediate partially-supervised setting, i.e., only a short clip is available during training. To make full use of partial labels, we specially design one contrast-unity framework, with the two-stage goal of implicit-explicit progressive grounding. In the implicit stage, we align event-query representations at fine granularity using comprehensive quadruple contrastive learning: event-query gather, event-background separation, intra-cluster compactness and inter-cluster separability. Then, high-quality representations bring acceptable grounding pseudo-labels. In the explicit stage, to explicitly optimize grounding objectives, we train one fully-supervised model using obtained pseudo-labels for grounding refinement and denoising. Extensive experiments and thorough ablations on Charades-STA and ActivityNet Captions demonstrate the significance of partial supervision, as well as our superior performance.
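
As a rough illustration of the first two contrastive terms named in the abstract (event-query gather and event-background separation), the sketch below uses a generic InfoNCE-style loss. This is an assumption for illustration only: the abstract does not give the paper's actual loss, and the function names, the toy 2-D features, and the temperature value are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(query, event, backgrounds, temperature=0.1):
    """Generic InfoNCE-style loss (hypothetical stand-in, not the paper's loss).

    The query is pulled toward the event feature (positive) and pushed
    away from the background features (negatives).
    """
    pos = math.exp(cosine(query, event) / temperature)
    neg = sum(math.exp(cosine(query, b) / temperature) for b in backgrounds)
    return -math.log(pos / (pos + neg))


# Toy 2-D features, invented for this example.
query = [1.0, 0.0]
event = [0.9, 0.1]
backgrounds = [[0.0, 1.0], [-1.0, 0.2]]

# Loss when the query is paired with its event clip.
aligned = info_nce(query, event, backgrounds)
# Loss when a background clip is wrongly treated as the positive.
misaligned = info_nce(query, backgrounds[0], [event, backgrounds[1]])
```

In this toy setup the loss is small when the query is paired with its event clip and much larger when a background clip is treated as the positive, which is the behavior a gather/separation objective is meant to encourage. The intra-cluster and inter-cluster terms are not sketched here.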

arXiv information

Authors	Haicheng Wang,Chen Ju,Weixiong Lin,Chaofan Ma,Shuai Xiao,Ya Zhang,Yanfeng Wang
Published	2025-02-18 14:59:18+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
