Constraint and Union for Partially-Supervised Temporal Sentence Grounding

Summary

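Temporal sentence grounding aims to detect, in a given untrimmed video, the timestamps of the event described by a natural language query.
The existing fully-supervised setting achieves excellent performance but requires expensive annotation.
The weakly-supervised setting, by contrast, uses cheap labels but performs poorly.
To achieve high performance at a lower annotation cost, this paper introduces an intermediate partially-supervised setting, in which only short-clip or even single-frame labels are available during training.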
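To make the three label granularities concrete, the following Python sketch (an illustration, not taken from the paper) represents a label as a start/end pair in seconds, simulates a partial label by sampling a short clip inside a full ground-truth span (a clip length of 0 gives a single-frame label), and marks the frames that the partial label covers. The names `Segment`, `simulate_partial_label` and `frame_labels` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds


def simulate_partial_label(gt: Segment, clip_len: float, rng: random.Random) -> Segment:
    """Hypothetical helper: sample a short clip lying inside the full ground-truth span.

    clip_len == 0.0 produces a single-frame (point) label.
    """
    clip_len = min(clip_len, gt.end - gt.start)
    start = rng.uniform(gt.start, gt.end - clip_len)
    return Segment(start, start + clip_len)


def frame_labels(label: Segment, num_frames: int, fps: float) -> list[int]:
    """Mark 1 for every frame whose interval [i/fps, (i+1)/fps) touches the label."""
    return [
        1 if i / fps <= label.end and (i + 1) / fps > label.start else 0
        for i in range(num_frames)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    gt = Segment(2.0, 6.0)                        # full label: the costly annotation
    clip = simulate_partial_label(gt, 1.0, rng)   # short-clip label
    point = simulate_partial_label(gt, 0.0, rng)  # single-frame label
    print(clip, frame_labels(clip, num_frames=10, fps=1.0))
    print(point, frame_labels(point, num_frames=10, fps=1.0))
```

Frames outside the partial label are set to 0 here for simplicity, but in the partially-supervised setting they are unlabelled rather than known background.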
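To make full use of partial labels, we propose a novel quadruple constraint pipeline that comprehensively shapes event-query aligned representations, covering intra- and inter-sample as well as uni- and multi-modal relations.
The former raises intra-cluster compactness and inter-cluster separability.
The latter enables event-background separation and event-query gathering.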
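The abstract does not give the loss formulas, so the Python sketch below is only one hypothetical way to express the four constraints on frame and query embeddings: mean squared distance to the event centroid for intra-cluster compactness, a hinge on pairwise centroid distances for inter-cluster separability, a hinge on the query similarity of event versus background frames for event-background separation, and cosine distance between the event centroid and the query for event-query gathering. All function names, margins and weights are assumptions.

```python
import math
from itertools import combinations

Vec = list[float]


def mean_vec(vs: list[Vec]) -> Vec:
    return [sum(col) / len(vs) for col in zip(*vs)]


def sq_dist(a: Vec, b: Vec) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a: Vec, b: Vec) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb + 1e-8)


def intra_compactness(event: list[Vec]) -> float:
    """Hypothetical: pull event frames of one sample towards their centroid."""
    c = mean_vec(event)
    return sum(sq_dist(f, c) for f in event) / len(event)


def inter_separability(centroids: list[Vec], margin: float = 1.0) -> float:
    """Hypothetical: push event centroids of different samples at least `margin` apart."""
    pairs = list(combinations(centroids, 2))
    if not pairs:
        return 0.0
    return sum(max(0.0, margin - math.sqrt(sq_dist(a, b))) for a, b in pairs) / len(pairs)


def event_background_separation(event: list[Vec], background: list[Vec],
                                query: Vec, margin: float = 0.2) -> float:
    """Hypothetical: event frames should match the query better than background frames."""
    pos = sum(cosine(f, query) for f in event) / len(event)
    neg = sum(cosine(f, query) for f in background) / len(background)
    return max(0.0, margin - (pos - neg))


def event_query_gather(event: list[Vec], query: Vec) -> float:
    """Hypothetical: align the event centroid with the query embedding."""
    return 1.0 - cosine(mean_vec(event), query)


def quadruple_loss(event: list[Vec], background: list[Vec], query: Vec,
                   other_centroids: list[Vec], weights=(1.0, 1.0, 1.0, 1.0)) -> float:
    terms = (
        intra_compactness(event),
        inter_separability([mean_vec(event)] + other_centroids),
        event_background_separation(event, background, query),
        event_query_gather(event, query),
    )
    return sum(w * t for w, t in zip(weights, terms))


if __name__ == "__main__":
    good = quadruple_loss([[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0]], [1.0, 0.0], [[-1.0, 0.0]])
    bad = quadruple_loss([[0.0, 1.0], [0.0, 1.0]], [[1.0, 0.0]], [1.0, 0.0], [[0.0, 1.0]])
    print(good, bad)
```

With toy 2-D embeddings, a well-shaped sample (event frames aligned with the query, far from other samples) gives a loss near zero, while a misaligned one is penalised by three of the four terms.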
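To obtain stronger performance through explicit grounding optimization, we further introduce a partial-full union framework: an additional fully-supervised branch is bridged in, so that the model benefits from its strong grounding ability while remaining robust to partial annotations.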
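The abstract likewise leaves the bridging mechanism unspecified. One hypothetical reading, sketched below in Python, is to grow a partial label into a pseudo full boundary by expanding over contiguous neighbouring frames whose frame-query score exceeds a threshold, then train the fully-supervised branch against that boundary with a temporal-IoU loss added to the constraint loss. Temporal IoU is the standard overlap measure behind the Recall@n, IoU=m metrics commonly reported on Charades-STA and ActivityNet Captions; `expand_to_pseudo_boundary`, `union_loss`, the threshold and the weight `lam` are illustrative assumptions.

```python
def expand_to_pseudo_boundary(scores: list[float], seed_start: int, seed_end: int,
                              thresh: float) -> tuple[int, int]:
    """Hypothetical: grow an inclusive frame range outward while neighbours score >= thresh."""
    s, e = seed_start, seed_end
    while s > 0 and scores[s - 1] >= thresh:
        s -= 1
    while e < len(scores) - 1 and scores[e + 1] >= thresh:
        e += 1
    return s, e


def temporal_iou(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Intersection over union of two [start, end] intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    # The hull overestimates the union for disjoint intervals, but inter is 0 then anyway.
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0


def union_loss(constraint_loss: float, pred: tuple[float, float],
               pseudo: tuple[float, float], lam: float = 1.0) -> float:
    """Hypothetical total objective: partial-label constraints + full-branch grounding loss."""
    return constraint_loss + lam * (1.0 - temporal_iou(pred, pseudo))


if __name__ == "__main__":
    fps = 1.0
    scores = [0.1, 0.6, 0.8, 0.9, 0.7, 0.2, 0.9]  # toy frame-query scores
    s, e = expand_to_pseudo_boundary(scores, seed_start=3, seed_end=3, thresh=0.5)
    pseudo = (s / fps, (e + 1) / fps)  # inclusive frame range -> seconds
    print((s, e), pseudo, union_loss(0.5, pred=(1.0, 4.0), pseudo=pseudo))
```

Growing only over contiguous frames keeps the isolated high score at the last frame out of the pseudo boundary, which is one simple way to limit the noise such pseudo labels introduce.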
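Extensive experiments and ablations on Charades-STA and ActivityNet Captions demonstrate the significance of partial supervision and the superior performance of the proposed method.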

Summary (original)

Temporal sentence grounding aims to detect the event timestamps described by the natural language query from given untrimmed videos. The existing fully-supervised setting achieves great performance but requires expensive annotation costs; while the weakly-supervised setting adopts cheap labels but performs poorly. To pursue high performance with less annotation cost, this paper introduces an intermediate partially-supervised setting, i.e., only short-clip or even single-frame labels are available during training. To take full advantage of partial labels, we propose a novel quadruple constraint pipeline to comprehensively shape event-query aligned representations, covering intra- and inter-samples, uni- and multi-modalities. The former raises intra-cluster compactness and inter-cluster separability; while the latter enables event-background separation and event-query gather. To achieve more powerful performance with explicit grounding optimization, we further introduce a partial-full union framework, i.e., bridging with an additional fully-supervised branch, to enjoy its impressive grounding bonus, and be robust to partial annotations. Extensive experiments and ablations on Charades-STA and ActivityNet Captions demonstrate the significance of partial supervision and our superior performance.

arXiv information

Authors	Chen Ju, Haicheng Wang, Jinxiang Liu, Chaofan Ma, Ya Zhang, Peisen Zhao, Jianlong Chang, Qi Tian
Published	2023-02-20 09:14:41+00:00
arXiv site	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
