Exploring Iterative Refinement with Diffusion Models for Video Grounding

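Summary

Video grounding aims to localize the target moment in an untrimmed video that corresponds to a given sentence query.
Existing methods typically either select the best prediction from a set of predefined proposals or regress the target span directly in a single shot, and therefore lack a systematic process for refining predictions.
This paper proposes DiffusionVG, a new framework built on diffusion models that formulates video grounding as a conditional generation task: the target span is generated from Gaussian noise input and refined iteratively in the reverse diffusion process.
During training, DiffusionVG gradually adds noise to the target span through a fixed forward diffusion process and learns to recover the target span in the reverse diffusion process.
At inference, DiffusionVG generates the target span from Gaussian noise input through the learned reverse diffusion process, conditioned on the video-sentence representations.
Without bells and whistles, DiffusionVG outperforms existing well-crafted models on the mainstream Charades-STA, ActivityNet Captions, and TACoS benchmarks.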
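
The training and inference steps described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the noise schedule, the span parameterization, or the sampler, so the linear beta schedule, the (center, width) span encoding, the deterministic DDIM-style update, and the `oracle` stand-in for a trained denoiser conditioned on video-sentence representations are all assumptions made for illustration.

```python
import math
import random

def alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    out, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out

def forward_diffuse(span, t, a_bars, rng):
    """Fixed forward process: sample a noisy span x_t from q(x_t | x_0)."""
    a = a_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
            for x in span]

def reverse_refine(denoiser, a_bars, rng, dim=2):
    """Start from Gaussian noise and refine it step by step (DDIM-style, assumed)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in range(len(a_bars) - 1, -1, -1):
        a_t = a_bars[t]
        a_prev = a_bars[t - 1] if t > 0 else 1.0
        # Hypothetical model call: predict the clean span from the noisy one.
        x0_hat = denoiser(x, t)
        # Noise implied by the current sample and the predicted clean span.
        eps_hat = [(xi - math.sqrt(a_t) * x0i) / math.sqrt(1.0 - a_t)
                   for xi, x0i in zip(x, x0_hat)]
        # Move to the previous, less noisy step.
        x = [math.sqrt(a_prev) * x0i + math.sqrt(1.0 - a_prev) * ei
             for x0i, ei in zip(x0_hat, eps_hat)]
    return x

rng = random.Random(0)
schedule = alpha_bars(1000)
target = [0.4, 0.2]  # illustrative (center, width) as fractions of video length
noisy = forward_diffuse(target, 999, schedule, rng)  # training-time input
oracle = lambda x, t: target  # stand-in for a trained conditional denoiser
refined = reverse_refine(oracle, schedule, rng)  # inference-time output
```

With a perfect denoiser the refinement loop returns the target span exactly. In a real model the denoiser would be a network that takes the noisy span, the step index, and the video-sentence features, and its prediction would be corrected a little at each step.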

Summary (original)

Video grounding aims to localize the target moment in an untrimmed video corresponding to a given sentence query. Existing methods typically select the best prediction from a set of predefined proposals or directly regress the target span in a single-shot manner, resulting in the absence of a systematic prediction refinement process. In this paper, we propose DiffusionVG, a novel framework with diffusion models that formulates video grounding as a conditional generation task, where the target span is generated from Gaussian noise inputs and iteratively refined in the reverse diffusion process. During training, DiffusionVG progressively adds noise to the target span with a fixed forward diffusion process and learns to recover the target span in the reverse diffusion process. In inference, DiffusionVG can generate the target span from Gaussian noise inputs by the learned reverse diffusion process conditioned on the video-sentence representations. Without bells and whistles, our DiffusionVG demonstrates superior performance compared to existing well-crafted models on mainstream Charades-STA, ActivityNet Captions and TACoS benchmarks.

arXiv information

Authors	Xiao Liang, Tao Shi, Yaoyuan Liang, Te Tao, Shao-Lun Huang
Published	2023-12-29 16:06:51+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
