QD-VMR: Query Debiasing with Contextual Understanding Enhancement for Video Moment Retrieval

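Summary

Video Moment Retrieval (VMR) aims to retrieve the moments of an untrimmed video that correspond to a given query.
Cross-modal interaction approaches have made progress in filtering query-irrelevant information out of videos, but they assume that the query semantics align precisely with the corresponding video moments, and so may overlook misunderstandings of the natural-language semantics.
To address this challenge, we propose a new model called \textit{QD-VMR}, a query debiasing model with enhanced contextual understanding.
First, a Global Partial Aligner module strengthens the model's cross-modal understanding by aligning video clip features with query features and by video-query contrastive learning.
Next, a Query Debiasing Module efficiently obtains debiased query features, and a Visual Enhancement module refines the video features related to the query.
Finally, a DETR structure predicts the candidate target video moments.
In extensive evaluations on three benchmark datasets, QD-VMR achieves state-of-the-art performance, showing its potential to improve the accuracy of VMR.
Further analytical experiments demonstrate the effectiveness of the proposed modules.
Our code will be released to facilitate future research.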
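
The video-query contrastive learning mentioned in the summary can be illustrated with a generic InfoNCE-style loss. This is a minimal sketch, not the authors' implementation: the summary does not specify the loss, so the use of cosine similarity, the `temperature` value, and the names `cosine` and `info_nce` are all assumptions made for illustration. Each video feature is pulled toward its paired query feature and pushed away from the other queries in the batch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)  # epsilon guards against zero vectors


def info_nce(video_feats, query_feats, temperature=0.07):
    """Mean InfoNCE loss; video_feats[i] and query_feats[i] form the positive pair.

    Hypothetical helper: the temperature and similarity function are
    illustrative choices, not values taken from the paper.
    """
    n = len(video_feats)
    total = 0.0
    for i in range(n):
        # Similarity of video i to every query in the batch.
        logits = [cosine(video_feats[i], q) / temperature for q in query_feats]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability assigned to the matching query.
        total += log_denom - logits[i]
    return total / n


if __name__ == "__main__":
    videos = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[1.0, 0.0], [0.0, 1.0]]
    mismatched = [[0.0, 1.0], [1.0, 0.0]]
    print("matched loss:", info_nce(videos, matched))
    print("mismatched loss:", info_nce(videos, mismatched))
```

With correctly paired features the loss is close to zero, and it grows when the pairs are swapped, which is the behaviour a contrastive alignment objective is meant to reward.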

Abstract (original)

Video Moment Retrieval (VMR) aims to retrieve relevant moments of an untrimmed video corresponding to the query. While cross-modal interaction approaches have shown progress in filtering out query-irrelevant information in videos, they assume the precise alignment between the query semantics and the corresponding video moments, potentially overlooking the misunderstanding of the natural language semantics. To address this challenge, we propose a novel model called \textit{QD-VMR}, a query debiasing model with enhanced contextual understanding. Firstly, we leverage a Global Partial Aligner module via video clip and query features alignment and video-query contrastive learning to enhance the cross-modal understanding capabilities of the model. Subsequently, we employ a Query Debiasing Module to obtain debiased query features efficiently, and a Visual Enhancement module to refine the video features related to the query. Finally, we adopt the DETR structure to predict the possible target video moments. Through extensive evaluations of three benchmark datasets, QD-VMR achieves state-of-the-art performance, proving its potential to improve the accuracy of VMR. Further analytical experiments demonstrate the effectiveness of our proposed module. Our code will be released to facilitate future research.
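
To make the final prediction step more concrete, the sketch below shows how a predicted moment can be turned into timestamps and scored against a ground-truth moment. It rests on assumptions the abstract does not confirm: that each moment is predicted as a normalized (center, width) pair, as is common in DETR-style moment retrieval models, and that quality is measured by temporal intersection-over-union. The function names `span_cw_to_se` and `temporal_iou` are hypothetical.

```python
def span_cw_to_se(center, width, duration):
    """Convert a normalized (center, width) span to (start, end) in seconds.

    Assumes center and width are fractions of the video duration.
    """
    start = max(0.0, (center - width / 2.0) * duration)
    end = min(duration, (center + width / 2.0) * duration)
    return start, end


def temporal_iou(a, b):
    """Temporal intersection-over-union of two (start, end) moments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = span_cw_to_se(center=0.5, width=0.2, duration=100.0)
    ground_truth = (45.0, 65.0)
    print("predicted moment:", predicted)
    print("temporal IoU:", temporal_iou(predicted, ground_truth))
```

A prediction is typically counted as correct when its temporal IoU with the ground truth exceeds a chosen threshold; the threshold used in the paper's evaluation is not stated in the abstract.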

arXiv information

Authors	Chenghua Gao, Min Li, Jianshuo Liu, Junxing Ren, Lin Chen, Haoyu Liu, Bo Meng, Jitao Fu, Wenwen Su
Published	2024-08-23 10:56:42+00:00
arXiv page	arxiv_id(pdf)
