RAPID: Retrieval-Augmented Parallel Inference Drafting for Text-Based Video Event Retrieval

Summary

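The rapid growth of multimedia content has made retrieving events from videos with text queries increasingly difficult.
Existing methods for text-based video event retrieval often concentrate on object-level descriptions and overlook the important role of contextual information.
This limitation is most apparent when a query lacks sufficient context, for example when location details are missing or background elements are ambiguous.
To address these challenges, we propose a new system called RAPID (Retrieval-Augmented Parallel Inference Drafting), which uses advances in large language models (LLMs) and prompt-based learning to semantically correct user queries and enrich them with relevant contextual information.
The enriched queries are processed through parallel retrieval, followed by an evaluation step that selects the most relevant results based on their alignment with the original query.
Through extensive experiments on a custom-developed dataset, we show that RAPID significantly outperforms traditional retrieval methods, particularly for contextually incomplete queries.
The system's speed and accuracy were validated through participation in the Ho Chi Minh City AI Challenge 2024, where it retrieved events from more than 300 hours of video.
A further evaluation comparing RAPID with the baseline proposed by the competition organizers showed that RAPID was more effective, underscoring the strength and robustness of the approach.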
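
The flow described above (draft enriched queries, retrieve for each draft in parallel, then select results by their alignment with the original query) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the corpus, the function names (`draft_queries`, `retrieve`, `rapid_search`), and the fixed context hints are all hypothetical. The LLM enrichment step is replaced by simple string templates, and the text-video similarity is replaced by Jaccard token overlap, purely so the example runs with the standard library.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy corpus standing in for an index of video event descriptions (hypothetical data).
CORPUS = {
    "clip_01": "a man rides a motorbike on a busy street at night",
    "clip_02": "a chef cooks noodles in a street food stall",
    "clip_03": "children play football on a beach at sunset",
}


def tokens(text):
    """Lowercase whitespace tokenization into a set of words."""
    return set(text.lower().split())


def overlap_score(a, b):
    """Jaccard token overlap; an assumed stand-in for a real text-video similarity."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def draft_queries(query):
    """Placeholder for the LLM step: the original query plus context-enriched drafts."""
    contexts = ["on a street", "at night", "on a beach"]  # hypothetical context hints
    return [query] + [f"{query} {c}" for c in contexts]


def retrieve(draft, k=2):
    """Return the ids of the k corpus entries most similar to one draft query."""
    ranked = sorted(CORPUS, key=lambda cid: overlap_score(draft, CORPUS[cid]), reverse=True)
    return ranked[:k]


def rapid_search(query, k=2):
    """Run retrieval for every draft in parallel, then re-rank the pooled candidates."""
    drafts = draft_queries(query)
    with ThreadPoolExecutor() as pool:
        candidate_lists = list(pool.map(retrieve, drafts))
    candidates = {cid for lst in candidate_lists for cid in lst}
    # Evaluation step: order candidates by alignment with the ORIGINAL query,
    # breaking ties by clip id so the result is deterministic.
    return sorted(candidates, key=lambda cid: (-overlap_score(query, CORPUS[cid]), cid))[:k]


if __name__ == "__main__":
    print(rapid_search("a man rides a motorbike"))
```

In this toy corpus the query "a man rides a motorbike" ranks `clip_01` first. Scoring the pooled candidates against the original query rather than the drafts keeps an enriched draft from pulling the final ranking away from what the user actually asked for.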

Abstract (original)

Retrieving events from videos using text queries has become increasingly challenging due to the rapid growth of multimedia content. Existing methods for text-based video event retrieval often focus heavily on object-level descriptions, overlooking the crucial role of contextual information. This limitation is especially apparent when queries lack sufficient context, such as missing location details or ambiguous background elements. To address these challenges, we propose a novel system called RAPID (Retrieval-Augmented Parallel Inference Drafting), which leverages advancements in Large Language Models (LLMs) and prompt-based learning to semantically correct and enrich user queries with relevant contextual information. These enriched queries are then processed through parallel retrieval, followed by an evaluation step to select the most relevant results based on their alignment with the original query. Through extensive experiments on our custom-developed dataset, we demonstrate that RAPID significantly outperforms traditional retrieval methods, particularly for contextually incomplete queries. Our system was validated for both speed and accuracy through participation in the Ho Chi Minh City AI Challenge 2024, where it successfully retrieved events from over 300 hours of video. Further evaluation comparing RAPID with the baseline proposed by the competition organizers demonstrated its superior effectiveness, highlighting the strength and robustness of our approach.

arXiv information

Authors	Long Nguyen, Huy Nguyen, Bao Khuu, Huy Luu, Huy Le, Tuan Nguyen, Tho Quan
Published	2025-01-27 18:45:07+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
