Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark

Abstract

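Following the success of the 2023 edition, we held the second Perception Test challenge as a half-day workshop alongside the IEEE/CVF European Conference on Computer Vision (ECCV) 2024, with the aim of benchmarking state-of-the-art video models and measuring progress since last year using the Perception Test benchmark.
This year's challenge had seven tracks (up from six last year), covering low-level and high-level tasks with language and non-language interfaces across the video, audio, and text modalities.
The additional track covered hour-long video understanding and introduced a new video QA benchmark, 1h-walk VQA.
Overall, the tasks in the different tracks were object tracking, point tracking, temporal action localisation, temporal sound localisation, multiple-choice video question-answering, grounded video question-answering, and hour-long video question-answering.
This report summarises the challenge tasks and results, and introduces in detail the new hour-long video QA benchmark, 1h-walk VQA.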

Abstract (original)

Following the successful 2023 edition, we organised the Second Perception Test challenge as a half-day workshop alongside the IEEE/CVF European Conference on Computer Vision (ECCV) 2024, with the goal of benchmarking state-of-the-art video models and measuring the progress since last year using the Perception Test benchmark. This year, the challenge had seven tracks (up from six last year) and covered low-level and high-level tasks, with language and non-language interfaces, across video, audio, and text modalities; the additional track covered hour-long video understanding and introduced a novel video QA benchmark 1h-walk VQA. Overall, the tasks in the different tracks were: object tracking, point tracking, temporal action localisation, temporal sound localisation, multiple-choice video question-answering, grounded video question-answering, and hour-long video question-answering. We summarise in this report the challenge tasks and results, and introduce in detail the novel hour-long video QA benchmark 1h-walk VQA.

arXiv information

Authors	Joseph Heyward, João Carreira, Dima Damen, Andrew Zisserman, Viorica Pătrăucean
Published	2024-11-29 18:57:25+00:00

Source, services used

arxiv.jp, Google
