Building Scalable Video Understanding Benchmarks through Sports

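Summary

Existing benchmarks for evaluating long video understanding fall short on several fronts, lacking either the scale or the quality of annotations.
These limitations stem from the difficulty of collecting dense annotations for long videos (actions, dialogue, and so on), which are often obtained by manually labeling many frames per second.
This work introduces an automated Annotation and Video Stream Alignment Pipeline (ASAP for short).
We demonstrate the generality of ASAP by aligning unlabeled videos of four different sports (cricket, football, basketball, and American football) with the corresponding dense annotations (such as commentary) that are freely available on the web.
Our human studies show that ASAP can align videos and annotations with high fidelity, precision, and speed.
We then leverage ASAP's scalability to create LCric, a large-scale long video understanding benchmark: more than 1000 hours of densely annotated long cricket videos (average sample length of 50 minutes), collected at virtually zero annotation cost.
We benchmark and analyze state-of-the-art video understanding models on LCric using a large set of compositional multi-choice and regression queries.
We establish a human baseline that indicates significant room for new research to explore.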

Summary (original)

Existing benchmarks for evaluating long video understanding fall short on multiple aspects, either lacking in scale or quality of annotations. These limitations arise from the difficulty in collecting dense annotations for long videos (e.g. actions, dialogues, etc.), which are often obtained by manually labeling many frames per second. In this work, we introduce an automated Annotation and Video Stream Alignment Pipeline (abbreviated ASAP). We demonstrate the generality of ASAP by aligning unlabeled videos of four different sports (Cricket, Football, Basketball, and American Football) with their corresponding dense annotations (i.e. commentary) freely available on the web. Our human studies indicate that ASAP can align videos and annotations with high fidelity, precision, and speed. We then leverage ASAP's scalability to create LCric, a large-scale long video understanding benchmark, with over 1000 hours of densely annotated long Cricket videos (with an average sample length of 50 mins) collected at virtually zero annotation cost. We benchmark and analyze state-of-the-art video understanding models on LCric through a large set of compositional multi-choice and regression queries. We establish a human baseline that indicates significant room for new research to explore.

arXiv information

Authors	Aniket Agarwal, Alex Zhang, Karthik Narasimhan, Igor Gilitschenski, Vishvak Murahari, Yash Kant
Published	2023-01-17 13:20:21+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
