SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation

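Summary

Human-centric video frame interpolation has great potential to improve people's entertainment experiences and to find commercial applications in the sports analysis industry, such as synthesizing slow-motion videos.
Although multiple benchmark datasets are available in the community, none is dedicated to human-centric scenarios.
To fill this gap, we introduce SportsSloMo, a benchmark consisting of more than 130K video clips and 1M video frames of high-resolution ($\geq$720p) slow-motion sports videos crawled from YouTube.
We re-trained several state-of-the-art methods on our benchmark and found that their accuracy drops compared with other datasets.
This highlights the difficulty of our benchmark: human bodies are highly deformable and occlusions occur frequently in sports videos, which poses significant challenges even for the best-performing methods.
To improve accuracy, we introduce two loss terms that incorporate human-aware priors, adding auxiliary supervision from panoptic segmentation and from human keypoint detection, respectively.
The loss terms are model-agnostic and can easily be plugged into any video frame interpolation approach.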
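The summary does not say exactly how the two auxiliary terms are computed. The following is a minimal sketch under stated assumptions: each term compares the outputs of a frozen network (a panoptic segmenter, a keypoint detector) on the interpolated frame and on the ground-truth frame, and both are added to an L1 reconstruction loss with weights `lambda_seg` and `lambda_kpt`. The function names, the L1 distance, the weights, and the toy networks are all hypothetical, not the paper's definitions. The sketch shows why such terms can be model-agnostic: they need only the predicted and ground-truth frames, not any model internals. NumPy keeps it self-contained; a real training loss would be written in an autodiff framework so gradients reach the interpolation model.

```python
import numpy as np
from typing import Callable

Frame = np.ndarray  # H x W x 3 float image in [0, 1]


def l1(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute error between two arrays of the same shape."""
    return float(np.mean(np.abs(a - b)))


def human_aware_loss(
    pred: Frame,
    target: Frame,
    segmenter: Callable[[Frame], np.ndarray],
    keypoint_detector: Callable[[Frame], np.ndarray],
    lambda_seg: float = 0.1,  # hypothetical weight
    lambda_kpt: float = 0.1,  # hypothetical weight
) -> float:
    """Reconstruction loss plus two hypothetical human-aware auxiliary terms.

    Assumes a frozen segmenter returning per-pixel class probabilities and a
    frozen keypoint detector returning per-joint heatmaps. Each auxiliary term
    penalizes the gap between their outputs on the predicted and GT frames.
    """
    rec = l1(pred, target)
    seg = l1(segmenter(pred), segmenter(target))
    kpt = l1(keypoint_detector(pred), keypoint_detector(target))
    return rec + lambda_seg * seg + lambda_kpt * kpt


def toy_segmenter(frame: Frame) -> np.ndarray:
    # Toy stand-in: two soft "classes" derived from mean brightness per pixel.
    p = frame.mean(axis=-1, keepdims=True)
    return np.concatenate([p, 1.0 - p], axis=-1)


def toy_keypoints(frame: Frame) -> np.ndarray:
    # Toy stand-in: a single "heatmap" equal to the grayscale image.
    return frame.mean(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
gt = rng.random((8, 8, 3))
print(human_aware_loss(gt, gt, toy_segmenter, toy_keypoints))  # identical frames -> 0.0
```

Because `human_aware_loss` only takes frames and frozen callables, any interpolation model that outputs an intermediate frame could use it unchanged; setting both weights to zero reduces it to the plain reconstruction loss.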
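Experimental results validate the effectiveness of the proposed loss terms, which yield consistent performance improvements over five existing models and establish strong baseline models on the benchmark.
The dataset and code are available at https://neu-vi.github.io/SportsSlomo/.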

Summary (original)

Human-centric video frame interpolation has great potential for improving people’s entertainment experiences and finding commercial applications in the sports analysis industry, e.g., synthesizing slow-motion videos. Although there are multiple benchmark datasets available in the community, none of them is dedicated to human-centric scenarios. To bridge this gap, we introduce SportsSloMo, a benchmark consisting of more than 130K video clips and 1M video frames of high-resolution ($\geq$720p) slow-motion sports videos crawled from YouTube. We re-train several state-of-the-art methods on our benchmark, and the results show a decrease in their accuracy compared to other datasets. It highlights the difficulty of our benchmark and suggests that it poses significant challenges even for the best-performing methods, as human bodies are highly deformable and occlusions are frequent in sports videos. To improve the accuracy, we introduce two loss terms considering the human-aware priors, where we add auxiliary supervision to panoptic segmentation and human keypoints detection, respectively. The loss terms are model agnostic and can be easily plugged into any video frame interpolation approach. Experimental results validate the effectiveness of our proposed loss terms, leading to consistent performance improvement over 5 existing models, establishing strong baseline models on our benchmark. The dataset and code can be found at: https://neu-vi.github.io/SportsSlomo/.

arXiv information

Authors	Jiaben Chen, Huaizu Jiang
Published	2023-12-12 18:59:06+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
