KeyVideoLLM: Towards Large-scale Video Keyframe Selection

Summary

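In recent years, as web video has grown, managing and understanding large-scale video datasets has become increasingly important. Video Large Language Models (VideoLLMs) have risen to prominence thanks to their strong video understanding capabilities. However, training and inference for VideoLLMs require vast amounts of data, which poses major data-management challenges, particularly in efficiency, robustness, and effectiveness. This work introduces KeyVideoLLM, a keyframe selection method based on text-video frame similarity and designed to manage VideoLLM data efficiently, robustly, and effectively. Specifically, KeyVideoLLM achieves a data compression rate of up to 60.9 times, substantially reducing disk space requirements. It also maintains a 100% selection success rate across all video formats and scales, processes data up to 200 times faster than existing keyframe selection methods, and requires no hyperparameter tuning. Beyond its efficiency and robustness, KeyVideoLLM further improves model performance on video question-answering tasks in both the training and inference stages. Notably, it consistently achieves state-of-the-art (SoTA) results across diverse datasets.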

Abstract (original)

Recently, with the rise of web videos, managing and understanding large-scale video datasets has become increasingly important. Video Large Language Models (VideoLLMs) have emerged in recent years due to their strong video understanding capabilities. However, training and inference processes for VideoLLMs demand vast amounts of data, presenting significant challenges to data management, particularly regarding efficiency, robustness, and effectiveness. In this work, we present KeyVideoLLM, a text-video frame similarity-based keyframe selection method designed to manage VideoLLM data efficiently, robustly, and effectively. Specifically, KeyVideoLLM achieves a remarkable data compression rate of up to 60.9 times, substantially lowering disk space requirements, which proves its high efficiency. Additionally, it maintains a 100% selection success rate across all video formats and scales, enhances processing speed by up to 200 times compared to existing keyframe selection methods, and does not require hyperparameter tuning. Beyond its outstanding efficiency and robustness, KeyVideoLLM further improves model performance in video question-answering tasks during both training and inference stages. Notably, it consistently achieved the state-of-the-art (SoTA) experimental results on diverse datasets.
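The abstract describes keyframe selection driven by text-video frame similarity but gives no implementation details. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes each frame and the accompanying text have already been embedded into a shared vector space by some joint text-image encoder (not shown); the function names `cosine_similarity` and `select_keyframes`, the parameter `k`, and the toy vectors are all hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def select_keyframes(
    frame_embeddings: Sequence[Sequence[float]],
    text_embedding: Sequence[float],
    k: int,
) -> List[int]:
    """Return indices of the k frames most similar to the text, in temporal order.

    Hypothetical helper: the embeddings are assumed to be precomputed
    by a joint text-image encoder that is outside this sketch.
    """
    if k <= 0:
        return []
    scores = [cosine_similarity(f, text_embedding) for f in frame_embeddings]
    # Rank frame indices from most to least similar to the text.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Keep the top k, then restore the original frame order.
    return sorted(ranked[:k])


# Toy 3-dimensional vectors standing in for real encoder outputs.
frames = [
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
]
query = [1.0, 0.2, 0.0]

keyframes = select_keyframes(frames, query, k=2)
print(keyframes)  # [1, 3]
```

Keeping only the `k` selected frames in place of every decoded frame is where storage savings of the kind reported in the abstract would come from; the actual ratio depends on the video length and on how many frames are retained.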

arXiv information

Authors	Hao Liang, Jiapeng Li, Tianyi Bai, Chong Chen, Conghui He, Bin Cui, Wentao Zhang
Published	2024-07-03 13:41:44+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, DeepL
