Contrastive Losses Are Natural Criteria for Unsupervised Video Summarization

Abstract

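Video summarization aims to select the most informative subset of frames in a video so that the video can be browsed efficiently.
Unsupervised methods usually rely on heuristic training objectives such as diversity and representativeness.
However, such methods must bootstrap summaries generated online in order to compute the objectives used for importance score regression.
The authors consider this pipeline inefficient and instead seek to quantify frame-level importance directly, using contrastive losses from the representation learning literature.
Building on these contrastive losses, they propose three metrics that characterize a desirable key frame: local dissimilarity, global consistency, and uniqueness.
With features pre-trained on an image classification task, the metrics already yield high-quality importance scores, with performance competitive with or better than past heavily trained methods.
Refining the pre-trained features with a lightweight, contrastively learned projection module further improves the frame-level importance scores. The model can also leverage a large number of random videos and generalize to test videos with decent performance.
Code is available at https://github.com/pangzss/pytorch-CTVSUM.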

Abstract (original)

Video summarization aims to select the most informative subset of frames in a video to facilitate efficient video browsing. Unsupervised methods usually rely on heuristic training objectives such as diversity and representativeness. However, such methods need to bootstrap the online-generated summaries to compute the objectives for importance score regression. We consider such a pipeline inefficient and seek to directly quantify the frame-level importance with the help of contrastive losses in the representation learning literature. Leveraging the contrastive losses, we propose three metrics featuring a desirable key frame: local dissimilarity, global consistency, and uniqueness. With features pre-trained on the image classification task, the metrics can already yield high-quality importance scores, demonstrating competitive or better performance than past heavily-trained methods. We show that by refining the pre-trained features with a lightweight contrastively learned projection module, the frame-level importance scores can be further improved, and the model can also leverage a large number of random videos and generalize to test videos with decent performance. Code available at https://github.com/pangzss/pytorch-CTVSUM.
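
The abstract names the three metrics without defining them, so the following is only a rough sketch of how scores of this kind could be computed from pre-extracted frame features. Everything in it is an assumption made for illustration: the function names, the neighbor count `k`, the temperature `t`, the specific formulas (an alignment-style nearest-neighbor distance, cosine similarity to the mean feature, and a uniformity-style Gaussian kernel), and the choice to combine the metrics by multiplying their min-max normalized values. It is not the authors' implementation; the linked repository is the reference for that.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def pairwise_sq_dist(z):
    """Squared Euclidean distances between unit-length rows of z."""
    # For unit vectors, ||a - b||^2 = 2 - 2 * cos(a, b).
    return np.clip(2.0 - 2.0 * (z @ z.T), 0.0, None)


def local_dissimilarity(z, k=2):
    """Assumed form: mean squared distance to the k nearest other frames."""
    d = pairwise_sq_dist(z)
    np.fill_diagonal(d, np.inf)  # a frame is not its own neighbor
    return np.sort(d, axis=1)[:, :k].mean(axis=1)


def global_consistency(z):
    """Assumed form: cosine similarity between each frame and the mean feature."""
    center = l2_normalize(z.mean(axis=0, keepdims=True))
    return (z @ center.T).ravel()


def uniqueness(z, t=2.0):
    """Assumed form: negative log of the mean Gaussian kernel to all other frames."""
    kernel = np.exp(-t * pairwise_sq_dist(z))
    np.fill_diagonal(kernel, 0.0)  # exclude the frame itself
    return -np.log(kernel.sum(axis=1) / (len(z) - 1))


def min_max(x, eps=1e-12):
    """Rescale a score vector to the range [0, 1]."""
    return (x - x.min()) / max(x.max() - x.min(), eps)


def importance_scores(features, k=2, t=2.0):
    """Hypothetical combination: product of the three normalized metrics."""
    z = l2_normalize(np.asarray(features, dtype=np.float64))
    return (
        min_max(local_dissimilarity(z, k))
        * min_max(global_consistency(z))
        * min_max(uniqueness(z, t))
    )
```

Here `features` is assumed to be an array of shape (number of frames, feature dimension) with at least `k + 1` rows, for example features from an image classification backbone. No training is involved in this sketch; the learned projection module mentioned in the abstract is not modeled.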

arXiv information

Authors	Zongshang Pang, Yuta Nakashima, Mayu Otani, Hajime Nagahara
Published	2022-11-18 07:01:28+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
