Data Collection-free Masked Video Modeling

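Summary

Pre-training video transformers generally requires a large amount of data, which poses significant challenges in data collection costs and raises concerns about privacy, licensing, and inherent biases.
Synthesizing data is one promising way to address these issues, but pre-training on synthetic data alone has its own challenges.
This paper introduces an effective self-supervised learning framework for videos that leverages static images, which are readily available and less costly.
Specifically, it defines a Pseudo Motion Generator (PMG) module that recursively applies image transformations to generate pseudo-motion videos from images.
These pseudo-motion videos are then used for masked video modeling.
Because the approach also applies to synthetic images, it entirely frees video pre-training from data collection costs and the other concerns associated with real data.
Experiments on action recognition tasks demonstrate that the framework learns spatio-temporal features effectively from pseudo-motion videos, improves significantly over existing methods that also use static images, and partially outperforms methods that use both real and synthetic videos.
These results reveal fragments of what video transformers learn through masked video modeling.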

Abstract (original)

Pre-training video transformers generally requires a large amount of data, presenting significant challenges in terms of data collection costs and concerns related to privacy, licensing, and inherent biases. Synthesizing data is one of the promising ways to solve these issues, yet pre-training solely on synthetic data has its own challenges. In this paper, we introduce an effective self-supervised learning framework for videos that leverages readily available and less costly static images. Specifically, we define the Pseudo Motion Generator (PMG) module that recursively applies image transformations to generate pseudo-motion videos from images. These pseudo-motion videos are then leveraged in masked video modeling. Our approach is applicable to synthetic images as well, thus entirely freeing video pre-training from data collection costs and other concerns in real data. Through experiments in action recognition tasks, we demonstrate that this framework allows effective learning of spatio-temporal features through pseudo-motion videos, significantly improving over existing methods which also use static images and partially outperforming those using both real and synthetic videos. These results uncover fragments of what video transformers learn through masked video modeling.
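The abstract describes the PMG only at a high level: an image transformation is applied recursively so that each frame is derived from the previous one. The following is a minimal sketch of that idea, not the authors' implementation. The function names (`shift_image`, `generate_pseudo_motion_video`), the grayscale list-of-rows image representation, and the choice of a wrap-around translation as the transformation are all assumptions made for illustration; the abstract does not say which transformations are used.

```python
import random
from typing import List

Image = List[List[int]]  # hypothetical representation: rows of grayscale pixel values


def shift_image(image: Image, dx: int, dy: int) -> Image:
    """Translate an image by (dx, dy) pixels, wrapping around at the borders."""
    height, width = len(image), len(image[0])
    return [
        [image[(y - dy) % height][(x - dx) % width] for x in range(width)]
        for y in range(height)
    ]


def generate_pseudo_motion_video(image: Image, num_frames: int, seed: int = 0) -> List[Image]:
    """Build a clip from one still image by applying a transformation recursively.

    Frame 0 is the input image; frame t is the transformation applied to frame t-1.
    """
    rng = random.Random(seed)
    # Assumption: one non-zero displacement is sampled per clip and reused for every step.
    dx, dy = rng.choice([(1, 0), (0, 1), (1, 1), (-1, 1)])
    frames = [image]
    for _ in range(num_frames - 1):
        frames.append(shift_image(frames[-1], dx, dy))
    return frames


if __name__ == "__main__":
    still = [[x + 10 * y for x in range(4)] for y in range(4)]
    video = generate_pseudo_motion_video(still, num_frames=8)
    print(len(video), "frames of size", len(video[0]), "x", len(video[0][0]))
```

Because each frame is computed from the previous frame rather than from the original image, the displacement accumulates over time, which is what gives the clip the appearance of motion. A clip produced this way could then be fed to a masked video modeling objective in place of a real video.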

arXiv information

Authors	Yuchi Ishikawa, Masayoshi Kondo, Yoshimitsu Aoki
Published	2024-09-10 17:34:07+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
