Self-Supervised Learning for Videos: A Survey

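Summary

The remarkable success of deep learning across many fields depends on the availability of large-scale annotated datasets.
However, annotations are costly and labor-intensive to obtain, and this is especially difficult for video.
Moreover, relying on human-generated annotations biases learning and yields models with weaker domain generalization and robustness.
As an alternative, self-supervised learning offers a way to learn representations without annotations and has shown promise in both the image and video domains.
Unlike in the image domain, learning video representations is more challenging because of the temporal dimension, which brings in motion and other environmental dynamics.
This also creates opportunities for video-specific ideas that advance self-supervised learning in the video and multimodal domains.
This survey reviews existing approaches to self-supervised learning with a focus on the video domain.
It groups these methods into four categories based on their learning objectives: 1) pretext tasks, 2) generative learning, 3) contrastive learning, and 4) cross-modal agreement.
It also introduces commonly used datasets, downstream evaluation tasks, insights into the limitations of existing work, and potential future directions for the field.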

Abstract (original)

The remarkable success of deep learning in various domains relies on the availability of large-scale annotated datasets. However, obtaining annotations is expensive and requires great effort, which is especially challenging for videos. Moreover, the use of human-generated annotations leads to models with biased learning and poor domain generalization and robustness. As an alternative, self-supervised learning provides a way for representation learning which does not require annotations and has shown promise in both image and video domains. Different from the image domain, learning video representations is more challenging due to the temporal dimension, bringing in motion and other environmental dynamics. This also provides opportunities for video-exclusive ideas that advance self-supervised learning in the video and multimodal domain. In this survey, we provide a review of existing approaches on self-supervised learning focusing on the video domain. We summarize these methods into four different categories based on their learning objectives: 1) pretext tasks, 2) generative learning, 3) contrastive learning, and 4) cross-modal agreement. We further introduce the commonly used datasets, downstream evaluation tasks, insights into the limitations of existing works, and the potential future directions in this area.

arXiv information

Authors	Madeline C. Schiappa, Yogesh S. Rawat, Mubarak Shah
Published	2023-07-19 16:00:08+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
