Siamese Masked Autoencoders

Summary

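Establishing correspondence between images or scenes is an important challenge in computer vision, especially when occlusions, viewpoint changes, and changes in object appearance are taken into account.
This paper introduces Siamese Masked Autoencoders (SiamMAE), a simple extension of masked autoencoders (MAE) for learning visual correspondence from videos.
SiamMAE processes pairs of randomly sampled video frames and masks them asymmetrically.
The frames are processed independently by an encoder network, and a decoder composed of a series of cross-attention layers predicts the missing patches in the future frame.
By masking the large majority ($95\%$) of the patches in the future frame while leaving the past frame unchanged, SiamMAE encourages the network to focus on object motion and to learn object-centric representations.
Despite its conceptual simplicity, the features learned by SiamMAE outperform state-of-the-art self-supervised methods on video object segmentation, pose keypoint propagation, and semantic part propagation tasks.
SiamMAE achieves competitive results without relying on data augmentation, handcrafted tracking-based pretext tasks, or other techniques to prevent representational collapse.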
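The asymmetric masking described above can be illustrated with a minimal NumPy sketch. It assumes 224x224 RGB frames cut into 16x16 patches and a hypothetical frame-gap range of 4 to 48 frames; the function names (`patchify`, `sample_frame_pair`, `random_mask`) are illustrative and not taken from the authors' code. The past frame gets a 0% mask and the future frame a 95% mask, so only a handful of future patches would reach the encoder.

```python
import numpy as np

def patchify(frame: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) frame into an (N, patch*patch*C) array of non-overlapping patches."""
    h, w, c = frame.shape
    assert h % patch == 0 and w % patch == 0, "frame size must be divisible by patch size"
    gh, gw = h // patch, w // patch
    x = frame.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(gh * gw, patch * patch * c)

def sample_frame_pair(video: np.ndarray, rng: np.random.Generator,
                      min_gap: int = 4, max_gap: int = 48):
    """Return (past, future) frames; the gap range is an assumed, hypothetical default."""
    t = video.shape[0]
    gap = int(rng.integers(min_gap, min(max_gap, t - 1) + 1))
    start = int(rng.integers(0, t - gap))
    return video[start], video[start + gap]

def random_mask(num_patches: int, ratio: float, rng: np.random.Generator) -> np.ndarray:
    """Boolean mask over patches; True marks a hidden patch."""
    num_masked = int(round(num_patches * ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return mask

rng = np.random.default_rng(0)
video = rng.random((16, 224, 224, 3), dtype=np.float32)  # toy clip: 16 frames of 224x224 RGB
past, future = sample_frame_pair(video, rng)
past_patches, future_patches = patchify(past, 16), patchify(future, 16)

past_mask = random_mask(len(past_patches), 0.0, rng)       # past frame left unchanged
future_mask = random_mask(len(future_patches), 0.95, rng)  # 95% of future patches hidden
visible_future = future_patches[~future_mask]              # only these would reach the encoder
```

With a 14x14 grid (196 patches), a 95% ratio hides 186 future patches and leaves 10 visible, while all 196 past patches stay visible.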

Summary (original)

Establishing correspondence between images or scenes is a significant challenge in computer vision, especially given occlusions, viewpoint changes, and varying object appearances. In this paper, we present Siamese Masked Autoencoders (SiamMAE), a simple extension of Masked Autoencoders (MAE) for learning visual correspondence from videos. SiamMAE operates on pairs of randomly sampled video frames and asymmetrically masks them. These frames are processed independently by an encoder network, and a decoder composed of a sequence of cross-attention layers is tasked with predicting the missing patches in the future frame. By masking a large fraction ($95\%$) of patches in the future frame while leaving the past frame unchanged, SiamMAE encourages the network to focus on object motion and learn object-centric representations. Despite its conceptual simplicity, features learned via SiamMAE outperform state-of-the-art self-supervised methods on video object segmentation, pose keypoint propagation, and semantic part propagation tasks. SiamMAE achieves competitive results without relying on data augmentation, handcrafted tracking-based pretext tasks, or other techniques to prevent representational collapse.
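The decoder step, in which missing future-frame patches are predicted from the past frame, can be sketched as a single cross-attention layer in NumPy: future-frame tokens (with a shared mask token at hidden positions) act as queries, and past-frame encoder outputs act as keys and values. This wiring is an illustrative assumption, not the authors' implementation; the token width `d`, the random stand-in weights, the single head, the absence of normalization layers, and the loss (plain MSE over masked patches only, as in MAE) are all simplifications chosen for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
num_patches, patch_dim, d = 196, 768, 64  # assumed sizes: 14x14 grid, 16x16x3 patches, toy width

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: each query token attends over all context tokens."""
    q, k, v = queries @ w_q, context @ w_k, context @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return weights @ v

def masked_mse(pred, target, mask):
    """MAE-style reconstruction loss, averaged over masked patches only."""
    per_patch = ((pred - target) ** 2).mean(axis=-1)
    return float(per_patch[mask].mean())

# Hypothetical stand-ins for encoder outputs and pixel targets.
past_tokens = rng.standard_normal((num_patches, d))             # past frame: every patch encoded
future_embed = rng.standard_normal((num_patches, d))            # encoded future tokens, scattered back to grid positions
future_patches = rng.standard_normal((num_patches, patch_dim))  # ground-truth future pixels
future_mask = np.zeros(num_patches, dtype=bool)
future_mask[rng.permutation(num_patches)[:186]] = True          # ~95% of future patches masked

# Hidden positions get a shared mask token (random here; learned in practice).
mask_token = rng.standard_normal(d)
queries = np.where(future_mask[:, None], mask_token, future_embed)

w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
w_out = rng.standard_normal((d, patch_dim)) / np.sqrt(d)

hidden = queries + cross_attention(queries, past_tokens, w_q, w_k, w_v)  # residual connection
pred = hidden @ w_out                                                    # project to pixel space
loss = masked_mse(pred, future_patches, future_mask)
```

Because the queries come only from the future frame and the keys and values only from the past frame, each masked future patch can be reconstructed only by looking up matching content in the past frame, which is the correspondence signal the abstract describes.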

arXiv information

Authors	Agrim Gupta, Jiajun Wu, Jia Deng, Li Fei-Fei
Published	2023-05-23 17:59:46+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
