Understanding Masked Autoencoders From a Local Contrastive Perspective

Summary

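Masked AutoEncoder (MAE) has revolutionized self-supervised learning through its simple yet effective masking-and-reconstruction strategy. However, although MAE achieves state-of-the-art performance on a variety of downstream vision tasks, the mechanisms behind its effectiveness have been studied far less than those of the prevailing contrastive learning paradigm. In this paper, we explore a new perspective for explaining what truly contributes to the "rich hidden representations inside MAE". First, we analyze in detail the behavior of the decoder in MAE's generative pretraining pathway, whose distinctive encoder-decoder architecture reconstructs images from aggressively masked inputs. We find that MAE's decoder mainly learns local features with a limited receptive field, following the well-known "locality principle". Building on this locality assumption, we propose a theoretical framework that reformulates reconstruction-based MAE as a local, region-level form of contrastive learning in order to better understand it. Furthermore, to demonstrate MAE's local contrastive nature, we introduce a Siamese architecture that combines the essence of MAE and contrastive learning without masking or an explicit decoder, shedding light on a unified and more flexible self-supervised learning framework.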
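To make the "masking and reconstruction" strategy concrete, here is a minimal NumPy sketch of the standard MAE pretext task as commonly described (patchify an image, hide a random subset of patches per image, and compute a mean-squared error only on the hidden patches). It is an illustrative assumption, not code from this paper: the helper names `patchify`, `random_masking`, and `masked_mse` are hypothetical, the encoder and decoder are omitted entirely, and the optional per-patch target normalization used in some MAE setups is left out.

```python
import numpy as np


def patchify(images: np.ndarray, patch: int) -> np.ndarray:
    """Split (N, H, W, C) images into (N, L, patch*patch*C) flattened patches."""
    n, h, w, c = images.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    gh, gw = h // patch, w // patch
    x = images.reshape(n, gh, patch, gw, patch, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (N, gh, gw, patch, patch, C)
    return x.reshape(n, gh * gw, patch * patch * c)


def random_masking(n: int, num_patches: int, mask_ratio: float,
                   rng: np.random.Generator) -> np.ndarray:
    """Boolean mask of shape (N, L); True marks a patch hidden from the encoder."""
    num_keep = int(num_patches * (1 - mask_ratio))
    noise = rng.random((n, num_patches))       # independent random order per image
    ids_shuffle = np.argsort(noise, axis=1)    # lowest-noise patches are kept
    mask = np.ones((n, num_patches), dtype=bool)
    np.put_along_axis(mask, ids_shuffle[:, :num_keep], False, axis=1)
    return mask


def masked_mse(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Mean squared error averaged over masked patches only (visible patches are ignored)."""
    per_patch = ((pred - target) ** 2).mean(axis=-1)  # (N, L)
    return float((per_patch * mask).sum() / mask.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    imgs = rng.random((2, 32, 32, 3))           # two toy 32x32 RGB images
    patches = patchify(imgs, 8)                 # 16 patches of 8x8x3 each
    mask = random_masking(2, 16, 0.75, rng)     # hide 75% of the patches
    print(patches.shape, mask.sum(axis=1))
    # A real MAE would predict `pred` from the visible patches via encoder + decoder.
    print(masked_mse(patches + 1.0, patches, mask))
```

Restricting the loss to masked patches is what turns reconstruction into a prediction task: the model cannot simply copy the pixels it was shown.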

Summary (original)

Masked AutoEncoder (MAE) has revolutionized the field of self-supervised learning with its simple yet effective masking and reconstruction strategies. However, despite achieving state-of-the-art performance across various downstream vision tasks, the underlying mechanisms that drive MAE's efficacy are less well-explored compared to the canonical contrastive learning paradigm. In this paper, we explore a new perspective to explain what truly contributes to the 'rich hidden representations inside the MAE'. Firstly, concerning MAE's generative pretraining pathway, with a unique encoder-decoder architecture to reconstruct images from aggressive masking, we conduct an in-depth analysis of the decoder's behaviors. We empirically find that MAE's decoder mainly learns local features with a limited receptive field, adhering to the well-known Locality Principle. Building upon this locality assumption, we propose a theoretical framework that reformulates the reconstruction-based MAE into a local region-level contrastive learning form for improved understanding. Furthermore, to substantiate the local contrastive nature of MAE, we introduce a Siamese architecture that combines the essence of MAE and contrastive learning without masking and explicit decoder, which sheds light on a unified and more flexible self-supervised learning framework.
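The abstract does not spell out the paper's region-level contrastive formulation, so the sketch below only illustrates the general shape of a contrastive objective over local regions, using the generic InfoNCE loss as a stand-in. This is an assumption for illustration, not the authors' derivation: the function names, the cosine-similarity scoring, the temperature of 0.1, and the convention that row `i` of each view's region embeddings forms the positive pair are all hypothetical choices.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def region_info_nce(z_a: np.ndarray, z_b: np.ndarray, temperature: float = 0.1) -> float:
    """Generic InfoNCE over R region embeddings from two views.

    z_a, z_b: (R, D) arrays; row i of z_a and row i of z_b are assumed to describe
    the same local region (positive pair); every other row acts as a negative.
    """
    a = l2_normalize(z_a)
    b = l2_normalize(z_b)
    logits = a @ b.T / temperature                         # (R, R) scaled similarities
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))              # cross-entropy on the diagonal


if __name__ == "__main__":
    regions = np.eye(4)                                    # 4 toy, perfectly separable regions
    print(region_info_nce(regions, regions))               # matched pairs: loss near 0
    print(region_info_nce(regions, np.roll(regions, 1, axis=0)))  # mismatched: large loss
```

When every region embedding is identical, the loss collapses to log(R), which is why contrastive objectives rely on negatives to keep region representations distinct.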

arXiv information

Authors: Xiaoyu Yue, Lei Bai, Meng Wei, Jiangmiao Pang, Xihui Liu, Luping Zhou, Wanli Ouyang
Published: 2023-10-03 12:08:15+00:00
arXiv page: arxiv_id (pdf)

Source, services used

arxiv.jp, DeepL
