Semi-supervised Vision Transformers at Scale

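Abstract

We study semi-supervised learning (SSL) for vision transformers (ViT). Although the ViT architecture has been widely adopted for a variety of tasks, this topic remains under-explored. To address it, we propose a new SSL pipeline consisting of unsupervised/self-supervised pre-training, then supervised fine-tuning, and finally semi-supervised fine-tuning. In the semi-supervised fine-tuning stage, we adopt an exponential moving average (EMA)-Teacher framework in place of the popular FixMatch, because EMA-Teacher is more stable and yields higher accuracy for semi-supervised vision transformers. In addition, we propose a probabilistic pseudo mixup mechanism that interpolates unlabeled samples and their pseudo labels to improve regularization, which is important for training ViTs given their weak inductive bias. The proposed method, named Semi-ViT, achieves performance comparable to or better than its CNN counterparts in semi-supervised classification. Semi-ViT also inherits the scalability of ViTs and can readily be scaled up to larger models with higher accuracy. For example, Semi-ViT-Huge achieves 80% top-1 accuracy on ImageNet using only 1% of the labels, comparable to Inception-v4 trained with 100% of the ImageNet labels.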

Abstract (original)

We study semi-supervised learning (SSL) for vision transformers (ViT), an under-explored topic despite the wide adoption of the ViT architectures to different tasks. To tackle this problem, we propose a new SSL pipeline, consisting of first un/self-supervised pre-training, followed by supervised fine-tuning, and finally semi-supervised fine-tuning. At the semi-supervised fine-tuning stage, we adopt an exponential moving average (EMA)-Teacher framework instead of the popular FixMatch, since the former is more stable and delivers higher accuracy for semi-supervised vision transformers. In addition, we propose a probabilistic pseudo mixup mechanism to interpolate unlabeled samples and their pseudo labels for improved regularization, which is important for training ViTs with weak inductive bias. Our proposed method, dubbed Semi-ViT, achieves comparable or better performance than the CNN counterparts in the semi-supervised classification setting. Semi-ViT also enjoys the scalability benefits of ViTs that can be readily scaled up to large-size models with increasing accuracies. For example, Semi-ViT-Huge achieves an impressive 80% top-1 accuracy on ImageNet using only 1% labels, which is comparable with Inception-v4 using 100% ImageNet labels.
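The abstract names two mechanisms without giving formulas: an EMA teacher and an interpolation of unlabeled samples with their pseudo labels. The following is a minimal sketch of the generic form of each, not the authors' implementation. The function names `ema_update` and `pseudo_mixup`, the flat-list representation of weights, and the momentum value are assumptions made for illustration. The probabilistic part of the mixup is not described in the abstract and is omitted here.

```python
def ema_update(teacher, student, momentum=0.999):
    """Generic EMA teacher update: m * teacher + (1 - m) * student.

    `teacher` and `student` are flat lists of floats standing in for
    model weights; the momentum value is an assumed placeholder.
    """
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher, student)]


def pseudo_mixup(x_a, p_a, x_b, p_b, lam):
    """Plain mixup of two unlabeled samples and their pseudo labels.

    x_a, x_b: inputs as flat lists of floats.
    p_a, p_b: pseudo-label distributions predicted by the teacher.
    lam: interpolation weight in [0, 1].
    """
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    p = [lam * a + (1.0 - lam) * b for a, b in zip(p_a, p_b)]
    return x, p


if __name__ == "__main__":
    # One teacher step toward the student weights.
    print(ema_update([1.0, 0.0], [0.0, 1.0], momentum=0.9))
    # Mix two samples with one-hot pseudo labels, equal weight.
    print(pseudo_mixup([0.0, 2.0], [1.0, 0.0], [2.0, 0.0], [0.0, 1.0], 0.5))
```

In a framework of this kind, the teacher would produce the pseudo labels while only the student receives gradient updates; because both pseudo labels are probability distributions, the mixed label remains one as well.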

arXiv information

Authors	Zhaowei Cai, Avinash Ravichandran, Paolo Favaro, Manchen Wang, Davide Modolo, Rahul Bhotika, Zhuowen Tu, Stefano Soatto
Published	2022-08-11 08:11:54+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
