Attention Distillation: self-supervised vision transformer students need more guidance

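Abstract

Self-supervised learning is widely used to train high-quality vision transformers (ViTs), so making their strong performance available on devices with limited memory and compute is an important research problem. However, how to distill knowledge from one self-supervised ViT into another has not yet been explored, and existing self-supervised knowledge distillation (SSKD) methods, which target ConvNet-based architectures, are suboptimal for ViTs. In this paper, we study knowledge distillation of self-supervised vision transformers (ViT-SSKD). We show that directly distilling information from the teacher's attention mechanism into the student significantly narrows the performance gap between the two. In experiments on ImageNet-Subset and ImageNet-1K, our method, AttnDistill, outperforms existing SSKD methods and, with the ViT-S model, achieves state-of-the-art k-NN accuracy compared with self-supervised learning (SSL) methods trained from scratch. We are also the first to apply the tiny ViT-T model to self-supervised learning. Because AttnDistill is independent of the self-supervised learning algorithm, it can be adapted to ViT-based SSL methods to improve their performance in future research. Code: https://github.com/wangkai930418/attndistill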

Abstract (original)

Self-supervised learning has been widely applied to train high-quality vision transformers. Unleashing their excellent performance on memory and compute constraint devices is therefore an important research topic. However, how to distill knowledge from one self-supervised ViT to another has not yet been explored. Moreover, the existing self-supervised knowledge distillation (SSKD) methods focus on ConvNet based architectures are suboptimal for ViT knowledge distillation. In this paper, we study knowledge distillation of self-supervised vision transformers (ViT-SSKD). We show that directly distilling information from the crucial attention mechanism from teacher to student can significantly narrow the performance gap between both. In experiments on ImageNet-Subset and ImageNet-1K, we show that our method AttnDistill outperforms existing self-supervised knowledge distillation (SSKD) methods and achieves state-of-the-art k-NN accuracy compared with self-supervised learning (SSL) methods learning from scratch (with the ViT-S model). We are also the first to apply the tiny ViT-T model on self-supervised learning. Moreover, AttnDistill is independent of self-supervised learning algorithms, it can be adapted to ViT based SSL methods to improve the performance in future research. The code is here: https://github.com/wangkai930418/attndistill
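
The abstract says that information from the attention mechanism is distilled from teacher to student, but it does not state the loss. As a rough illustration only, the sketch below assumes one common choice: a KL divergence between the teacher's and the student's softmax-normalized attention rows. The function names (`softmax`, `kl_divergence`, `attention_distill_loss`) and the choice of KL are hypothetical, not taken from the paper or its repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def attention_distill_loss(teacher_scores, student_scores):
    """Mean KL(teacher || student) over attention rows.

    Assumption: each argument is a list of rows of raw (pre-softmax)
    attention scores, one row per query token, with identical shapes.
    This is an illustrative loss, not the paper's exact objective.
    """
    if len(teacher_scores) != len(student_scores):
        raise ValueError("teacher and student must have the same number of rows")
    losses = []
    for t_row, s_row in zip(teacher_scores, student_scores):
        if len(t_row) != len(s_row):
            raise ValueError("attention rows must have the same length")
        losses.append(kl_divergence(softmax(t_row), softmax(s_row)))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy scores for a single query token attending to three keys.
    teacher = [[2.0, 0.0, 0.0]]
    student = [[0.0, 0.0, 0.0]]
    print(attention_distill_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's attention distribution and grows as the two diverge, so minimizing it pushes the student to attend where the teacher does. A real setup would also have to handle teacher and student ViTs with different numbers of heads, which this sketch ignores.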

arXiv information

Authors	Kai Wang, Fei Yang, Joost van de Weijer
Published	2022-10-03 14:01:46+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
