TVT: Training-Free Vision Transformer Search on Tiny Datasets

Summary

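Training-free Vision Transformer (ViT) architecture search uses zero-cost proxies to find better ViTs.
ViTs achieve significant distillation gains from CNN teacher models on small datasets, but according to the authors' experimental observations, current zero-cost proxies for ViTs do not generalize well to the distillation training paradigm.
This paper investigates, for the first time, how to search in a training-free manner with the help of teacher models, and devises an effective Training-free ViT (TVT) search framework.
First, the authors observe that the similarity of attention maps between the ViT student and the ConvNet teacher notably affects distillation accuracy.
They therefore present a teacher-aware metric conditioned on the feature attention relations between teacher and student.
In addition, TVT uses the L2 norm of the student's weights as a student-capability metric to improve ranking consistency.
Finally, TVT uses both the teacher-aware metric and the student-capability metric to search for the best ViT for distillation with ConvNet teachers, yielding substantial gains in efficiency and effectiveness.
Extensive experiments on various tiny datasets and search spaces show that TVT outperforms state-of-the-art training-free search methods.
The code will be released.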
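
The two metrics named in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the function names (`l2_norm_score`, `relation_matrix`, `relation_gap`), the use of cosine similarity between token features, and the mean-squared gap are all assumptions chosen for illustration. How the two scores are signed and combined into a final ranking is also not stated in the abstract, so the sketch leaves them separate.

```python
import math


def l2_norm_score(weight_tensors):
    """Student-capability sketch: sum of L2 norms over flattened weight tensors.

    `weight_tensors` is a list of lists of floats (hypothetical input format).
    """
    return sum(math.sqrt(sum(w * w for w in tensor)) for tensor in weight_tensors)


def relation_matrix(features):
    """Cosine-similarity matrix between token feature vectors (an assumption)."""
    normed = []
    for vec in features:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # guard against zero vectors
        normed.append([x / norm for x in vec])
    return [[sum(a * b for a, b in zip(u, v)) for v in normed] for u in normed]


def relation_gap(teacher_features, student_features):
    """Teacher-aware sketch: mean squared difference of the two relation matrices.

    Assumes teacher and student expose the same number of tokens; their
    feature dimensions may differ because only token-to-token relations
    are compared.
    """
    t = relation_matrix(teacher_features)
    s = relation_matrix(student_features)
    n = len(t)
    return sum((t[i][j] - s[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)


if __name__ == "__main__":
    teacher = [[1.0, 0.0], [0.0, 1.0]]   # toy ConvNet teacher features
    student = [[1.0, 0.0], [1.0, 0.0]]   # toy ViT student features
    weights = [[3.0, 4.0]]               # toy flattened student weights
    print(relation_gap(teacher, student), l2_norm_score(weights))
```

Both quantities need only a randomly initialized student and a forward pass, which is what makes a search of this kind training-free.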

Abstract (original)

Training-free Vision Transformer (ViT) architecture search is presented to search for a better ViT with zero-cost proxies. While ViTs achieve significant distillation gains from CNN teacher models on small datasets, the current zero-cost proxies in ViTs do not generalize well to the distillation training paradigm according to our experimental observations. In this paper, for the first time, we investigate how to search in a training-free manner with the help of teacher models and devise an effective Training-free ViT (TVT) search framework. Firstly, we observe that the similarity of attention maps between ViT and ConvNet teachers affects distill accuracy notably. Thus, we present a teacher-aware metric conditioned on the feature attention relations between teacher and student. Additionally, TVT employs the L2-Norm of the student’s weights as the student-capability metric to improve ranking consistency. Finally, TVT searches for the best ViT for distilling with ConvNet teachers via our teacher-aware metric and student-capability metric, resulting in impressive gains in efficiency and effectiveness. Extensive experiments on various tiny datasets and search spaces show that our TVT outperforms state-of-the-art training-free search methods. The code will be released.

arXiv information

Authors	Zimian Wei, Hengyue Pan, Lujun Li, Peijie Dong, Zhiliang Tian, Xin Niu, Dongsheng Li
Published	2023-11-24 08:24:31+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
