Generic-to-Specific Distillation of Masked Autoencoders

Abstract

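Large vision Transformers (ViTs) driven by self-supervised pre-training mechanisms have made unprecedented progress.
However, lightweight ViT models, limited by their model capacity, benefit little from these pre-training mechanisms.
Knowledge distillation defines a paradigm for transferring representations from large (teacher) models to small (student) models.
However, conventional single-stage distillation easily gets stuck on task-specific transfer and fails to retain the task-agnostic knowledge that is essential for model generalization.
In this study, we propose generic-to-specific distillation (G2SD) to tap the potential of small ViT models under the supervision of large models pre-trained by masked autoencoders.
In generic distillation, the decoder of the small model is encouraged to align its feature predictions with the hidden representations of the large model, so that task-agnostic knowledge can be transferred.
In specific distillation, the predictions of the small model are constrained to be consistent with those of the large model, transferring the task-specific features that guarantee task performance.
With G2SD, the vanilla ViT-Small model achieves 98.7%, 98.1%, and 99.3% of the performance of its teacher (ViT-Base) on image classification, object detection, and semantic segmentation, respectively, setting a solid baseline for two-stage vision distillation.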
Code will be available at https://github.com/pengzhiliang/G2SD.
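In generic distillation, the student's decoder is trained to predict the teacher's hidden representations. As a minimal sketch, assuming a per-token mean-squared error and (optionally) parameter-free layer normalization of the teacher targets, that feature-alignment objective could look like the following. The function names and the normalization choice are hypothetical illustrations, not taken from the G2SD code, and the paper's exact loss may differ.

```python
import math
from typing import List

Vector = List[float]


def layer_norm(v: Vector, eps: float = 1e-6) -> Vector:
    """Normalize one feature vector to zero mean and unit variance (no affine parameters)."""
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


def generic_distillation_loss(
    student_preds: List[Vector],
    teacher_hidden: List[Vector],
    normalize_targets: bool = True,
) -> float:
    """Hypothetical feature-alignment loss: mean squared error between the student
    decoder's per-token feature predictions and the teacher's hidden representations,
    averaged over all tokens and channels."""
    if len(student_preds) != len(teacher_hidden):
        raise ValueError("student and teacher must provide the same number of tokens")
    total, count = 0.0, 0
    for pred, target in zip(student_preds, teacher_hidden):
        if len(pred) != len(target):
            # In practice a linear projection would map student width to teacher width.
            raise ValueError("feature dimensions must match")
        if normalize_targets:
            target = layer_norm(target)  # assumption: normalized teacher targets
        for p, t in zip(pred, target):
            total += (p - t) ** 2
            count += 1
    return total / count
```

For example, `generic_distillation_loss([[0.0, 0.0]], [[1.0, 3.0]], normalize_targets=False)` returns 5.0, the mean of the squared differences 1 and 9.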

Abstract (original)

Large vision Transformers (ViTs) driven by self-supervised pre-training mechanisms have achieved unprecedented progress. Lightweight ViT models, limited by their model capacity, however, benefit little from those pre-training mechanisms. Knowledge distillation defines a paradigm to transfer representations from large (teacher) models to small (student) ones. However, conventional single-stage distillation easily gets stuck on task-specific transfer, failing to retain the task-agnostic knowledge crucial for model generalization. In this study, we propose generic-to-specific distillation (G2SD) to tap the potential of small ViT models under the supervision of large models pre-trained by masked autoencoders. In generic distillation, the decoder of the small model is encouraged to align its feature predictions with the hidden representations of the large model, so that task-agnostic knowledge can be transferred. In specific distillation, the predictions of the small model are constrained to be consistent with those of the large model, to transfer task-specific features which guarantee task performance. With G2SD, the vanilla ViT-Small model achieves 98.7%, 98.1% and 99.3% of the performance of its teacher (ViT-Base) for image classification, object detection, and semantic segmentation, respectively, setting a solid baseline for two-stage vision distillation. Code will be available at https://github.com/pengzhiliang/G2SD.
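In specific distillation, the student's predictions are constrained to be consistent with the teacher's. For image classification, one common way to express such a constraint is the temperature-scaled KL divergence from classic knowledge distillation. The sketch below assumes that loss; it is an illustration rather than the paper's exact objective, detection or segmentation heads would need task-specific variants, and the names `softmax` and `specific_distillation_loss` are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() numerically stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]


def specific_distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    temperature: float = 1.0,
) -> float:
    """Assumed prediction-consistency loss: KL(teacher || student) between
    temperature-softened class distributions, scaled by T^2."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_t, p_s)
        if pt > 0.0  # terms with zero teacher probability contribute nothing
    )
    return kl * temperature ** 2
```

With identical logits the loss is 0. With student logits `[0.0, 0.0]` and teacher logits `[math.log(3), 0.0]` (teacher probabilities 0.75 and 0.25) at temperature 1, it equals 0.75 * ln(1.5) + 0.25 * ln(0.5), roughly 0.131. The T^2 factor is the usual choice for keeping gradient magnitudes comparable when the temperature changes.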

arXiv information

Authors	Wei Huang, Zhiliang Peng, Li Dong, Furu Wei, Jianbin Jiao, Qixiang Ye
Published	2023-02-28 17:13:14+00:00
arXiv page	arxiv_id(pdf)

Source, service used

arxiv.jp, Google