Weight Copy and Low-Rank Adaptation for Few-Shot Distillation of Vision Transformers

Summary

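Few-shot knowledge distillation has recently emerged as a viable approach for harnessing the knowledge of large-scale pre-trained models using limited data and computational resources.
In this paper, we propose a novel few-shot feature distillation approach for vision transformers.
Our approach is based on two key steps.
Leveraging the fact that vision transformers have a consistent depth-wise structure, we first copy the weights from intermittent layers of existing pre-trained vision transformers (teachers) into shallower architectures (students), where the intermittence factor controls the complexity of the student transformer with respect to its teacher.
Next, we employ an enhanced version of Low-Rank Adaptation (LoRA) to distill knowledge into the student in a few-shot scenario, aiming to recover the information processing carried out by the skipped teacher layers.
We present comprehensive experiments with supervised and self-supervised transformers as teachers, on six data sets from various domains (natural, medical and satellite images) and tasks (classification and segmentation).
The empirical results confirm the superiority of our approach over state-of-the-art competitors.
Moreover, the ablation results demonstrate the usefulness of each component of the proposed pipeline.
The code is released at https://github.com/dianagrigore/WeCoLoRA.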
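
The first step, copying weights from intermittent teacher layers into a shallower student, can be sketched in plain Python as follows. This is a minimal illustration and not the released WeCoLoRA code: the function name copy_intermittent_layers, the toy dict-based "blocks", and the choice to start at layer 0 and keep every r-th layer are assumptions made for the example. The text only states that an intermittence factor controls the student's complexity relative to its teacher.

```python
import copy


def copy_intermittent_layers(teacher_layers, r):
    """Return deep copies of every r-th teacher layer (indices 0, r, 2r, ...).

    Assumption: selection starts at index 0. The source does not say
    which offset the authors use.
    """
    if r < 1:
        raise ValueError("intermittence factor r must be >= 1")
    # Deep copy so that later student updates never touch the teacher.
    return [copy.deepcopy(layer) for layer in teacher_layers[::r]]


# Toy stand-in for a 12-block teacher; each block is just a dict of weights.
teacher = [{"name": f"block_{i}", "w": [float(i)]} for i in range(12)]

# With r = 3 the student keeps 4 of the 12 blocks.
student = copy_intermittent_layers(teacher, r=3)
print([layer["name"] for layer in student])
# prints ['block_0', 'block_3', 'block_6', 'block_9']
```

A larger r gives a shallower student (here 12 / 3 = 4 blocks), which is the sense in which the intermittence factor trades capacity for cost.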

Summary (original)

Few-shot knowledge distillation recently emerged as a viable approach to harness the knowledge of large-scale pre-trained models, using limited data and computational resources. In this paper, we propose a novel few-shot feature distillation approach for vision transformers. Our approach is based on two key steps. Leveraging the fact that vision transformers have a consistent depth-wise structure, we first copy the weights from intermittent layers of existing pre-trained vision transformers (teachers) into shallower architectures (students), where the intermittence factor controls the complexity of the student transformer with respect to its teacher. Next, we employ an enhanced version of Low-Rank Adaptation (LoRA) to distill knowledge into the student in a few-shot scenario, aiming to recover the information processing carried out by the skipped teacher layers. We present comprehensive experiments with supervised and self-supervised transformers as teachers, on six data sets from various domains (natural, medical and satellite images) and tasks (classification and segmentation). The empirical results confirm the superiority of our approach over state-of-the-art competitors. Moreover, the ablation results demonstrate the usefulness of each component of the proposed pipeline. We release our code at https://github.com/dianagrigore/WeCoLoRA.
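
The second step builds on Low-Rank Adaptation. The sketch below shows standard LoRA only: a frozen weight matrix W plus a trainable low-rank update, giving W + (alpha / rank) * B A. It does not implement the enhanced variant the authors propose, whose details are not given in the abstract. The class LoRALinear, the helper matmul, and all hyperparameter values are hypothetical names chosen for this example.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Frozen weight W (out x in) with a low-rank update B (out x rank) @ A (rank x in)."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight          # frozen, e.g. copied from the teacher
        self.scale = alpha / rank     # standard LoRA scaling
        # A starts small and random, B starts at zero, so the initial
        # update is exactly zero and the layer behaves like the frozen one.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def effective_weight(self):
        delta = matmul(self.B, self.A)
        return [
            [w + self.scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.weight, delta)
        ]

    def forward(self, x):
        """Apply the adapted layer to a single input vector x."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]


layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1)
print(layer.forward([1.0, 1.0]))  # equals the frozen layer's output at initialization
```

Only A and B would be trained during distillation in this setup, so the number of trainable parameters grows with rank * (in + out) rather than in * out, which is what makes the approach attractive when few samples are available.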

arXiv information

Authors	Diana-Nicoleta Grigore, Mariana-Iuliana Georgescu, Jon Alvarez Justo, Tor Johansen, Andreea Iuliana Ionescu, Radu Tudor Ionescu
Published	2024-10-30 16:27:20+00:00

Source and services used

arxiv.jp, Google
