DCD: Discriminative and Consistent Representation Distillation

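Summary

Knowledge distillation (KD) aims to transfer knowledge from a large teacher model to a smaller student model.
Contrastive learning has shown promise in self-supervised learning by producing discriminative representations, but its application to knowledge distillation remains limited and focuses primarily on discrimination, neglecting the structural relationships captured by the teacher model.
To address this limitation, we propose Discriminative and Consistent Distillation (DCD), which combines a contrastive loss with a consistency regularization to minimize the discrepancy between the distributions of teacher and student representations.
Our method introduces learnable temperature and bias parameters that adapt during training to balance these complementary objectives, replacing the fixed hyperparameters commonly used in contrastive learning approaches.
Through extensive experiments on CIFAR-100 and ImageNet ILSVRC-2012, we demonstrate that DCD achieves state-of-the-art performance, with the student model sometimes surpassing the teacher's accuracy.
Furthermore, we show that DCD's learned representations exhibit superior cross-dataset generalization when transferred to Tiny ImageNet and STL-10.
Code is available at https://github.com/giakoumoglou/distillers.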
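
The combination of a contrastive term and a consistency term can be illustrated with a small forward-only sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the function name `dcd_style_loss`, the InfoNCE-style contrastive term, the KL divergence between teacher and student pairwise-similarity distributions as the consistency term, and the `weight` parameter. The bias parameter is left out because the abstract does not say where it enters, and a constant added to every logit would cancel inside a softmax. In practice the temperature would be a trainable parameter in an autodiff framework; here it is a plain float.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (a zero vector is returned unchanged).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def scaled_similarities(queries, keys, scale):
    # Pairwise dot products between two sets of vectors, multiplied by 1/temperature.
    return [[scale * sum(a * b for a, b in zip(q, k)) for k in keys] for q in queries]

def dcd_style_loss(student, teacher, log_temperature=0.0, weight=1.0):
    """Hypothetical sketch: contrastive term plus consistency term.

    student, teacher: lists of embedding vectors, one pair per sample.
    log_temperature:  log of the softmax temperature (would be learnable).
    weight:           assumed trade-off between the two terms.
    """
    s = [l2_normalize(v) for v in student]
    t = [l2_normalize(v) for v in teacher]
    scale = math.exp(-log_temperature)  # 1 / temperature
    n = len(s)

    # Contrastive (InfoNCE-style): student i should pick teacher i
    # out of all teacher embeddings in the batch.
    logits = scaled_similarities(s, t, scale)
    contrastive = -sum(log_softmax(row)[i] for i, row in enumerate(logits)) / n

    # Consistency (assumed form): KL divergence from the teacher's
    # sample-to-sample similarity distribution to the student's.
    t_rel = [log_softmax(row) for row in scaled_similarities(t, t, scale)]
    s_rel = [log_softmax(row) for row in scaled_similarities(s, s, scale)]
    consistency = sum(
        math.exp(p) * (p - q)
        for t_row, s_row in zip(t_rel, s_rel)
        for p, q in zip(t_row, s_row)
    ) / n

    return contrastive + weight * consistency, contrastive, consistency

# Example call with two samples and 2-dimensional embeddings.
total, contrastive, consistency = dcd_style_loss(
    student=[[0.9, 0.1], [0.2, 0.8]],
    teacher=[[1.0, 0.0], [0.0, 1.0]],
)
```

In this sketch the contrastive term rewards each student embedding for matching its own teacher embedding, while the consistency term penalizes a student whose sample-to-sample similarity structure differs from the teacher's, even if each individual pair is matched.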

Abstract (original)

Knowledge Distillation (KD) aims to transfer knowledge from a large teacher model to a smaller student model. While contrastive learning has shown promise in self-supervised learning by creating discriminative representations, its application in knowledge distillation remains limited and focuses primarily on discrimination, neglecting the structural relationships captured by the teacher model. To address this limitation, we propose Discriminative and Consistent Distillation (DCD), which employs a contrastive loss along with a consistency regularization to minimize the discrepancy between the distributions of teacher and student representations. Our method introduces learnable temperature and bias parameters that adapt during training to balance these complementary objectives, replacing the fixed hyperparameters commonly used in contrastive learning approaches. Through extensive experiments on CIFAR-100 and ImageNet ILSVRC-2012, we demonstrate that DCD achieves state-of-the-art performance, with the student model sometimes surpassing the teacher’s accuracy. Furthermore, we show that DCD’s learned representations exhibit superior cross-dataset generalization when transferred to Tiny ImageNet and STL-10. Code is available at https://github.com/giakoumoglou/distillers.

arXiv information

Authors	Nikolaos Giakoumoglou, Tania Stathaki
Published	2024-11-15 14:54:58+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
