FerKD: Surgical Label Adaptation for Efficient Distillation

Abstract

We present FerKD, a novel efficient knowledge distillation framework that incorporates partial soft-hard label adaptation coupled with a region-calibration mechanism. Our approach stems from the observation and intuition that standard data augmentations, such as RandomResizedCrop, tend to transform inputs into diverse conditions: easy positives, hard positives, or hard negatives. In traditional distillation frameworks, these transformed samples are utilized equally through their predictive probabilities derived from pretrained teacher models. However, merely relying on prediction values from a pretrained teacher, a common practice in prior studies, neglects the reliability of these soft label predictions. To address this, we propose a new scheme that calibrates the less-confident regions to be the context using softened hard ground-truth labels. Our approach involves two processes: hard-region mining and calibration. We demonstrate empirically that this method can dramatically improve the convergence speed and final accuracy. Additionally, we find that a consistent mixing strategy can stabilize the distributions of soft supervision, taking advantage of the soft labels. As a result, we introduce a stabilized SelfMix augmentation that weakens the variation of the mixed images and corresponding soft labels by mixing similar regions within the same image. FerKD is an intuitive and well-designed learning system that eliminates several heuristics and hyperparameters of the former FKD solution. More importantly, it achieves significant improvements on ImageNet-1K and downstream tasks. For instance, FerKD achieves 81.2% on ImageNet-1K with ResNet-50, outperforming FKD and FunMatch by large margins. Leveraging better pre-trained weights and larger architectures, our fine-tuned ViT-G14 even achieves 89.9%. Our code is available at https://github.com/szq0214/FKD/tree/main/FerKD.
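The abstract describes hard-region mining plus calibration, and SelfMix, only at a high level. What follows is a minimal NumPy sketch under our own assumptions. It is not the FerKD implementation, which lives in the repository linked in the abstract. The assumptions are as follows. A crop counts as "less confident" when the teacher's probability on the ground-truth class falls below a hypothetical `conf_threshold`. Such a crop's soft label is replaced by a label-smoothed one-hot ground truth, with a hypothetical `smoothing` value. SelfMix is approximated as CutMix between two crops of the same image, with labels mixed by area. The functions `calibrate_soft_labels` and `self_mix` and both hyperparameters are hypothetical. The abstract states that FerKD removes several such hand-tuned settings, so the threshold here is only an illustrative stand-in.

```python
import numpy as np


def calibrate_soft_labels(teacher_probs, gt_label, conf_threshold=0.1, smoothing=0.5):
    """Hypothetical sketch of hard-region mining + calibration (not FerKD's code).

    teacher_probs: (num_crops, num_classes) teacher softmax outputs, one row per crop.
    gt_label: integer ground-truth class of the source image.
    conf_threshold, smoothing: hypothetical values chosen for illustration only.
    Returns the calibrated labels and a boolean mask of the mined (hard) crops.
    """
    probs = np.asarray(teacher_probs, dtype=np.float64)
    num_classes = probs.shape[1]
    # Mining: a crop is "less confident" if the teacher assigns the
    # ground-truth class a probability below the threshold.
    hard = probs[:, gt_label] < conf_threshold
    # Softened hard label: uniform label smoothing of the one-hot ground truth.
    softened = np.full(num_classes, smoothing / num_classes)
    softened[gt_label] += 1.0 - smoothing
    # Calibration: replace only the mined crops' soft labels.
    labels = probs.copy()
    labels[hard] = softened
    return labels, hard


def self_mix(crop_a, crop_b, label_a, label_b, lam, rng):
    """Hypothetical SelfMix stand-in: CutMix between two crops of the SAME image.

    crop_a, crop_b: (H, W, C) arrays of equal shape, both cropped from one image.
    label_a, label_b: their soft labels. lam: target fraction of crop_a to keep.
    """
    h, w = crop_a.shape[:2]
    # Box side lengths so that the pasted area is roughly (1 - lam) of the crop.
    cut_h = int(round(h * np.sqrt(1.0 - lam)))
    cut_w = int(round(w * np.sqrt(1.0 - lam)))
    top = rng.integers(0, h - cut_h + 1)
    left = rng.integers(0, w - cut_w + 1)
    mixed = crop_a.copy()
    mixed[top:top + cut_h, left:left + cut_w] = crop_b[top:top + cut_h, left:left + cut_w]
    # Mix labels by the actual kept area (after rounding the box size).
    lam_actual = 1.0 - (cut_h * cut_w) / (h * w)
    mixed_label = lam_actual * np.asarray(label_a) + (1.0 - lam_actual) * np.asarray(label_b)
    return mixed, mixed_label


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two crops: the first agrees with ground-truth class 0, the second does not.
    probs = np.array([[0.7, 0.2, 0.1], [0.05, 0.9, 0.05]])
    labels, hard = calibrate_soft_labels(probs, gt_label=0)
    print(hard, labels[1])
    # Zeros/ones stand in for two crops of one image so the pasted box is visible.
    crop_a, crop_b = np.zeros((4, 4, 3)), np.ones((4, 4, 3))
    mixed, mixed_label = self_mix(crop_a, crop_b, [1.0, 0.0], [0.0, 1.0], lam=0.75, rng=rng)
    print(mixed_label)
```

Because both crops in `self_mix` come from the same image, the two soft labels being mixed tend to be similar. This is one plausible reading of how mixing "similar regions within the same image" could reduce variation in the mixed supervision.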

arXiv information

Author	Zhiqiang Shen
Published	2023-12-29 05:02:10+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
