Efficient One Pass Self-distillation with Zipf’s Label Smoothing

Summary

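Self-distillation exploits non-uniform soft supervision generated by the network itself during training and improves performance at no runtime cost.
The training-time overhead, however, is often overlooked, even though reducing time and memory overhead during training is increasingly important in the era of giant models.
This paper proposes an efficient self-distillation method named Zipf’s Label Smoothing (Zipf’s LS), which uses the network’s on-the-fly predictions to generate soft supervision that conforms to a Zipf distribution, without using any contrastive samples or auxiliary parameters.
The idea comes from an empirical observation: when a network is duly trained, the output values of its final softmax layer, after being sorted by magnitude and averaged across samples, should follow a distribution reminiscent of Zipf’s law in the word-frequency statistics of natural languages.
By enforcing this property at the sample level throughout the whole training period, we find that prediction accuracy can be greatly improved.
Using ResNet50 on the INAT21 fine-grained classification dataset, our technique achieves a +3.61% accuracy gain over the vanilla baseline, and 0.88% more gain than previous label smoothing or self-distillation strategies.
The implementation is publicly available at https://github.com/megvii-research/zipfls.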
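
The core idea of rank-based soft supervision can be illustrated with a small NumPy sketch. This is a simplified illustration, not the authors' implementation (see the repository above for that). It assumes that class ranks are taken directly from the logits of the same forward pass and that the soft label assigns probability proportional to rank^(-alpha). The function names `zipf_soft_labels` and `zipf_loss` and the parameter `alpha` are hypothetical names chosen for this example.

```python
import numpy as np


def zipf_soft_labels(logits, alpha=1.0):
    """Build Zipf-shaped soft labels from the rank order of each row of logits.

    Assumption: ranks come from the logits themselves (rank 1 = largest logit).
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Class indices sorted from largest to smallest logit, per sample.
    order = np.argsort(-logits, axis=1, kind="stable")
    # Invert the permutation so that ranks[i, c] is the 0-based rank of class c.
    ranks = np.empty_like(order)
    rows = np.arange(logits.shape[0])[:, None]
    ranks[rows, order] = np.arange(logits.shape[1])[None, :]
    # Zipf weights: proportional to rank^(-alpha), normalised to sum to 1.
    weights = (ranks + 1.0) ** (-alpha)
    return weights / weights.sum(axis=1, keepdims=True)


def softmax(logits):
    """Numerically stable row-wise softmax."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def zipf_loss(logits, alpha=1.0, eps=1e-12):
    """Mean KL(zipf_target || softmax(logits)) over the batch."""
    logits = np.asarray(logits, dtype=np.float64)
    target = zipf_soft_labels(logits, alpha)
    pred = softmax(logits)
    kl = np.sum(target * (np.log(target + eps) - np.log(pred + eps)), axis=1)
    return float(kl.mean())


if __name__ == "__main__":
    batch = np.array([[2.0, 0.5, 1.0], [0.1, 3.0, 0.2]])
    print(zipf_soft_labels(batch))
    print(zipf_loss(batch))
```

In a training loop, a term like this would typically be added to the usual cross-entropy loss with a weighting factor. Because the soft label is computed from the current prediction, no second forward pass or teacher network is needed in this sketch.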

Abstract (original)

Self-distillation exploits non-uniform soft supervision from itself during training and improves performance without any runtime cost. However, the overhead during training is often overlooked, and yet reducing time and memory overhead during training is increasingly important in the giant models’ era. This paper proposes an efficient self-distillation method named Zipf’s Label Smoothing (Zipf’s LS), which uses the on-the-fly prediction of a network to generate soft supervision that conforms to Zipf distribution without using any contrastive samples or auxiliary parameters. Our idea comes from an empirical observation that when the network is duly trained the output values of a network’s final softmax layer, after sorting by the magnitude and averaged across samples, should follow a distribution reminiscent to Zipf’s Law in the word frequency statistics of natural languages. By enforcing this property on the sample level and throughout the whole training period, we find that the prediction accuracy can be greatly improved. Using ResNet50 on the INAT21 fine-grained classification dataset, our technique achieves +3.61% accuracy gain compared to the vanilla baseline, and 0.88% more gain against the previous label smoothing or self-distillation strategies. The implementation is publicly available at https://github.com/megvii-research/zipfls.
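
The empirical observation in the abstract (softmax outputs sorted by magnitude and averaged across samples) can be checked with a few lines of NumPy. The sketch below is only an illustration under stated assumptions: `rank_averaged_profile` and `loglog_slope` are hypothetical helper names, and fitting a straight line in log-log space is one simple way to judge how Zipf-like a profile is, not a procedure taken from the paper.

```python
import numpy as np


def rank_averaged_profile(probs):
    """Sort each row of class probabilities in descending order, then average per rank."""
    probs = np.asarray(probs, dtype=np.float64)
    sorted_desc = -np.sort(-probs, axis=1)
    return sorted_desc.mean(axis=0)


def loglog_slope(profile, eps=1e-12):
    """Least-squares slope of log(value) against log(rank).

    A profile proportional to rank^(-alpha) gives a slope close to -alpha.
    """
    ranks = np.arange(1, len(profile) + 1)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(profile + eps), 1)
    return float(slope)


if __name__ == "__main__":
    # Synthetic stand-in for softmax outputs: random permutations of a Zipf vector.
    rng = np.random.default_rng(0)
    zipf = 1.0 / np.arange(1, 11)
    zipf /= zipf.sum()
    probs = np.stack([rng.permutation(zipf) for _ in range(5)])
    print(loglog_slope(rank_averaged_profile(probs)))
```

With real data, `probs` would be the softmax outputs of a trained classifier on a held-out set, and the profile would only approximately follow a straight line in log-log space.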

arXiv information

Authors	Jiajun Liang, Linze Li, Zhaodong Bing, Borui Zhao, Yao Tang, Bo Lin, Haoqiang Fan
Published	2022-07-26 15:40:16+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
