Avatar Knowledge Distillation: Self-ensemble Teacher Paradigm with Uncertainty

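Summary

Knowledge distillation is an effective paradigm for boosting the performance of pocket-size models, and when multiple teacher models are available, the student can push past its previous upper limit.
However, training diverse teacher models for a one-off distillation is not economical.
This paper introduces a new concept called Avatars for distillation: inference ensemble models derived from the teacher.
Concretely, (1) in each iteration of distillation training, various Avatars are generated by a perturbation transformation.
We validate that Avatars have a higher upper limit of working capacity and teaching ability, which helps the student model learn diverse and receptive knowledge perspectives from the teacher model.
(2) During distillation, we propose an uncertainty-aware factor, computed from the variance of the statistical differences between the vanilla teacher and the Avatars, to adaptively adjust each Avatar's contribution to knowledge transfer.
Avatar Knowledge Distillation (AKD) is fundamentally different from existing methods and refines them from the innovative viewpoint of unequal training.
Comprehensive experiments demonstrate the effectiveness of the Avatars mechanism, which improves state-of-the-art distillation methods for dense prediction without extra computational cost.
AKD brings gains of up to 0.7 AP on COCO 2017 for object detection and 1.83 mIoU on Cityscapes for semantic segmentation.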

Abstract (original)

Knowledge distillation is an effective paradigm for boosting the performance of pocket-size model, especially when multiple teacher models are available, the student would break the upper limit again. However, it is not economical to train diverse teacher models for the disposable distillation. In this paper, we introduce a new concept dubbed Avatars for distillation, which are the inference ensemble models derived from the teacher. Concretely, (1) For each iteration of distillation training, various Avatars are generated by a perturbation transformation. We validate that Avatars own higher upper limit of working capacity and teaching ability, aiding the student model in learning diverse and receptive knowledge perspectives from the teacher model. (2) During the distillation, we propose an uncertainty-aware factor from the variance of statistical differences between the vanilla teacher and Avatars, to adjust Avatars’ contribution on knowledge transfer adaptively. Avatar Knowledge Distillation AKD is fundamentally different from existing methods and refines with the innovative view of unequal training. Comprehensive experiments demonstrate the effectiveness of our Avatars mechanism, which polishes up the state-of-the-art distillation methods for dense prediction without more extra computational cost. The AKD brings at most 0.7 AP gains on COCO 2017 for Object Detection and 1.83 mIoU gains on Cityscapes for Semantic Segmentation, respectively.
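
The two steps described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the perturbation transformation, the exact form of the uncertainty-aware factor, or the loss, so everything below is an assumption made for illustration: random dropout stands in for the perturbation, `exp(-variance)` stands in for the uncertainty factor, and mean squared error stands in for the distillation loss. The function names `make_avatars`, `uncertainty_weights`, and `akd_loss` are hypothetical and do not come from the paper.

```python
import math
import random
from statistics import pvariance


def make_avatars(teacher_feat, num_avatars=3, drop_prob=0.1, seed=0):
    """Build perturbed copies ("Avatars") of a teacher feature vector.

    Assumption: the perturbation is element-wise dropout with rescaling.
    """
    rng = random.Random(seed)
    keep = 1.0 - drop_prob
    return [
        [x / keep if rng.random() < keep else 0.0 for x in teacher_feat]
        for _ in range(num_avatars)
    ]


def uncertainty_weights(teacher_feat, avatars):
    """Weight each Avatar by how closely it agrees with the vanilla teacher.

    Assumption: weight = exp(-variance of the Avatar-teacher differences),
    normalized so the weights sum to 1.
    """
    raw = []
    for avatar in avatars:
        diffs = [a - t for a, t in zip(avatar, teacher_feat)]
        raw.append(math.exp(-pvariance(diffs)))
    total = sum(raw)
    if total == 0.0:  # every weight underflowed: fall back to uniform
        return [1.0 / len(avatars)] * len(avatars)
    return [r / total for r in raw]


def mse(u, v):
    """Mean squared error between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def akd_loss(student_feat, teacher_feat, avatars, weights):
    """Teacher loss plus the uncertainty-weighted Avatar losses."""
    loss = mse(student_feat, teacher_feat)
    for w, avatar in zip(weights, avatars):
        loss += w * mse(student_feat, avatar)
    return loss


teacher = [0.5, -1.0, 2.0, 0.0]
student = [0.4, -0.8, 1.5, 0.1]
avatars = make_avatars(teacher, num_avatars=3, drop_prob=0.25, seed=0)
weights = uncertainty_weights(teacher, avatars)
print(weights, akd_loss(student, teacher, avatars, weights))
```

In this sketch, an Avatar that deviates more erratically from the vanilla teacher gets a smaller weight, so it contributes less to the student's loss. Because the Avatars are derived from the single teacher's features rather than from separately trained models, no additional teachers are needed.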

arXiv information

Authors	Yuan Zhang,Weihua Chen,Yichen Lu,Tao Huang,Xiuyu Sun,Jian Cao
Published	2023-07-31 14:43:33+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
