Information Theoretic Representation Distillation

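Summary

Despite the empirical success of knowledge distillation, current state-of-the-art methods are computationally expensive to train, which makes them difficult to adopt in practice. To address this problem, we introduce two distinct, complementary losses inspired by a cheap entropy-like estimator. These losses aim to maximise the correlation and the mutual information between the student and teacher representations. Our method has a much lower training cost than other approaches and achieves performance competitive with the state of the art on knowledge distillation and cross-model transfer tasks. We further demonstrate its effectiveness on a binary distillation task, where it sets a new state of the art for binary quantisation and approaches the performance of a full-precision model. Code: www.github.com/roymiles/ITRD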
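To make "maximise the correlation between the student and teacher representations" concrete, here is a minimal, dependency-free sketch. It assumes the student features have already been projected to the teacher's dimensionality and that correlation is measured per feature dimension across a batch. The names `standardise` and `correlation_loss` and the squared-gap form of the loss are illustrative assumptions, not the authors' formulation; the repository linked above holds the actual implementation.

```python
import math
from typing import List

# Rows are samples in a batch, columns are feature dimensions.
Matrix = List[List[float]]


def standardise(features: Matrix, eps: float = 1e-8) -> Matrix:
    """Give every feature dimension zero mean and unit variance over the batch."""
    n, d = len(features), len(features[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in features]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        for i in range(n):
            out[i][j] = (col[i] - mean) / (std + eps)
    return out


def correlation_loss(student: Matrix, teacher: Matrix) -> float:
    """Hypothetical loss: mean of (1 - C_jj)^2 over feature dimensions j.

    ASSUMPTION: student features were already projected to the teacher's
    dimensionality, and C_jj is the batch correlation of dimension j
    between the two representations. The loss is 0 when every dimension
    is perfectly correlated and grows as correlations fall towards -1.
    """
    zs, zt = standardise(student), standardise(teacher)
    n, d = len(zs), len(zs[0])
    total = 0.0
    for j in range(d):
        # Diagonal entry C_jj of the batch cross-correlation matrix.
        c_jj = sum(zs[i][j] * zt[i][j] for i in range(n)) / n
        total += (1.0 - c_jj) ** 2
    return total / d
```

With identical student and teacher batches this loss is essentially 0; negating the teacher drives every C_jj to about -1 and the loss to about 4. In a real training loop the same computation would be written with a tensor library so that gradients flow back into the student.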

Summary (original)

Despite the empirical success of knowledge distillation, current state-of-the-art methods are computationally expensive to train, which makes them difficult to adopt in practice. To address this problem, we introduce two distinct complementary losses inspired by a cheap entropy-like estimator. These losses aim to maximise the correlation and mutual information between the student and teacher representations. Our method incurs significantly less training overheads than other approaches and achieves competitive performance to the state-of-the-art on the knowledge distillation and cross-model transfer tasks. We further demonstrate the effectiveness of our method on a binary distillation task, whereby it leads to a new state-of-the-art for binary quantisation and approaches the performance of a full precision model. Code: www.github.com/roymiles/ITRD
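The abstract does not name its "cheap entropy-like estimator". One well-known estimator of this kind is the matrix-based Rényi α-order entropy, computed from a normalised batch Gram matrix A as S_α(A) = 1/(1-α) · log2 tr(A^α). The sketch below assumes that family with α = 2 and uses it to estimate the mutual information I = S(A) + S(B) - S(A, B) between student and teacher batches, where the joint term uses the Hadamard product A∘B renormalised to unit trace. The cosine kernel and the names `normalised_gram`, `renyi2_entropy`, `joint`, `mutual_information` and `mi_loss` are hypothetical choices for illustration, not the paper's code.

```python
import math
from typing import List

# Rows are samples in a batch, columns are feature dimensions.
Matrix = List[List[float]]


def normalised_gram(features: Matrix) -> Matrix:
    """Cosine-similarity Gram matrix over the batch, scaled to unit trace.

    ASSUMPTION: a cosine (L2-normalised linear) kernel; other kernels work too.
    """
    unit = []
    for row in features:
        norm = math.sqrt(sum(x * x for x in row)) or 1.0  # guard all-zero rows
        unit.append([x / norm for x in row])
    n = len(unit)
    gram = [[sum(a * b for a, b in zip(unit[i], unit[j])) for j in range(n)]
            for i in range(n)]
    trace = sum(gram[i][i] for i in range(n))
    return [[g / trace for g in row] for row in gram]


def renyi2_entropy(a: Matrix) -> float:
    """S_2(A) = -log2 tr(A^2) = -log2 ||A||_F^2 for symmetric, unit-trace A."""
    frob_sq = sum(x * x for row in a for x in row)
    return -math.log2(frob_sq)


def joint(a: Matrix, b: Matrix) -> Matrix:
    """Hadamard product A∘B renormalised to unit trace (joint-entropy input)."""
    h = [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    trace = sum(h[i][i] for i in range(len(h)))
    return [[x / trace for x in row] for row in h]


def mutual_information(student: Matrix, teacher: Matrix) -> float:
    """Estimate I(student; teacher) = S(A) + S(B) - S(A, B) with alpha = 2."""
    a, b = normalised_gram(student), normalised_gram(teacher)
    return renyi2_entropy(a) + renyi2_entropy(b) - renyi2_entropy(joint(a, b))


def mi_loss(student: Matrix, teacher: Matrix) -> float:
    """Hypothetical loss: minimising it maximises the estimated mutual information."""
    return -mutual_information(student, teacher)
```

For three mutually orthogonal samples in both batches, each Gram matrix is I/3 and the estimate equals log2 3; if the teacher maps every sample to the same vector, its entropy is 0 and the estimate drops to 0. The choice α = 2 is what keeps the estimator cheap: for a symmetric matrix, tr(A^2) equals the squared Frobenius norm, so no eigendecomposition is required, whereas other values of α need the eigenvalues of A.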

arXiv information

Authors	Roy Miles, Adrian Lopez Rodriguez, Krystian Mikolajczyk
Published	2022-10-07 16:05:00+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, DeepL