CPT-V: A Contrastive Approach to Post-Training Quantization of Vision Transformers

Summary

When considering post-training quantization, prior work has typically focused on developing mixed-precision schemes or on learning the best way to partition a network for quantization.
Our work, CPT-V, examines a general way to improve the accuracy of networks that have already been quantized, simply by perturbing their quantization scales.
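
To make "perturbing the quantization scales" concrete, here is a minimal sketch that assumes a symmetric, per-tensor uniform quantizer whose initial scale comes from max-abs calibration. It randomly rescales that scale and keeps the best candidate. The helper names (`fake_quantize`, `perturb_scale`), the ±50% perturbation range, and the use of weight MSE as the selection criterion are illustrative assumptions that the abstract does not give. CPT-V itself scores candidate scales with a contrastive loss on model features.

```python
import random


def fake_quantize(values, scale, bits):
    """Symmetric uniform fake quantization: divide by the scale, round to the
    nearest integer, clamp to the signed b-bit range, then multiply back."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [max(qmin, min(qmax, round(v / scale))) * scale for v in values]


def perturb_scale(scale, rel_range, rng):
    """Multiply a scale by a random factor in [1 - rel_range, 1 + rel_range].
    rel_range must be < 1 so the perturbed scale stays positive."""
    return scale * (1.0 + rng.uniform(-rel_range, rel_range))


def mse(a, b):
    """Placeholder objective for this sketch only (CPT-V uses a contrastive loss)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    bits = 3
    # Scale of the "already quantized" model: max-abs calibration (assumed).
    base = max(abs(w) for w in weights) / (2 ** (bits - 1) - 1)
    best_scale = base
    best_err = mse(weights, fake_quantize(weights, base, bits))
    for _ in range(100):
        cand = perturb_scale(base, 0.5, rng)
        err = mse(weights, fake_quantize(weights, cand, bits))
        if err < best_err:
            best_scale, best_err = cand, err
    print(f"base scale {base:.4f} -> tuned scale {best_scale:.4f}, error {best_err:.4f}")
```

The search changes only the scale. The full-precision weights and the bit width stay untouched, so this kind of tuning can sit on top of an existing quantization scheme.
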
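Borrowing the idea of a contrastive loss from self-supervised learning, we find a robust way to jointly minimize a loss function using just 1,000 calibration images.
To determine the best-performing quantization scales, CPT-V contrasts the features of the quantized and full-precision models in a self-supervised fashion.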
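Unlike traditional reconstruction-based loss functions, a contrastive loss not only rewards similarity between the quantized and full-precision outputs but also helps distinguish each quantized output from the other outputs in the same batch.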
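
The following minimal sketch illustrates the difference under the assumption of an InfoNCE-style formulation with cosine similarity and a temperature of 0.1; the abstract does not specify the exact loss. Each quantized feature is scored against its own full-precision feature (the positive) and against the full-precision features of the other samples in the batch (the negatives). The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def contrastive_loss(quant_feats, fp_feats, temperature=0.1):
    """InfoNCE-style loss over a batch: quantized feature i should be most similar
    to full-precision feature i (positive) and dissimilar to feature j != i (negatives)."""
    n = len(quant_feats)
    total = 0.0
    for i in range(n):
        logits = [cosine(quant_feats[i], fp_feats[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / n


def reconstruction_loss(quant_feats, fp_feats):
    """Element-wise mean squared error, a typical reconstruction objective."""
    diffs = [(a - b) ** 2 for q, f in zip(quant_feats, fp_feats) for a, b in zip(q, f)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    fp = [[1.0, 0.0], [0.0, 1.0]]
    # Both candidates have the same squared error, but A leans toward sample 1.
    cand_a = [[1.0, 0.5], [0.0, 1.0]]
    cand_b = [[1.0, -0.5], [0.0, 1.0]]
    for name, q in (("A", cand_a), ("B", cand_b)):
        print(name, reconstruction_loss(q, fp), round(contrastive_loss(q, fp), 6))
```

Both candidates have the same reconstruction error (0.0625), so a reconstruction loss cannot tell them apart. Candidate B receives the lower contrastive loss because its first output points away from the second sample's full-precision feature, whereas candidate A leans toward it.
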
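In addition, in contrast to prior work, CPT-V proposes a block-wise evolutionary search that minimizes a global contrastive loss objective, improving the accuracy of existing vision transformer (ViT) quantization schemes.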
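
The abstract does not describe the search operators. The following is therefore only a sketch of what a block-wise evolutionary search over quantization scales could look like: a simple (1+λ) evolution strategy that mutates one block's scales at a time while the other blocks stay fixed, and that scores every candidate with a single global objective. The `Scales` layout, the population size, the generation count, the log-normal mutation, and the toy quadratic objective standing in for CPT-V's global contrastive loss are all assumptions.

```python
import math
import random
from typing import Callable, List

Scales = List[List[float]]  # hypothetical layout: one list of quantization scales per block


def blockwise_evolutionary_search(
    scales: Scales,
    global_loss: Callable[[Scales], float],
    generations: int = 20,
    population: int = 8,
    sigma: float = 0.1,
    seed: int = 0,
) -> Scales:
    """Tune one block's scales at a time with a (1+lambda) evolution strategy,
    always scoring candidates with the same global loss over the whole network."""
    rng = random.Random(seed)
    best = [list(block) for block in scales]
    best_loss = global_loss(best)
    for b in range(len(best)):
        for _ in range(generations):
            children = []
            for _ in range(population):
                child = [list(block) for block in best]
                # Log-normal multiplicative mutation keeps every scale positive.
                child[b] = [s * math.exp(rng.gauss(0.0, sigma)) for s in best[b]]
                children.append((global_loss(child), child))
            loss, child = min(children, key=lambda pair: pair[0])
            if loss < best_loss:  # elitism: never accept a worse candidate
                best, best_loss = child, loss
    return best


if __name__ == "__main__":
    # Toy stand-in for the global contrastive loss: distance to hypothetical target scales.
    target = [[0.5, 0.25], [0.1]]

    def toy_loss(s: Scales) -> float:
        return sum((x - t) ** 2 for sb, tb in zip(s, target) for x, t in zip(sb, tb))

    start = [[1.0, 1.0], [1.0]]
    tuned = blockwise_evolutionary_search(start, toy_loss)
    print(toy_loss(start), "->", toy_loss(tuned))
```

Every candidate is scored with the same global objective. A change to one block is therefore judged by its effect on the whole model's output, not by that block's local reconstruction error.
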
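For example, CPT-V improves the top-1 accuracy of a fully quantized ViT-Base by 10.30%, 0.78%, and 0.15% at 3-bit, 4-bit, and 8-bit weight quantization, respectively.
Extensive experiments on a variety of other ViT architectures further demonstrate its robustness in extreme quantization scenarios.
Our code is available at .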

Summary (original)

When considering post-training quantization, prior work has typically focused on developing a mixed precision scheme or learning the best way to partition a network for quantization. In our work, CPT-V, we look at a general way to improve the accuracy of networks that have already been quantized, simply by perturbing the quantization scales. Borrowing the idea of contrastive loss from self-supervised learning, we find a robust way to jointly minimize a loss function using just 1,000 calibration images. In order to determine the best performing quantization scale, CPT-V contrasts the features of quantized and full precision models in a self-supervised fashion. Unlike traditional reconstruction-based loss functions, the use of a contrastive loss function not only rewards similarity between the quantized and full precision outputs but also helps in distinguishing the quantized output from other outputs within a given batch. In addition, in contrast to prior works, CPT-V proposes a block-wise evolutionary search to minimize a global contrastive loss objective, allowing for accuracy improvement of existing vision transformer (ViT) quantization schemes. For example, CPT-V improves the top-1 accuracy of a fully quantized ViT-Base by 10.30%, 0.78%, and 0.15% for 3-bit, 4-bit, and 8-bit weight quantization levels. Extensive experiments on a variety of other ViT architectures further demonstrate its robustness in extreme quantization scenarios. Our code is available at .

arXiv information

Authors	Natalia Frumkin, Dibakar Gope, Diana Marculescu
Published	2022-11-17 16:41:31+00:00
arXiv site	arxiv_id (pdf)

Source and services used

arxiv.jp, Google