Convolutional Neural Networks Quantization with Attention

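Abstract

It has been shown that, compared with the 32-bit floating-point numbers used in the training phase, deep convolutional neural networks (DCNNs) can operate at lower precision during inference, which saves memory space and power consumption.
However, quantizing a network is always accompanied by a loss of accuracy.
Here we propose a method called double-stage Squeeze-and-Threshold (double-stage ST).
It uses the attention mechanism to quantize networks and achieves state-of-the-art results.
With our method, a 3-bit model can achieve accuracy that exceeds that of the full-precision baseline model.
The proposed double-stage ST activation quantization is easy to apply: it is inserted before the convolution.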
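
To make "3-bit activation quantization inserted before the convolution" concrete, here is a minimal sketch in plain Python. It is not the double-stage ST method, whose details the abstract does not give: it swaps in a generic uniform quantizer with a fixed clipping range, and the names quantize_activations, conv1d, and max_value are hypothetical, chosen only for illustration.

```python
def quantize_activations(values, bits=3, max_value=1.0):
    """Uniformly quantize activations to 2**bits levels in [0, max_value].

    Assumption: a fixed clipping range. This is a generic quantizer,
    not the attention-based thresholding proposed in the paper.
    """
    levels = 2 ** bits - 1  # 3 bits -> 7 steps, i.e. 8 representable values
    quantized = []
    for v in values:
        clipped = min(max(v, 0.0), max_value)        # clip to representable range
        index = round(clipped * levels / max_value)  # nearest integer level
        quantized.append(index * max_value / levels)
    return quantized


def conv1d(signal, kernel):
    """Plain 'valid' 1-D cross-correlation, standing in for a conv layer."""
    n = len(signal) - len(kernel) + 1
    return [
        sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(n)
    ]


activations = [0.05, 0.40, 0.93, 1.70, -0.20]
q = quantize_activations(activations, bits=3)  # quantizer sits before the conv
out = conv1d(q, [0.5, 0.5])
```

Here every value in q is one of the eight multiples of 1/7 between 0 and 1, so the convolution only ever sees 3-bit inputs. A learned or input-dependent threshold, as the abstract suggests for double-stage ST, would replace the fixed max_value.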

Abstract (original)

It has been proven that, compared to using 32-bit floating-point numbers in the training phase, Deep Convolutional Neural Networks (DCNNs) can operate with low precision during inference, thereby saving memory space and power consumption. However, quantizing networks is always accompanied by an accuracy decrease. Here, we propose a method, double-stage Squeeze-and-Threshold (double-stage ST). It uses the attention mechanism to quantize networks and achieve state-of-art results. Using our method, the 3-bit model can achieve accuracy that exceeds the accuracy of the full-precision baseline model. The proposed double-stage ST activation quantization is easy to apply: inserting it before the convolution.

arXiv information

Authors	Binyi Wu, Bernd Waschneck, Christian Georg Mayr
Published	2022-09-30 08:48:31+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
