Post-Training Quantization for Energy Efficient Realization of Deep Neural Networks

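Abstract

The biggest challenge in deploying Deep Neural Networks (DNNs) on edge devices, close to where the data is generated, is their size, i.e., their memory footprint and computational complexity.
Quantization significantly reduces both.
With the resulting shorter word-length, the energy efficiency of DNNs increases proportionally.
However, a shorter word-length typically degrades accuracy.
To counteract this effect, the quantized DNN is retrained.
Unfortunately, training costs up to 5000x more energy than inference with the quantized DNN.
To address this issue, we propose a post-training quantization flow that does not require retraining.
For this purpose, we investigated different quantization options.
Furthermore, our analysis systematically assesses the impact of reduced word-lengths for weights and activations, revealing a clear trend for the choice of word-length.
Neither aspect has been systematically investigated so far.
Our results are independent of the depth of the DNN and apply to uniform quantization, allowing fast quantization of a given pre-trained DNN.
At 6 bit, we exceed the state of the art by 2.2% Top-1 accuracy on ImageNet.
Without retraining, our quantization to 8 bit surpasses floating-point accuracy.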

Abstract (original)

The biggest challenge for the deployment of Deep Neural Networks (DNNs) close to the generated data on edge devices is their size, i.e., memory footprint and computational complexity. Both are significantly reduced with quantization. With the resulting lower word-length, the energy efficiency of DNNs increases proportionally. However, lower word-length typically causes accuracy degradation. To counteract this effect, the quantized DNN is retrained. Unfortunately, training costs up to 5000x more energy than the inference of the quantized DNN. To address this issue, we propose a post-training quantization flow without the need for retraining. For this, we investigated different quantization options. Furthermore, our analysis systematically assesses the impact of reduced word-lengths of weights and activations revealing a clear trend for the choice of word-length. Both aspects have not been systematically investigated so far. Our results are independent of the depth of the DNNs and apply to uniform quantization, allowing fast quantization of a given pre-trained DNN. We excel state-of-the-art for 6 bit by 2.2% Top-1 accuracy for ImageNet. Without retraining, our quantization to 8 bit surpasses floating-point accuracy.
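
To make the idea of uniform quantization to a chosen word-length concrete, here is a minimal sketch in plain Python. It is not the flow proposed in the paper: it assumes a generic symmetric, per-tensor scheme in which the scale is taken from the largest absolute value, and the names `quantize_uniform`, `dequantize`, and `max_abs_error` are hypothetical helpers introduced only for illustration.

```python
def quantize_uniform(values, bits):
    """Map floats to signed integer codes on a uniform grid.

    Assumption: symmetric per-tensor quantization with a max-abs scale.
    This is a generic scheme, not necessarily the one used in the paper.
    """
    if bits < 2:
        raise ValueError("bits must be at least 2 for a signed grid")
    qmax = 2 ** (bits - 1) - 1                  # e.g. 127 for 8 bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest grid point, then clamp to the representable range.
    # Note: Python's round() resolves exact ties to the even integer.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


def max_abs_error(values, bits):
    """Largest absolute difference between values and their quantized form."""
    codes, scale = quantize_uniform(values, bits)
    restored = dequantize(codes, scale)
    return max(abs(v - r) for v, r in zip(values, restored))


if __name__ == "__main__":
    # Made-up example weights, used only to show the effect of word-length.
    weights = [0.82, -0.41, 0.05, -1.27, 0.33, 0.0]
    for bits in (8, 6, 4):
        print(bits, "bit: max abs error =", max_abs_error(weights, bits))
```

With this scheme the rounding error per value is bounded by half the step size, so each bit removed from the word-length roughly doubles the worst-case error. How that error translates into Top-1 accuracy depends on the network and is what the paper evaluates; the sketch makes no claim about it.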

arXiv information

Authors	Cecilia Latotzke, Batuhan Balim, Tobias Gemmeke
Published	2022-10-14 15:43:57+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
