DenseShift: Towards Accurate and Efficient Low-Bit Power-of-Two Quantization

Abstract

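Deploying deep neural networks efficiently on low-resource edge devices is difficult because their resource requirements keep growing.
To address this, researchers have proposed multiplication-free neural networks, such as power-of-two quantization (also known as Shift networks), which aim to reduce memory usage and simplify computation.
However, existing low-bit Shift networks are less accurate than full-precision networks, and they typically suffer from encoding schemes with a limited weight range and from quantization loss.
This paper proposes the DenseShift network, which significantly improves the accuracy of Shift networks and achieves performance competitive with full-precision networks in vision and speech applications.
It also introduces a method for deploying an efficient DenseShift network with non-quantized floating-point activations, obtaining a 1.6x speed-up over existing methods.
To achieve this, the authors demonstrate that zero-valued weights in low-bit Shift networks do not contribute to model capacity and negatively affect inference computation.
To address this issue, they propose a zero-free shifting mechanism that simplifies inference and increases model capacity.
They further propose a sign-scale decomposition design to improve training efficiency and a low-variance random initialization strategy to improve the model's transfer learning performance.
Extensive experiments on a range of computer vision and speech tasks show that DenseShift outperforms existing low-bit multiplication-free networks and is competitive with full-precision networks.
The proposed approach also shows strong transfer learning performance without a drop in accuracy.
The code has been released on GitHub.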
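
As a rough illustration of why a reserved zero code costs capacity at low bit widths, the following Python sketch builds two hypothetical weight codebooks and rounds a weight to the nearest entry. The function names, the choice of exponents (0, -1, -2, ...), and the nearest-value rounding are assumptions made for this example; the abstract does not specify the encoding DenseShift actually uses.

```python
def pot_codebook(bits: int, zero_free: bool) -> list[float]:
    """Hypothetical codebook of signed power-of-two weights for `bits` bits.

    Assumption for illustration: magnitudes are 2**0, 2**-1, 2**-2, ...
    """
    n_codes = 2 ** bits
    if zero_free:
        # Every code maps to a nonzero value: half positive, half negative.
        n_magnitudes = n_codes // 2
    else:
        # One code is reserved for zero; the rest are split evenly by sign,
        # which can leave one code unused.
        n_magnitudes = (n_codes - 1) // 2
    magnitudes = [2.0 ** -p for p in range(n_magnitudes)]
    values = [-m for m in magnitudes] + magnitudes
    if not zero_free:
        values.append(0.0)
    return sorted(values)


def quantize(w: float, codebook: list[float]) -> float:
    """Round a full-precision weight to the nearest codebook entry."""
    return min(codebook, key=lambda q: abs(w - q))


if __name__ == "__main__":
    for zero_free in (False, True):
        book = pot_codebook(2, zero_free)
        print(zero_free, book, quantize(0.3, book))
```

Under these assumptions, a 2-bit codebook with zero has three distinct values, while the zero-free codebook uses all four codes.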

Abstract (original)

Efficiently deploying deep neural networks on low-resource edge devices is challenging due to their ever-increasing resource requirements. To address this issue, researchers have proposed multiplication-free neural networks, such as Power-of-Two quantization, or also known as Shift networks, which aim to reduce memory usage and simplify computation. However, existing low-bit Shift networks are not as accurate as their full-precision counterparts, typically suffering from limited weight range encoding schemes and quantization loss. In this paper, we propose the DenseShift network, which significantly improves the accuracy of Shift networks, achieving competitive performance to full-precision networks for vision and speech applications. In addition, we introduce a method to deploy an efficient DenseShift network using non-quantized floating-point activations, while obtaining 1.6X speed-up over existing methods. To achieve this, we demonstrate that zero-weight values in low-bit Shift networks do not contribute to model capacity and negatively impact inference computation. To address this issue, we propose a zero-free shifting mechanism that simplifies inference and increases model capacity. We further propose a sign-scale decomposition design to enhance training efficiency and a low-variance random initialization strategy to improve the model’s transfer learning performance. Our extensive experiments on various computer vision and speech tasks demonstrate that DenseShift outperforms existing low-bit multiplication-free networks and achieves competitive performance compared to full-precision networks. Furthermore, our proposed approach exhibits strong transfer learning performance without a drop in accuracy. Our code was released on GitHub.
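
The abstract does not describe how floating-point activations are combined with power-of-two weights. One general fact that makes this plausible is that multiplying a float by 2**p only changes its exponent field. The Python sketch below uses `math.ldexp` to compute a dot product without any multiplication; the function name `shift_dot` and the separate sign and exponent lists are hypothetical choices for this example and are not taken from the DenseShift code.

```python
import math


def shift_dot(activations, signs, exponents):
    """Dot product with weights sign * 2**exponent, using no multiplications.

    `signs` holds +1 or -1 and `exponents` holds integers (illustrative layout).
    """
    total = 0.0
    for x, s, p in zip(activations, signs, exponents):
        # ldexp(x, p) returns x * 2**p by adjusting the float's exponent.
        scaled = math.ldexp(x, p)
        # Applying the sign is a negation, not a multiplication.
        total += scaled if s > 0 else -scaled
    return total


if __name__ == "__main__":
    acts = [1.5, -2.0, 0.75]
    # Equivalent weights: 1.0, -0.5, 2.0
    print(shift_dot(acts, signs=[1, -1, 1], exponents=[0, -1, 1]))
```

Because every weight is nonzero in a zero-free scheme, a loop like this needs no special case for skipping or masking zero weights.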

arXiv information

Authors	Xinlin Li, Bang Liu, Rui Heng Yang, Vanessa Courville, Chao Xing, Vahid Partovi Nia
Published	2023-10-24 16:22:40+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
