Exploiting Kernel Compression on BNNs

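Summary

Binary neural networks (BNNs) have shown considerable success on realistic image classification tasks.
Notably, their accuracy is close to the state-of-the-art accuracy of full-precision models tailored to edge devices.
BNNs suit edge devices well because they use 1 bit to store each input and each weight, so their storage requirements are low.
In addition, BNN computations consist mainly of xnor and popcount operations, which simple hardware structures implement very efficiently.
Nonetheless, supporting BNNs efficiently on mobile CPUs is far from trivial, because frequent memory accesses to load weights and inputs erode these benefits.
In a BNN, each weight or input is stored in one bit, and several of them are packed together as a sequence of bits to improve storage and computation efficiency.
In this work, we observe that the number of unique sequences representing a set of weights is typically low.
We also find that, during the evaluation of a BNN layer, a small group of unique sequences is used far more often than the rest.
We therefore propose exploiting this observation by encoding the bit sequences with Huffman encoding and decoding them through an indirection table during BNN evaluation.
We also propose a clustering scheme that identifies the most common bit sequences and replaces the less common ones with similar common sequences.
As a result, storage requirements and memory accesses decrease, because common sequences are encoded with fewer bits.
We extend a mobile CPU with a small hardware structure that efficiently caches and decodes the compressed bit sequences.
We evaluate the scheme using the ReActNet model on the ImageNet dataset.
Our experimental results show that the technique can reduce memory requirements by 1.32x and improve performance by 1.35x.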
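The xnor and popcount arithmetic mentioned in the summary can be illustrated with a minimal sketch. It assumes the common BNN convention that bit 1 encodes +1 and bit 0 encodes -1; the function names `pack` and `binary_dot` are hypothetical and do not come from the paper.

```python
def pack(values):
    """Pack a list of +1/-1 values into one int, one bit per value (1 encodes +1)."""
    word = 0
    for v in values:
        word = (word << 1) | (1 if v > 0 else 0)
    return word


def binary_dot(a, b, n):
    """Dot product of two packed +1/-1 vectors of length n using xnor and popcount."""
    mask = (1 << n) - 1
    # xnor marks positions where the two bits agree; the mask drops the sign bits
    # that Python's ~ operator introduces on unbounded ints.
    matches = bin(~(a ^ b) & mask).count("1")  # popcount of the xnor result
    # Each agreeing position contributes +1 and each disagreeing one -1.
    return 2 * matches - n


inputs = [1, -1, 1, 1, -1, -1, 1, -1, 1]
weights = [1, 1, -1, 1, -1, 1, 1, -1, -1]
result = binary_dot(pack(inputs), pack(weights), len(inputs))
reference = sum(x * w for x, w in zip(inputs, weights))
```

Here `result` and `reference` agree, which shows why a packed sequence of weights can be consumed directly without being unpacked into individual values.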

Summary (original)

Binary Neural Networks (BNNs) are showing tremendous success on realistic image classification tasks. Notably, their accuracy is similar to the state-of-the-art accuracy obtained by full-precision models tailored to edge devices. In this regard, BNNs are very amenable to edge devices since they employ 1 bit to store each input and weight, and thus their storage requirements are low. Also, BNN computations are mainly done using xnor and popcount operations, which are implemented very efficiently using simple hardware structures. Nonetheless, supporting BNNs efficiently on mobile CPUs is far from trivial since their benefits are hindered by frequent memory accesses to load weights and inputs. In BNNs, a weight or an input is stored using one bit, and to increase storage and computation efficiency, several of them are packed together as a sequence of bits. In this work, we observe that the number of unique sequences representing a set of weights is typically low. Also, we have seen that during the evaluation of a BNN layer, a small group of unique sequences is employed more frequently than others. Accordingly, we propose exploiting this observation by using Huffman encoding to encode the bit sequences and then using an indirection table to decode them during the BNN evaluation. Also, we propose a clustering scheme to identify the most common sequences of bits and replace the less common ones with some similar common sequences. Hence, we decrease the storage requirements and memory accesses since common sequences are encoded with fewer bits. We extend a mobile CPU by adding a small hardware structure that can efficiently cache and decode the compressed sequences of bits. We evaluate our scheme using the ReActNet model with the ImageNet dataset. Our experimental results show that our technique can reduce memory requirements by 1.32x and improve performance by 1.35x.
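The encoding and clustering steps described in the abstract can be sketched in software as follows. This is only an illustration under stated assumptions, not the authors' implementation: the sequences are taken to be 9-bit integers, "similar" is interpreted as smallest Hamming distance, the indirection table is modelled as a plain dictionary rather than the proposed hardware structure, and all function names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(freqs):
    """Build a prefix code {sequence: bitstring} from {sequence: frequency}."""
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, c0 = heapq.heappop(heap)
        f1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (f0 + f1, next(tie), merged))
    return heap[0][2]


def cluster(seqs, keep):
    """Keep the `keep` most common sequences; map every other sequence to the
    kept one at the smallest Hamming distance (assumed similarity measure)."""
    common = [s for s, _ in Counter(seqs).most_common(keep)]
    kept = set(common)

    def nearest(s):
        return min(common, key=lambda c: bin(s ^ c).count("1"))

    return [s if s in kept else nearest(s) for s in seqs]


def compress(seqs):
    """Return the encoded bitstring and a codeword -> sequence lookup table."""
    code = huffman_code(Counter(seqs))
    bits = "".join(code[s] for s in seqs)
    table = {c: s for s, c in code.items()}
    return bits, table


def decompress(bits, table):
    """Decode greedily; this works because Huffman codes are prefix-free."""
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in table:
            out.append(table[cur])
            cur = ""
    return out


# Twelve made-up 9-bit sequences with a skewed frequency distribution.
kernels = [0b111111111] * 6 + [0b000000000] * 3 + [0b111111110, 0b000010000, 0b101010101]
bits, table = compress(kernels)        # lossless
clustered = cluster(kernels, keep=2)   # lossy: rare sequences are replaced
cbits, ctable = compress(clustered)
```

Huffman encoding alone is lossless, so `decompress(bits, table)` returns the original list. Clustering changes some weights, which is why the abstract requires the replacements to be similar sequences; in exchange, `cbits` is shorter than `bits` because fewer distinct sequences remain.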

arXiv information

Authors	Franyell Silfa, Jose Maria Arnau, Antonio González
Published	2022-12-01 16:05:10+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
