Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity

Summary

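The demand for efficient processing of deep neural networks (DNNs) on embedded devices is a significant challenge that limits their deployment.
Exploiting sparsity in the network's feature maps is one way to reduce inference latency.
Unstructured sparsity is known to cause less accuracy degradation than structured sparsity, but it requires extensive changes to the inference engine to deliver latency benefits.
To tackle this challenge, we propose a solution that induces semi-structured activation sparsity, which can be exploited through minor runtime modifications.
To achieve high speedups at inference time, we design a sparse training procedure that is aware of the final position of the activations during the General Matrix Multiplication (GEMM).
We extensively evaluate the proposed solution across various models for image classification and object detection tasks.
Remarkably, our approach yields a $1.25 \times$ speedup with an accuracy drop of only $1.1\%$ for the ResNet18 model on the ImageNet dataset.
Furthermore, when combined with a state-of-the-art structured pruning method, the resulting models offer a good latency-accuracy trade-off and outperform models that use structured pruning alone.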

Summary (original)

The demand for efficient processing of deep neural networks (DNNs) on embedded devices is a significant challenge limiting their deployment. Exploiting sparsity in the network’s feature maps is one of the ways to reduce its inference latency. It is known that unstructured sparsity results in lower accuracy degradation with respect to structured sparsity but the former needs extensive inference engine changes to get latency benefits. To tackle this challenge, we propose a solution to induce semi-structured activation sparsity exploitable through minor runtime modifications. To attain high speedup levels at inference time, we design a sparse training procedure with awareness of the final position of the activations while computing the General Matrix Multiplication (GEMM). We extensively evaluate the proposed solution across various models for image classification and object detection tasks. Remarkably, our approach yields a speed improvement of $1.25 \times$ with a minimal accuracy drop of $1.1\%$ for the ResNet18 model on the ImageNet dataset. Furthermore, when combined with a state-of-the-art structured pruning method, the resulting models provide a good latency-accuracy trade-off, outperforming models that solely employ structured pruning techniques.
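The abstract says the sparsity becomes exploitable because training is aware of where activations end up in the GEMM. A common way to run a convolution as a GEMM is im2col lowering. In that scheme, each row of the lowered activation matrix holds one output pixel's receptive field.

The minimal sketch below assumes that "semi-structured" means zeroing whole rows of that matrix. This is an illustrative reading, not the authors' code. Under it, a runtime can skip a zero row with a single check instead of needing a general sparse kernel.

`im2col`, `mask_rows` and `gemm` are hypothetical helpers. They assume a single input channel, stride 1 and no padding. Random row selection stands in for whatever pattern the paper's training procedure actually imposes.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def im2col(x: Matrix, k: int) -> Matrix:
    """Lower a single-channel H x W input (stride 1, no padding) into a matrix
    with one row per output pixel; each row holds that pixel's k*k patch."""
    h, w = len(x), len(x[0])
    return [
        [x[i + di][j + dj] for di in range(k) for dj in range(k)]
        for i in range(h - k + 1)
        for j in range(w - k + 1)
    ]


def mask_rows(a: Matrix, ratio: float, seed: int = 0) -> Matrix:
    """Hypothetical semi-structured mask: zero entire rows of the lowered
    matrix (whole GEMM rows) rather than scattered individual elements.
    Rows are picked at random here; this is an illustrative assumption."""
    rng = random.Random(seed)
    dropped = set(rng.sample(range(len(a)), int(len(a) * ratio)))
    return [[0.0] * len(row) if r in dropped else list(row) for r, row in enumerate(a)]


def gemm(a: Matrix, w: Matrix, skip_zero_rows: bool) -> Tuple[Matrix, int]:
    """Return (a @ w, number of multiply-accumulates performed).

    With skip_zero_rows=True, an all-zero row of `a` is detected with one
    check and its output row is filled with zeros without doing any MACs."""
    n_in, n_out = len(w), len(w[0])
    out: Matrix = []
    macs = 0
    for row in a:
        if skip_zero_rows and not any(row):
            out.append([0.0] * n_out)
            continue
        out.append([sum(row[t] * w[t][c] for t in range(n_in)) for c in range(n_out)])
        macs += n_in * n_out
    return out, macs


# Usage: 6x6 input, 3x3 kernel with 2 output channels (weights as a 9x2 matrix).
x = [[float(6 * i + j) for j in range(6)] for i in range(6)]
w = [[1.0, -1.0] for _ in range(9)]
a = mask_rows(im2col(x, 3), ratio=0.25)  # 16 rows, 4 of them zeroed
dense, dense_macs = gemm(a, w, skip_zero_rows=False)
sparse, sparse_macs = gemm(a, w, skip_zero_rows=True)
assert dense == sparse  # skipping zero rows does not change the result
print(dense_macs, sparse_macs, dense_macs / sparse_macs)  # prints: 288 216 1.3333333333333333
```

The 4/3 MAC ratio is an idealized count for this toy layer with 25% of rows dropped. It is not the paper's measured speedup. Wall-clock gains on real hardware also depend on memory traffic and on the inference runtime.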

arXiv information

Authors	Matteo Grimaldi, Darshan C. Ganji, Ivan Lazarevich, Sudhakar Sah
Published	2023-09-27 17:48:29+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google

