Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud

Summary

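Neural networks (NNs) are growing in importance and complexity.
A neural network's performance (and energy efficiency) can be bound by either computation or memory resources.
The processing-in-memory (PIM) paradigm, in which computation is placed near or within memory arrays, is a viable solution for accelerating memory-bound NNs.
However, PIM architectures take different forms, and different PIM approaches lead to different trade-offs.
Our goal is to analyze, discuss, and contrast DRAM-based PIM architectures in terms of NN performance and energy efficiency.
To do so, we analyze three state-of-the-art PIM architectures:
(1) UPMEM, which integrates processors and DRAM arrays into a single 2D chip;
(2) Mensa, a 3D-stack-based PIM architecture tailored for edge devices; and
(3) SIMDRAM, which uses the analog principles of DRAM to execute bit-serial operations.
Our analysis reveals that PIM greatly benefits memory-bound NNs:
(1) UPMEM provides 23x the performance of a high-end GPU when the GPU requires memory oversubscription for a general matrix-vector multiplication kernel;
(2) Mensa improves energy efficiency and throughput by 3.0x and 3.1x, respectively, over the Google Edge TPU for 24 Google edge NN models; and
(3) SIMDRAM outperforms a CPU/GPU by 16.7x/1.4x for three binary NNs.
We conclude that, because of inherent architectural design choices, the ideal PIM architecture for an NN model depends on that model's distinct attributes.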

Summary (original)

Neural networks (NNs) are growing in importance and complexity. A neural network’s performance (and energy efficiency) can be bound either by computation or memory resources. The processing-in-memory (PIM) paradigm, where computation is placed near or within memory arrays, is a viable solution to accelerate memory-bound NNs. However, PIM architectures vary in form, where different PIM approaches lead to different trade-offs. Our goal is to analyze, discuss, and contrast DRAM-based PIM architectures for NN performance and energy efficiency. To do so, we analyze three state-of-the-art PIM architectures: (1) UPMEM, which integrates processors and DRAM arrays into a single 2D chip; (2) Mensa, a 3D-stack-based PIM architecture tailored for edge devices; and (3) SIMDRAM, which uses the analog principles of DRAM to execute bit-serial operations. Our analysis reveals that PIM greatly benefits memory-bound NNs: (1) UPMEM provides 23x the performance of a high-end GPU when the GPU requires memory oversubscription for a general matrix-vector multiplication kernel; (2) Mensa improves energy efficiency and throughput by 3.0x and 3.1x over the Google Edge TPU for 24 Google edge NN models; and (3) SIMDRAM outperforms a CPU/GPU by 16.7x/1.4x for three binary NNs. We conclude that the ideal PIM architecture for NN models depends on a model’s distinct attributes, due to the inherent architectural design choices.
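
The compute-bound versus memory-bound distinction in the abstract can be illustrated with a roofline-style estimate. The sketch below is not taken from the paper: the function names and the peak compute and bandwidth figures are hypothetical placeholders, chosen only to show why a general matrix-vector multiplication tends to be limited by memory rather than by computation.

```python
def gemv_arithmetic_intensity(rows: int, cols: int, bytes_per_element: int = 4) -> float:
    """FLOPs per byte moved for y = A @ x, assuming each operand is moved once."""
    flops = 2 * rows * cols  # one multiply and one add per matrix element
    bytes_moved = bytes_per_element * (rows * cols + cols + rows)  # A, x, and y
    return flops / bytes_moved


def is_memory_bound(intensity: float, peak_flops: float, peak_bandwidth: float) -> bool:
    """Roofline test: memory-bound if intensity is below the machine balance point."""
    return intensity < peak_flops / peak_bandwidth


# Hypothetical machine (assumed values, not measurements):
# 10 TFLOP/s peak compute and 500 GB/s memory bandwidth.
PEAK_FLOPS = 10e12
PEAK_BANDWIDTH = 500e9

intensity = gemv_arithmetic_intensity(4096, 4096)
print(f"GEMV arithmetic intensity: {intensity:.3f} FLOP/byte")
print("memory-bound:", is_memory_bound(intensity, PEAK_FLOPS, PEAK_BANDWIDTH))
```

With 4-byte elements, the intensity stays just under 0.5 FLOP/byte regardless of matrix size, far below the assumed balance point of 20 FLOP/byte, so under these assumptions the kernel is memory-bound.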

arXiv information

Authors	Geraldo F. Oliveira, Juan Gómez-Luna, Saugata Ghose, Amirali Boroumand, Onur Mutlu
Published	2023-03-27 17:16:03+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
