OpenGeMM: A High-Utilization GeMM Accelerator Generator with Lightweight RISC-V Control and Tight Memory Coupling

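Summary

Deep neural networks (DNNs) are compute- and data-intensive, which makes them hard to deploy on resource-constrained extreme-edge devices.
Standalone accelerators tailored to a specific application scenario suffer from inflexible control and limited programmability, while general-purpose hardware acceleration platforms paired with a RISC-V CPU offer high reusability and flexibility, but usually at the cost of system-level efficiency and utilization.
To close this gap, we propose OpenGeMM, an open-source acceleration platform that combines high efficiency and utilization with ease of configuration and programming.
OpenGeMM comprises a parameterized GeMM accelerator written in Chisel, a lightweight RISC-V processor, and a tightly coupled multi-banked scratchpad memory.
Three mechanisms raise GeMM core utilization and system efficiency: configuration pre-loading, input pre-fetching with output buffering, and programmable strided memory access.
Experiments show that OpenGeMM consistently achieves hardware utilization between 81.89% and 99.34% across a range of CNN and Transformer workloads.
Compared with the state-of-the-art (SotA) open-source Gemmini accelerator, OpenGeMM delivers a 3.58x to 16.40x speedup in normalized throughput across a variety of GeMM workloads while achieving a system efficiency of 4.68 TOPS/W.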
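
The abstract names programmable strided memory access as one of the three mechanisms but gives no detail. As a rough illustration only, the Python sketch below shows the general idea behind a strided address generator: a set of loop bounds and per-dimension strides describes which memory addresses to touch. The function name `strided_addresses`, its parameters, and the 4x8 example matrix are all hypothetical and are not taken from OpenGeMM.

```python
from itertools import product


def strided_addresses(base, bounds, strides):
    """Yield base + sum(i_k * strides[k]) for every index tuple.

    The last dimension iterates fastest, like the innermost loop of a
    nested for-loop. All names here are illustrative assumptions.
    """
    if len(bounds) != len(strides):
        raise ValueError("bounds and strides must have the same length")
    for idx in product(*(range(b) for b in bounds)):
        yield base + sum(i * s for i, s in zip(idx, strides))


# Hypothetical example: a row-major 4x8 matrix of 1-byte elements at address 0.
ROW_PITCH = 8                      # bytes between consecutive rows
TILE_BASE = 1 * ROW_PITCH + 2      # tile starts at row 1, column 2

# Read a 2x3 tile row by row.
row_major_tile = list(
    strided_addresses(TILE_BASE, bounds=(2, 3), strides=(ROW_PITCH, 1))
)
print(row_major_tile)  # [10, 11, 12, 18, 19, 20]

# Read the same tile column by column by swapping bounds and strides.
col_major_tile = list(
    strided_addresses(TILE_BASE, bounds=(3, 2), strides=(1, ROW_PITCH))
)
print(col_major_tile)  # [10, 18, 11, 19, 12, 20]
```

Changing only the bounds and strides changes the traversal order over the same data, which is how a scheme of this kind could feed differently laid-out operands to a GeMM core without first rearranging them in memory.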

Abstract (original)

Deep neural networks (DNNs) face significant challenges when deployed on resource-constrained extreme edge devices due to their computational and data-intensive nature. While standalone accelerators tailored for specific application scenarios suffer from inflexible control and limited programmability, generic hardware acceleration platforms coupled with RISC-V CPUs can enable high reusability and flexibility, yet typically at the expense of system-level efficiency and low utilization. To fill this gap, we propose OpenGeMM, an open-source acceleration platform, jointly demonstrating high efficiency and utilization, as well as ease of configurability and programmability. OpenGeMM encompasses a parameterized Chisel-coded GeMM accelerator, a lightweight RISC-V processor, and a tightly coupled multi-banked scratchpad memory. The GeMM core utilization and system efficiency are boosted through three mechanisms: configuration pre-loading, input pre-fetching with output buffering, and programmable strided memory access. Experimental results show that OpenGeMM can consistently achieve hardware utilization ranging from 81.89% to 99.34% across diverse CNN and Transformer workloads. Compared to the SotA open-source Gemmini accelerator, OpenGeMM demonstrates a 3.58x to 16.40x speedup on normalized throughput across a wide variety of GeMM workloads, while achieving 4.68 TOPS/W system efficiency.
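
To make the link between pre-fetching, output buffering, and utilization easier to picture, the Python sketch below uses a simple textbook-style cycle model. This model is an assumption for illustration, not the paper's measurement method: utilization is taken as compute cycles divided by total cycles per tile, and overlapping the load, compute, and store phases is assumed to hide everything but the slowest phase. The function names and the cycle counts are hypothetical, and pipeline fill, drain, and configuration overhead are ignored.

```python
def tile_cycles(load, compute, store, overlapped):
    """Steady-state cycles spent per tile under the assumed model."""
    if overlapped:
        # With input pre-fetching and output buffering, the three phases
        # run concurrently, so the slowest phase sets the pace.
        return max(load, compute, store)
    # Without overlap, the phases run back to back.
    return load + compute + store


def utilization(load, compute, store, overlapped):
    """Fraction of cycles in which the compute array does useful work."""
    return compute / tile_cycles(load, compute, store, overlapped)


# Hypothetical cycle counts per tile, chosen only for illustration.
LOAD, COMPUTE, STORE = 4, 8, 2

serial = utilization(LOAD, COMPUTE, STORE, overlapped=False)
overlap = utilization(LOAD, COMPUTE, STORE, overlapped=True)
print(f"serial:     {serial:.3f}")
print(f"overlapped: {overlap:.3f}")

# If loading is slower than computing, overlap alone cannot reach 100%.
memory_bound = utilization(10, COMPUTE, STORE, overlapped=True)
print(f"memory-bound, overlapped: {memory_bound:.3f}")
```

Under this model, overlap helps only as long as data movement is faster than compute; once the load phase dominates, utilization is capped by memory bandwidth rather than by control overhead.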

arXiv information

Authors	Xiaoling Yi, Ryan Antonio, Joren Dumoulin, Jiacong Sun, Josse Van Delm, Guilherme Paim, Marian Verhelst
Published	2024-11-14 15:58:46+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
