Fast, Scalable, Energy-Efficient Non-element-wise Matrix Multiplication on FPGA

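Summary

Modern neural network (NN) architectures rely heavily on vast numbers of multiply-accumulate operations, which account for most of their computational cost.
This paper therefore proposes a high-throughput, scalable, and energy-efficient non-element-wise matrix multiplication unit on FPGAs as a basic building block for NNs.
First, we streamline the inter-layer and intra-layer redundancies of the MADDNESS algorithm, a LUT-based approximate matrix multiplication method, to design a fast, efficient, and scalable approximate matrix multiplication module termed the "Approximate Multiplication Unit (AMU)".
The AMU further optimizes LUT-based matrix multiplication through dedicated memory management and access design, decoupling computational overhead from input resolution and significantly improving the efficiency of FPGA-based NN accelerators.
Experimental results show that our AMU achieves up to 9x higher throughput and 112x higher energy efficiency than state-of-the-art solutions for FPGA-based quantized neural network (QNN) accelerators.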
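
To make the idea of LUT-based approximate matrix multiplication more concrete, here is a minimal software sketch. It is not the AMU design and not the MADDNESS algorithm as published: MADDNESS is understood to encode inputs with a learned hashing step, whereas this sketch substitutes a plain nearest-prototype search (which itself still uses multiplications). The function names, the prototype values, and the matrix sizes are all hypothetical and chosen only for illustration. What the sketch does show is the general pattern: products between prototypes and the weight matrix are precomputed into lookup tables, so computing an output row needs only table lookups and additions.

```python
def split_subspaces(row, n_sub):
    """Split a row vector into n_sub equal-length chunks (subspaces)."""
    step = len(row) // n_sub
    return [row[i * step:(i + 1) * step] for i in range(n_sub)]


def nearest(chunk, prototypes):
    """Index of the prototype closest to chunk (squared Euclidean distance).

    Assumption: stands in for the hashing-based encoder of MADDNESS.
    """
    return min(
        range(len(prototypes)),
        key=lambda k: sum((c - p) ** 2 for c, p in zip(chunk, prototypes[k])),
    )


def build_luts(prototypes, B):
    """Precompute luts[s][k][j] = prototype k of subspace s dotted with
    the matching rows of column j of B. Done once, offline."""
    n_sub = len(prototypes)
    step = len(B) // n_sub
    n_cols = len(B[0])
    luts = []
    for s in range(n_sub):
        rows = B[s * step:(s + 1) * step]
        luts.append([
            [sum(p[d] * rows[d][j] for d in range(step)) for j in range(n_cols)]
            for p in prototypes[s]
        ])
    return luts


def approx_matmul(A, prototypes, luts):
    """Approximate A @ B using only encoding, table lookups, and additions."""
    n_sub = len(luts)
    n_cols = len(luts[0][0])
    out = []
    for row in A:
        chunks = split_subspaces(row, n_sub)
        codes = [nearest(chunk, prototypes[s]) for s, chunk in enumerate(chunks)]
        out.append([
            sum(luts[s][codes[s]][j] for s in range(n_sub))
            for j in range(n_cols)
        ])
    return out


# Hypothetical toy data: 2 subspaces, 2 prototypes each, B is 4 x 2.
prototypes = [[[1, 0], [0, 1]], [[2, 2], [3, 1]]]
B = [[1, 2], [3, 4], [5, 6], [7, 8]]
luts = build_luts(prototypes, B)

# Rows built exactly from prototypes reproduce the exact product.
print(approx_matmul([[1, 0, 3, 1], [0, 1, 2, 2]], prototypes, luts))
# [[23, 28], [27, 32]]
```

Once the tables are built, the cost of producing an output row depends on the number of subspaces and output columns, not on the numeric precision of the inputs, which is one way to read the summary's claim about decoupling computational overhead from input resolution. A row that does not match a prototype exactly is rounded to its nearest prototype, which is where the approximation error comes from.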

Summary (original)

Modern Neural Network (NN) architectures heavily rely on vast numbers of multiply-accumulate arithmetic operations, constituting the predominant computational cost. Therefore, this paper proposes a high-throughput, scalable and energy efficient non-element-wise matrix multiplication unit on FPGAs as a basic component of the NNs. We firstly streamline inter-layer and intra-layer redundancies of MADDNESS algorithm, a LUT-based approximate matrix multiplication, to design a fast, efficient scalable approximate matrix multiplication module termed ‘Approximate Multiplication Unit (AMU)’. The AMU optimizes LUT-based matrix multiplications further through dedicated memory management and access design, decoupling computational overhead from input resolution and boosting FPGA-based NN accelerator efficiency significantly. The experimental results show that using our AMU achieves up to 9x higher throughput and 112x higher energy efficiency over the state-of-the-art solutions for the FPGA-based Quantised Neural Network (QNN) accelerators.

arXiv information

Authors	Xuqi Zhu, Huaizhi Zhang, JunKyu Lee, Jiacheng Zhu, Chandrajit Pal, Sangeet Saha, Klaus D. McDonald-Maier, Xiaojun Zhai
Published	2024-07-02 15:28:10+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
