NeuralMatrix: Compute the Entire Neural Networks with Linear Matrix Operations for Efficient Inference

Summary

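The inherent diversity of computation types within an individual deep neural network (DNN) model requires a corresponding variety of computation units in the hardware processor, which significantly constrains computational efficiency during neural network execution.
This work introduces NeuralMatrix, a framework that converts the computation of an entire DNN into linear matrix operations so that it can be executed effectively on a single general-purpose matrix multiplication (GEMM) accelerator.
By overcoming the constraints imposed by the different computation types that each network model requires, the approach provides both generality, since a wide range of DNN models can run on a single GEMM accelerator, and application-specific levels of acceleration without additional special function units.
The approach is validated on mainstream DNNs and their variant models.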
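The summary does not say how nonlinear steps such as activation functions become linear matrix operations. One common technique, assumed here and not confirmed by the summary as the paper's method, is piecewise-linear approximation: the input range is split into segments, each with a precomputed slope and intercept, so evaluating the function at run time needs only a segment lookup and one multiply-add per element. Over a vector x this is k ⊙ x + b, a diagonal matrix times x plus a bias, which fits a GEMM-style datapath. The function names `build_pwl_table` and `pwl_eval`, the [-4, 4] range, the 32 segments, and clamping out-of-range inputs to the edge segments are all hypothetical choices for this sketch.

```python
import math
from typing import List, Tuple

Segment = Tuple[float, float]  # (slope k, intercept b) for one input segment


def gelu(x: float) -> float:
    """Exact GELU, x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def build_pwl_table(fn, lo: float, hi: float, segments: int) -> List[Segment]:
    """Offline step (hypothetical): fit one line per segment through fn at its endpoints."""
    width = (hi - lo) / segments
    table = []
    for i in range(segments):
        x0 = lo + i * width
        x1 = x0 + width
        k = (fn(x1) - fn(x0)) / width
        b = fn(x0) - k * x0
        table.append((k, b))
    return table


def pwl_eval(xs: List[float], table: List[Segment], lo: float, hi: float) -> List[float]:
    """Run-time step (hypothetical): one segment lookup plus one multiply-add, y = k*x + b."""
    width = (hi - lo) / len(table)
    out = []
    for x in xs:
        # Inputs outside [lo, hi] reuse the nearest edge segment (an assumption of this sketch).
        i = min(max(int((x - lo) // width), 0), len(table) - 1)
        k, b = table[i]
        out.append(k * x + b)
    return out


table = build_pwl_table(gelu, -4.0, 4.0, 32)
xs = [-3.0, -0.5, 0.0, 1.2, 3.9]
for x, y in zip(xs, pwl_eval(xs, table, -4.0, 4.0)):
    print(f"x={x:+.2f}  pwl={y:+.5f}  exact={gelu(x):+.5f}")
```

With 32 segments of width 0.25, linear interpolation keeps the error on [-4, 4] below roughly 0.007 for GELU; more segments trade table size for accuracy.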

Summary (original)

The inherent diversity of computation types within individual deep neural network (DNN) models necessitates a corresponding variety of computation units within hardware processors, leading to a significant constraint on computation efficiency during neural network execution. In this study, we introduce NeuralMatrix, a framework that transforms the computation of entire DNNs into linear matrix operations, effectively enabling their execution with one general-purpose matrix multiplication (GEMM) accelerator. By surmounting the constraints posed by the diverse computation types required by individual network models, this approach provides both generality, allowing a wide range of DNN models to be executed using a single GEMM accelerator, and application-specific acceleration levels without extra special function units, which are validated through mainstream DNNs and their variant models.
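To illustrate how a layer that does not look like a matrix multiplication can still run on one GEMM kernel, the sketch below lowers a 2D convolution (cross-correlation, as DNN frameworks define it) to a single GEMM using the well-known im2col transformation. This is a standard lowering chosen for illustration; the abstract does not state that NeuralMatrix uses it. The helper names `gemm`, `im2col`, and `conv2d_as_gemm`, the nested-list [C][H][W] layout, stride 1, and no padding are all assumptions, and `gemm` is a plain-Python stand-in for an accelerator kernel.

```python
from typing import List

Matrix = List[List[float]]


def gemm(a: Matrix, b: Matrix) -> Matrix:
    """Plain C = A @ B; a stand-in for the single GEMM accelerator kernel."""
    k = len(b)
    assert all(len(row) == k for row in a), "inner dimensions must match"
    m = len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(m)] for row in a]


def im2col(x, kh: int, kw: int) -> Matrix:
    """x is [C][H][W]; returns a (C*kh*kw) x (OH*OW) matrix (stride 1, no padding)."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    oh, ow = h - kh + 1, w - kw + 1
    cols = []
    for ci in range(c):
        for di in range(kh):
            for dj in range(kw):
                # One row per (channel, kernel row, kernel col); one column per output pixel.
                cols.append([x[ci][i + di][j + dj] for i in range(oh) for j in range(ow)])
    return cols


def conv2d_as_gemm(x, weight):
    """weight is [OC][C][kh][kw]; returns [OC][OH][OW] using exactly one GEMM call."""
    c, kh, kw = len(weight[0]), len(weight[0][0]), len(weight[0][0][0])
    # Flatten each filter in the same (channel, kernel row, kernel col) order as im2col rows.
    w_mat = [[f[ci][di][dj] for ci in range(c) for di in range(kh) for dj in range(kw)]
             for f in weight]
    out = gemm(w_mat, im2col(x, kh, kw))  # shape: OC x (OH*OW)
    oh, ow = len(x[0]) - kh + 1, len(x[0][0]) - kw + 1
    # Reshape each output row back into an OH x OW feature map.
    return [[row[i * ow:(i + 1) * ow] for i in range(oh)] for row in out]


x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]  # one 3x3 input channel
w = [[[[1, 0], [0, -1]]]]                 # one 2x2 filter
print(conv2d_as_gemm(x, w))  # → [[[-4, -4], [-4, -4]]]
```

Each output channel becomes one row of the GEMM result, so adding output channels only grows one dimension of the same matrix multiplication rather than requiring a different compute unit.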

arXiv information

Authors	Ruiqi Sun, Jie Zhao, Xin He, Yiran Li, An Zou
Published	2023-10-06 13:28:30+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
