Dynamically Reconfigurable Variable-precision Sparse-Dense Matrix Acceleration in Tensorflow Lite

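Summary

Title: Dynamically Reconfigurable Variable-precision Sparse-Dense Matrix Acceleration in Tensorflow Lite

Summary:
– Proposes FADES (Fused Architecture for DEnse and Sparse matrices), a dynamically reconfigurable hardware accelerator.
– The FADES design offers multiple configuration options that trade off parallelism against complexity, and uses a dataflow model to create four stages: read, compute, scale, and write results.
– FADES is mapped to the programmable logic (PL) of a heterogeneous SoC device and integrated with the TensorFlow Lite inference engine, which runs on the device's processing system (PS).
– The accelerator computes the tensor operations, and dynamic reconfiguration can be used to switch its precision between int8 and float modes.
– This dynamic reconfiguration allows more cores to be mapped to the resource-constrained device, giving better performance and lower power consumption than supporting both arithmetic precisions simultaneously.
– Compared with a high-performance systolic architecture for dense matrices, the proposed hardware delivers 25% better performance in dense mode while using half the DSP blocks in the same technology.
– In sparse mode, the core can outperform dense mode even at low sparsity levels, and a single core achieves up to 20x acceleration over the software-optimized NEON RUY library.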
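
The four stages listed above (read, compute, scale, write) can be pictured with a small software model. The following is a minimal sketch in plain Python, not the FADES hardware or its interface: the function names, the generator-based chaining, and the requantization formula (`round(acc * scale) + zero_point`, saturated to the int8 range) are all assumptions chosen for illustration.

```python
# Illustrative software model only; every name here is hypothetical
# and none of it is taken from the FADES design itself.

def read_stage(lhs, rhs):
    """Stage 1: fetch operands as (row index, column index, row, column)."""
    cols = list(zip(*rhs))  # transpose rhs so columns can be streamed
    for i, row in enumerate(lhs):
        for j, col in enumerate(cols):
            yield i, j, row, col

def compute_stage(stream):
    """Stage 2: multiply-accumulate into a wide integer accumulator."""
    for i, j, row, col in stream:
        yield i, j, sum(a * b for a, b in zip(row, col))

def scale_stage(stream, scale, zero_point=0):
    """Stage 3: rescale the accumulator and saturate to the int8 range."""
    for i, j, acc in stream:
        q = round(acc * scale) + zero_point
        yield i, j, max(-128, min(127, q))

def write_stage(stream, n_rows, n_cols):
    """Stage 4: store each result in the output matrix."""
    out = [[0] * n_cols for _ in range(n_rows)]
    for i, j, value in stream:
        out[i][j] = value
    return out

def int8_matmul(lhs, rhs, scale):
    """Chain the four stages so each result flows through all of them."""
    stream = scale_stage(compute_stage(read_stage(lhs, rhs)), scale)
    return write_stage(stream, len(lhs), len(rhs[0]))
```

Calling `int8_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], 0.1)` pushes each output element through all four stages in turn. In a float mode the scale stage would presumably reduce to a pass-through, which is one way to see why the two precisions need different hardware resources.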

Summary (original)

In this paper, we present a dynamically reconfigurable hardware accelerator called FADES (Fused Architecture for DEnse and Sparse matrices). The FADES design offers multiple configuration options that trade off parallelism and complexity using a dataflow model to create four stages that read, compute, scale and write results. FADES is mapped to the programmable logic (PL) and integrated with the TensorFlow Lite inference engine running on the processing system (PS) of a heterogeneous SoC device. The accelerator is used to compute the tensor operations, while the dynamically reconfigurable approach can be used to switch precision between int8 and float modes. This dynamic reconfiguration enables better performance by allowing more cores to be mapped to the resource-constrained device and lower power consumption compared with supporting both arithmetic precisions simultaneously. We compare the proposed hardware with a high-performance systolic architecture for dense matrices obtaining 25% better performance in dense mode with half the DSP blocks in the same technology. In sparse mode, we show that the core can outperform dense mode even at low sparsity levels, and a single-core achieves up to 20x acceleration over the software-optimized NEON RUY library.
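
To make the sparse-mode claim concrete, the sketch below counts multiply-accumulate operations for a sparse-dense product. It assumes a compressed sparse row (CSR) layout purely for illustration; the abstract does not say which sparse format FADES uses, and the function names `to_csr` and `csr_matmul` are hypothetical.

```python
# Illustrative only: CSR is an assumed format, not one confirmed by the source.

def to_csr(dense):
    """Convert a dense matrix (list of rows) to (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_matmul(csr, rhs):
    """Multiply a CSR matrix by a dense matrix, counting the MACs performed."""
    values, col_idx, row_ptr = csr
    n_cols = len(rhs[0])
    out, macs = [], 0
    for r in range(len(row_ptr) - 1):
        acc = [0] * n_cols
        # Only the stored non-zeros of this row contribute any work.
        for k in range(row_ptr[r], row_ptr[r + 1]):
            v, c = values[k], col_idx[k]
            for j in range(n_cols):
                acc[j] += v * rhs[c][j]
            macs += n_cols
        out.append(acc)
    return out, macs
```

For a 2x3 weight matrix with two non-zeros multiplied by a 3x2 activation matrix, this loop performs 4 multiply-accumulates, where a dense product would perform 12. The saving grows with sparsity, but index handling adds overhead, so the break-even point depends on the implementation.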

arXiv information

Authors	Jose Nunez-Yanez, Andres Otero, Eduardo de la Torre
Published	2023-04-17 12:31:50+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, OpenAI
