Deeploy: Enabling Energy-Efficient Deployment of Small Language Models On Heterogeneous Microcontrollers

Summary

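With the rise of Embodied Foundation Models (EFMs), most notably Small Language Models (SLMs), adapting Transformers to edge applications has become a very active research field.
However, end-to-end deployment of SLMs on microcontroller (MCU)-class chips without high-bandwidth off-chip main memory access remains an open challenge.
This paper demonstrates high-efficiency end-to-end SLM deployment on a multicore RISC-V (RV32) MCU augmented with ML instruction extensions and a hardware neural processing unit (NPU).
To automate exploration of the constrained, multi-dimensional memory-versus-computation tradeoffs involved in aggressive SLM deployment on heterogeneous (multicore + NPU) resources, the authors introduce Deeploy, a novel deep neural network (DNN) compiler that generates highly optimized C code requiring minimal runtime support.
They show that Deeploy generates end-to-end code for executing SLMs that fully exploits the RV32 cores' instruction extensions and the NPU.
For an SLM trained on the TinyStories dataset, they report leading-edge energy and throughput of 490 µJ/token at 340 tokens/s, running for the first time on an MCU-class device without external memory.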
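
The two headline figures can be combined into an implied average power and per-token latency. The sketch below is only a back-of-envelope check, not a result from the paper: it assumes that the energy and throughput numbers describe the same operating point, and the function and constant names are hypothetical.

```python
# Back-of-envelope check using only the two figures quoted in the abstract.
# Assumption: both figures were measured at the same operating point.
ENERGY_PER_TOKEN_J = 490e-6    # 490 microjoules per token
THROUGHPUT_TOK_PER_S = 340.0   # 340 tokens per second


def average_power_w(energy_per_token_j: float, tokens_per_s: float) -> float:
    """Implied average power in watts: (J/token) * (token/s) = J/s."""
    return energy_per_token_j * tokens_per_s


def latency_per_token_s(tokens_per_s: float) -> float:
    """Implied time per generated token in seconds."""
    return 1.0 / tokens_per_s


power_w = average_power_w(ENERGY_PER_TOKEN_J, THROUGHPUT_TOK_PER_S)
latency_s = latency_per_token_s(THROUGHPUT_TOK_PER_S)

print(f"implied average power: {power_w * 1e3:.1f} mW")
print(f"implied latency per token: {latency_s * 1e3:.2f} ms")
```

Under that assumption the product comes to roughly 0.17 W, at just under 3 ms per token.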

Abstract (original)

With the rise of Embodied Foundation Models (EFMs), most notably Small Language Models (SLMs), adapting Transformers for edge applications has become a very active field of research. However, achieving end-to-end deployment of SLMs on microcontroller (MCU)-class chips without high-bandwidth off-chip main memory access is still an open challenge. In this paper, we demonstrate high-efficiency end-to-end SLM deployment on a multicore RISC-V (RV32) MCU augmented with ML instruction extensions and a hardware neural processing unit (NPU). To automate the exploration of the constrained, multi-dimensional memory vs. computation tradeoffs involved in aggressive SLM deployment on heterogeneous (multicore+NPU) resources, we introduce Deeploy, a novel Deep Neural Network (DNN) compiler, which generates highly-optimized C code requiring minimal runtime support. We demonstrate that Deeploy generates end-to-end code for executing SLMs, fully exploiting the RV32 cores’ instruction extensions and the NPU: We achieve leading-edge energy and throughput of \SI{490}{\micro\joule \per Token}, at \SI{340}{Token \per \second} for an SLM trained on the TinyStories dataset, running for the first time on an MCU-class device without external memory.

arXiv information

Authors	Moritz Scherer, Luka Macan, Victor Jung, Philip Wiese, Luca Bompani, Alessio Burrello, Francesco Conti, Luca Benini
Published	2024-08-08 12:40:27+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
