Toward Attention-based TinyML: A Heterogeneous Accelerated Architecture and Automated Deployment Flow

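Summary

One of the challenges for Tiny Machine Learning (tinyML) is keeping pace with the evolution of machine learning models from convolutional neural networks to Transformers.
We address this with a heterogeneous architecture template that couples RISC-V processors with hardwired accelerators, supported by an automated deployment flow.
We demonstrate an attention-based model within a tinyML power envelope, using an octa-core cluster coupled with an accelerator for quantized attention.
Our deployment flow enables an end-to-end 8-bit MobileBERT, achieving leading-edge energy efficiency of 2960 GOp/J and throughput of 154 GOp/s at 32.5 Inf/s while consuming 52.0 mW (0.65 V, 22 nm FD-SOI technology).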

Summary (original)

One of the challenges for Tiny Machine Learning (tinyML) is keeping up with the evolution of Machine Learning models from Convolutional Neural Networks to Transformers. We address this by leveraging a heterogeneous architectural template coupling RISC-V processors with hardwired accelerators supported by an automated deployment flow. We demonstrate an Attention-based model in a tinyML power envelope with an octa-core cluster coupled with an accelerator for quantized Attention. Our deployment flow enables an end-to-end 8-bit MobileBERT, achieving leading-edge energy efficiency and throughput of 2960 GOp/J and 154 GOp/s at 32.5 Inf/s consuming 52.0 mW (0.65 V, 22 nm FD-SOI technology).
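
The headline figures in the abstract can be cross-checked against each other with a few lines of arithmetic. The sketch below is not from the paper. It assumes that the quoted throughput, power, and inference rate all refer to the same operating point (0.65 V), and the function and key names are hypothetical, chosen only for illustration.

```python
# Figures quoted in the abstract (assumed to share one operating point).
THROUGHPUT_GOPS = 154.0    # GOp/s
POWER_W = 52.0e-3          # 52.0 mW expressed in watts
INFERENCES_PER_S = 32.5    # Inf/s


def derived_metrics(throughput_gops, power_w, inferences_per_s):
    """Derive efficiency and per-inference costs from the headline numbers.

    The names used here are illustrative, not taken from the paper.
    """
    return {
        # Energy efficiency: operations per second divided by joules per second.
        "efficiency_gop_per_j": throughput_gops / power_w,
        # Energy per inference in millijoules: power divided by inference rate.
        "energy_per_inference_mj": power_w / inferences_per_s * 1e3,
        # Work per inference in GOp: throughput divided by inference rate.
        "gop_per_inference": throughput_gops / inferences_per_s,
    }


metrics = derived_metrics(THROUGHPUT_GOPS, POWER_W, INFERENCES_PER_S)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Under that assumption, 154 GOp/s divided by 52.0 mW gives roughly 2960 GOp/J, matching the efficiency stated in the abstract, and each inference costs about 1.6 mJ for roughly 4.7 GOp of work.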

arXiv information

Authors	Philip Wiese, Gamze İslamoğlu, Moritz Scherer, Luka Macan, Victor J. B. Jung, Alessio Burrello, Francesco Conti, Luca Benini
Published	2024-08-05 13:57:32+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
