Faster Attention Is What You Need: A Fast Self-Attention Neural Network Backbone Architecture for the Edge via Double-Condensing Attention Condensers

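Summary

As deep learning is increasingly adopted for on-device TinyML applications, demand for efficient neural network backbones optimized for the edge continues to grow. Recently, the introduction of attention condenser networks has made possible low-footprint, highly efficient self-attention neural networks that strike a strong balance between accuracy and speed. In this study, we introduce a faster attention condenser design, called double-condensing attention condensers, that enables highly condensed feature embeddings. We further employ a machine-driven design exploration strategy that imposes design constraints based on best practices for greater efficiency and robustness, and use it to produce the macro- and micro-architecture constructs of the backbone. The resulting backbone, which we name AttendNeXt, achieves significantly higher inference throughput on an embedded ARM processor than several other state-of-the-art efficient backbones (more than 10x faster than FB-Net C while also being more accurate, and more than 10x faster than MobileOne-S1 while being smaller). It also has a small model size (more than 1.37x smaller than MobileNetv3-L while being more accurate and faster) and strong accuracy (1.1% higher top-1 accuracy than MobileViT XS on ImageNet while being faster). These promising results demonstrate that exploring different efficient architecture designs and self-attention mechanisms can yield interesting new building blocks for TinyML applications.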

Summary (original)

With the growing adoption of deep learning for on-device TinyML applications, there has been an ever-increasing demand for efficient neural network backbones optimized for the edge. Recently, the introduction of attention condenser networks has resulted in low-footprint, highly-efficient, self-attention neural networks that strike a strong balance between accuracy and speed. In this study, we introduce a faster attention condenser design called double-condensing attention condensers that allow for highly condensed feature embeddings. We further employ a machine-driven design exploration strategy that imposes design constraints based on best practices for greater efficiency and robustness to produce the macro-micro architecture constructs of the backbone. The resulting backbone (which we name AttendNeXt) achieves significantly higher inference throughput on an embedded ARM processor when compared to several other state-of-the-art efficient backbones (>10x faster than FB-Net C at higher accuracy and speed and >10x faster than MobileOne-S1 at smaller size) while having a small model size (>1.37x smaller than MobileNetv3-L at higher accuracy and speed) and strong accuracy (1.1% higher top-1 accuracy than MobileViT XS on ImageNet at higher speed). These promising results demonstrate that exploring different efficient architecture designs and self-attention mechanisms can lead to interesting new building blocks for TinyML applications.

arXiv information

Authors	Alexander Wong,Mohammad Javad Shafiee,Saad Abbasi,Saeejith Nair,Mahmoud Famouri
Published	2023-02-03 17:18:22+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, DeepL
