Atleus: Accelerating Transformers on the Edge Enabled by 3D Heterogeneous Manycore Architectures

Abstract

Transformer architectures have become the standard neural network model for various machine learning applications, including natural language processing and computer vision. However, the compute and memory requirements of transformer models make them challenging to adopt for edge applications. Furthermore, fine-tuning pre-trained transformers (e.g., foundation models) is a common way to enhance a model's predictive performance on specific tasks and applications. Existing transformer accelerators are oblivious to the complexities introduced by fine-tuning. In this paper, we propose the design of a three-dimensional (3D) heterogeneous architecture, referred to as Atleus, that incorporates heterogeneous computing resources specifically optimized to accelerate transformer models for the dual purposes of fine-tuning and inference. Specifically, Atleus utilizes non-volatile memory and systolic arrays to accelerate transformer computational kernels on an integrated 3D platform. Moreover, we design a suitable network-on-chip (NoC) to achieve high performance and energy efficiency. Finally, Atleus adopts an effective quantization scheme to support model compression. Experimental results demonstrate that Atleus outperforms the existing state of the art by up to 56x and 64.5x in terms of performance and energy efficiency, respectively.

arXiv information

Authors	Pratyush Dhingra, Janardhan Rao Doppa, Partha Pratim Pande
Published	2025-01-16 15:11:33+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
