CHOSEN: Compilation to Hardware Optimization Stack for Efficient Vision Transformer Inference

Summary

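Vision Transformers (ViTs) mark a major shift in machine learning approaches to computer vision.
Unlike earlier approaches, ViTs analyze image patches with the self-attention mechanism that is widely used in natural language processing.
Despite their advantages in modeling visual tasks, deploying ViTs on hardware platforms, particularly Field-Programmable Gate Arrays (FPGAs), poses considerable challenges.
These challenges stem mainly from the non-linear computations of ViTs and from their high computational and memory demands.
This paper introduces CHOSEN, a software-hardware co-design framework that addresses these challenges and automates ViT deployment on FPGAs to maximize performance.
The framework is built on three contributions: (1) a multi-kernel design that maximizes bandwidth, mainly by exploiting multiple DDR memory banks; (2) approximate non-linear functions that incur minimal accuracy degradation and use the FPGA's available logic blocks efficiently; and
(3) an efficient compiler that maximizes the performance and memory efficiency of the computing kernels, using a novel design space exploration algorithm to find the hardware configuration with the best throughput and latency.
Compared with state-of-the-art ViT accelerators, CHOSEN improves throughput by 1.5x and 1.42x on the DeiT-S and DeiT-B models, respectively.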

Abstract (original)

Vision Transformers (ViTs) represent a groundbreaking shift in machine learning approaches to computer vision. Unlike traditional approaches, ViTs employ the self-attention mechanism, which has been widely used in natural language processing, to analyze image patches. Despite their advantages in modeling visual tasks, deploying ViTs on hardware platforms, notably Field-Programmable Gate Arrays (FPGAs), introduces considerable challenges. These challenges stem primarily from the non-linear calculations and high computational and memory demands of ViTs. This paper introduces CHOSEN, a software-hardware co-design framework to address these challenges and offer an automated framework for ViT deployment on the FPGAs in order to maximize performance. Our framework is built upon three fundamental contributions: multi-kernel design to maximize the bandwidth, mainly targeting benefits of multi DDR memory banks, approximate non-linear functions that exhibit minimal accuracy degradation, and efficient use of available logic blocks on the FPGA, and efficient compiler to maximize the performance and memory-efficiency of the computing kernels by presenting a novel algorithm for design space exploration to find optimal hardware configuration that achieves optimal throughput and latency. Compared to the state-of-the-art ViT accelerators, CHOSEN achieves a 1.5x and 1.42x improvement in the throughput on the DeiT-S and DeiT-B models.
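
The abstract does not say which non-linear functions CHOSEN approximates or how. As a rough illustration of the general idea only, the sketch below compares the exact GELU activation (used in the MLP blocks of DeiT-style models) with the widely known tanh-based approximation and measures the worst-case gap on a grid. The function names, the input range, and the choice of GELU are assumptions made for this example, not details taken from the paper.

```python
import math


def gelu_exact(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Well-known tanh-based GELU approximation (NOT necessarily CHOSEN's)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def max_abs_error(lo: float = -6.0, hi: float = 6.0, steps: int = 1201) -> float:
    """Largest |exact - approx| over an evenly spaced grid on [lo, hi]."""
    xs = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(abs(gelu_exact(x) - gelu_tanh(x)) for x in xs)


if __name__ == "__main__":
    print(f"max |gelu_exact - gelu_tanh| on [-6, 6]: {max_abs_error():.2e}")
```

A hardware design would more plausibly trade this kind of closed form for something cheaper in logic, such as a piecewise-linear or lookup-table variant, and then check the end-to-end accuracy loss; the measurement pattern stays the same.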

arXiv information

Authors	Mohammad Erfan Sadeghi, Arash Fayyazi, Suhas Somashekar, Massoud Pedram
Published	2024-07-17 16:56:06+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
