HG-PIPE: Vision Transformer Acceleration with Hybrid-Grained Pipeline

Summary

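Accelerating Vision Transformers (ViTs) on field-programmable gate arrays (FPGAs) is promising but challenging.
Existing FPGA-based ViT accelerators rely mainly on temporal architectures, which reuse the same hardware blocks to process different operators and therefore incur heavy memory access overhead.
Pipelined architectures, whether coarse-grained or fine-grained, unroll the ViT computation spatially to improve memory access efficiency.
However, they usually suffer from severe hardware resource constraints and from pipeline bubbles caused by the global computation dependency of ViT.
This paper introduces HG-PIPE, a pipelined FPGA accelerator for high-throughput, low-latency ViT processing.
HG-PIPE features a hybrid-grained pipeline architecture that reduces on-chip buffer cost, and it couples the computation dataflow with the parallelism design to eliminate pipeline bubbles.
HG-PIPE further introduces careful approximations so that both linear and non-linear operators can be implemented with the FPGA's abundant lookup tables (LUTs), which alleviates resource constraints.
On a ZCU102 FPGA, HG-PIPE achieves 2.78 times better throughput and 2.52 times better resource efficiency than prior-art accelerators such as AutoViTAcc.
On a VCK190 FPGA, HG-PIPE realizes end-to-end ViT acceleration on a single device and reaches 7118 images/s, which is 2.81 times faster than a V100 GPU.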

Summary (original)

Vision Transformer (ViT) acceleration with field programmable gate array (FPGA) is promising but challenging. Existing FPGA-based ViT accelerators mainly rely on temporal architectures, which process different operators by reusing the same hardware blocks and suffer from extensive memory access overhead. Pipelined architectures, either coarse-grained or fine-grained, unroll the ViT computation spatially for memory access efficiency. However, they usually suffer from significant hardware resource constraints and pipeline bubbles induced by the global computation dependency of ViT. In this paper, we introduce HG-PIPE, a pipelined FPGA accelerator for high-throughput and low-latency ViT processing. HG-PIPE features a hybrid-grained pipeline architecture to reduce on-chip buffer cost and couples the computation dataflow and parallelism design to eliminate the pipeline bubbles. HG-PIPE further introduces careful approximations to implement both linear and non-linear operators with abundant Lookup Tables (LUTs), thus alleviating resource constraints. On a ZCU102 FPGA, HG-PIPE achieves 2.78 times better throughput and 2.52 times better resource efficiency than the prior-art accelerators, e.g., AutoViTAcc. With a VCK190 FPGA, HG-PIPE realizes end-to-end ViT acceleration on a single device and achieves 7118 images/s, which is 2.81 times faster than a V100 GPU.
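The reported VCK190 figures can be turned into a couple of derived numbers as a quick sanity check. The sketch below is only arithmetic on the two values quoted in the abstract (7118 images/s and the 2.81 times speedup over a V100 GPU); the function and constant names are hypothetical, and the derived values are implied estimates rather than measurements stated in the paper.

```python
# Figures quoted in the abstract (VCK190 result).
HG_PIPE_IMAGES_PER_S = 7118.0
SPEEDUP_OVER_V100 = 2.81


def implied_baseline_throughput(throughput: float, speedup: float) -> float:
    """Baseline throughput implied by a reported throughput and speedup factor."""
    return throughput / speedup


def mean_interval_us(throughput: float) -> float:
    """Average time between completed images, in microseconds.

    This is the inverse of throughput, not the end-to-end latency of one image:
    a pipeline can have many images in flight at once.
    """
    return 1e6 / throughput


# Assumption: the speedup is a plain ratio of images/s on the same workload.
v100_images_per_s = implied_baseline_throughput(HG_PIPE_IMAGES_PER_S, SPEEDUP_OVER_V100)
interval_us = mean_interval_us(HG_PIPE_IMAGES_PER_S)

print(f"Implied V100 throughput: {v100_images_per_s:.0f} images/s")
print(f"Mean interval between images on HG-PIPE: {interval_us:.1f} us")
```

Under that assumption, the implied V100 throughput comes out at roughly 2.5 thousand images/s, and HG-PIPE completes one image about every 140 microseconds on average.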

arXiv information

Authors	Qingyu Guo, Jiayong Wan, Songqiang Xu, Meng Li, Yuan Wang
Published	2024-08-01 08:18:57+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
