Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers

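Summary

Vision transformers (ViTs) have been shown to achieve higher accuracy than convolutional neural networks (CNNs) on computer vision tasks.
However, ViT models are often too computation-intensive to deploy efficiently on resource-limited edge devices.
This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs that designs efficient ViT models for hardware implementation while preserving accuracy.
First, Quasar-ViT trains a supernet using a row-wise flexible mixed-precision quantization scheme, mixed-precision weight entanglement, and supernet layer scaling techniques.
Next, it applies an efficient hardware-oriented search algorithm, integrated with hardware latency and resource modeling, to determine a series of optimal subnets from the supernet under different inference latency targets.
Finally, we propose a series of model-adaptive designs on the FPGA platform to support the architecture search and to narrow the gap between the theoretical reduction in computation and the speedup actually achieved at inference time.
Our searched models achieve inference speeds of 101.5, 159.6, and 251.6 frames per second (FPS) on the AMD/Xilinx ZCU102 FPGA, with 80.4%, 78.6%, and 74.9% top-1 accuracy on the ImageNet dataset, respectively, consistently outperforming prior works.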
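The abstract names a row-wise flexible mixed-precision quantization scheme without spelling out how it works. As a rough illustration of the general idea only, and not the authors' scheme, the sketch below assumes symmetric uniform quantization in which each row of a weight matrix gets its own bit-width and its own scale. The function names and the per-row bit assignment are hypothetical.

```python
from typing import List, Tuple


def quantize_row(row: List[float], bits: int) -> Tuple[List[int], float]:
    """Symmetric uniform quantization of one weight row to signed `bits`-bit integers.

    Assumption: a single scale per row, derived from the row's max magnitude.
    """
    if bits < 2:
        raise ValueError("symmetric signed quantization needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(w) for w in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # per-row scale
    # Round to the nearest integer level and clip to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in row]
    return q, scale


def quantize_rowwise_mixed(weights: List[List[float]],
                           row_bits: List[int]) -> List[Tuple[List[int], float]]:
    """Quantize each row with its own bit-width (hypothetical illustration)."""
    if len(weights) != len(row_bits):
        raise ValueError("need exactly one bit-width per row")
    return [quantize_row(row, b) for row, b in zip(weights, row_bits)]


def dequantize(quantized: List[Tuple[List[int], float]]) -> List[List[float]]:
    """Map integer levels back to approximate real-valued weights."""
    return [[v * scale for v in q] for q, scale in quantized]


if __name__ == "__main__":
    W = [[0.5, -1.0, 0.25], [0.1, 0.2, -0.4]]
    qs = quantize_rowwise_mixed(W, [8, 4])  # row 0 in 8 bits, row 1 in 4 bits
    print(qs)
    print(dequantize(qs))
```

Giving each row its own scale keeps a few large-magnitude rows from dictating the resolution of the whole matrix, and the per-row bit-width is the kind of knob a mixed-precision search could tune per row.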

Summary (original)

Vision transformers (ViTs) have demonstrated their superior accuracy for computer vision tasks compared to convolutional neural networks (CNNs). However, ViT models are often computation-intensive for efficient deployment on resource-limited edge devices. This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs, to design efficient ViT models for hardware implementation while preserving the accuracy. First, Quasar-ViT trains a supernet using our row-wise flexible mixed-precision quantization scheme, mixed-precision weight entanglement, and supernet layer scaling techniques. Then, it applies an efficient hardware-oriented search algorithm, integrated with hardware latency and resource modeling, to determine a series of optimal subnets from supernet under different inference latency targets. Finally, we propose a series of model-adaptive designs on the FPGA platform to support the architecture search and mitigate the gap between the theoretical computation reduction and the practical inference speedup. Our searched models achieve 101.5, 159.6, and 251.6 frames-per-second (FPS) inference speed on the AMD/Xilinx ZCU102 FPGA with 80.4%, 78.6%, and 74.9% top-1 accuracy, respectively, for the ImageNet dataset, consistently outperforming prior works.
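The abstract says the search determines one subnet per inference latency target using hardware latency modeling. The sketch below only illustrates that selection step under simple assumptions: latency is predicted additively from a hypothetical per-layer lookup table, and an exhaustive filter over a handful of candidates stands in for the paper's actual search algorithm, which the abstract does not describe. `Subnet`, the layer keys, and all latency and accuracy numbers are made up; only the three FPS targets come from the abstract.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Subnet:
    name: str
    layers: Tuple[str, ...]  # hypothetical per-layer config keys
    accuracy: float          # toy score, not a measured result


def fps_to_latency_ms(fps: float) -> float:
    """Convert a throughput target in frames/second to a per-frame budget in ms."""
    return 1000.0 / fps


def predicted_latency_ms(subnet: Subnet, lut: Dict[str, float]) -> float:
    """Assumed additive latency model: sum the looked-up cost of each layer."""
    return sum(lut[layer] for layer in subnet.layers)


def best_under_target(candidates: List[Subnet], lut: Dict[str, float],
                      target_ms: float) -> Optional[Subnet]:
    """Most accurate candidate whose predicted latency fits the budget, else None."""
    feasible = [s for s in candidates if predicted_latency_ms(s, lut) <= target_ms]
    return max(feasible, key=lambda s: s.accuracy, default=None)


if __name__ == "__main__":
    lut = {"blk_8bit": 3.0, "blk_6bit": 2.0, "blk_4bit": 1.2}  # toy ms per layer
    candidates = [
        Subnet("big", ("blk_8bit",) * 3, 0.9),
        Subnet("mid", ("blk_6bit",) * 3, 0.8),
        Subnet("small", ("blk_4bit",) * 3, 0.7),
    ]
    for fps in (101.5, 159.6, 251.6):  # FPS targets quoted in the abstract
        choice = best_under_target(candidates, lut, fps_to_latency_ms(fps))
        print(fps, choice.name if choice else None)
```

A throughput target maps to a per-frame budget of 1000/FPS ms, so 101.5 FPS corresponds to roughly 9.85 ms per frame when frames are processed one at a time. The toy table is chosen only so that each budget selects a different candidate.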

arXiv information

Authors	Zhengang Li, Alec Lu, Yanyue Xie, Zhenglun Kong, Mengshu Sun, Hao Tang, Zhong Jia Xue, Peiyan Dong, Caiwen Ding, Yanzhi Wang, Xue Lin, Zhenman Fang
Published	2024-07-25 16:35:46+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
