Vision Transformers on the Edge: A Comprehensive Survey of Model Compression and Acceleration Strategies

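Abstract

In recent years, vision transformers (ViTs) have emerged as a powerful and promising technique for computer vision tasks such as image classification, object detection, and segmentation.
Unlike convolutional neural networks (CNNs), which rely on hierarchical feature extraction, ViTs treat an image as a sequence of patches and apply self-attention mechanisms.
However, their high computational complexity and memory requirements make deployment on resource-constrained edge devices a significant challenge.
To address these limitations, extensive research has focused on model compression techniques and hardware-aware acceleration strategies.
Nevertheless, a comprehensive review that systematically categorizes these techniques and their trade-offs in accuracy, efficiency, and hardware adaptability for edge deployment is still lacking.
This survey bridges that gap by providing a structured analysis of model compression techniques, software tools for edge inference, and hardware acceleration strategies for ViTs.
It discusses their impact on accuracy, efficiency, and hardware adaptability, and highlights key challenges and emerging research directions for advancing ViT deployment on edge platforms, including graphics processing units (GPUs), application-specific integrated circuits (ASICs), and field-programmable gate arrays (FPGAs).
The goal is to encourage further research by offering a contemporary guide to optimizing ViTs for efficient deployment on edge devices.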

Abstract (original)

In recent years, vision transformers (ViTs) have emerged as powerful and promising techniques for computer vision tasks such as image classification, object detection, and segmentation. Unlike convolutional neural networks (CNNs), which rely on hierarchical feature extraction, ViTs treat images as sequences of patches and leverage self-attention mechanisms. However, their high computational complexity and memory demands pose significant challenges for deployment on resource-constrained edge devices. To address these limitations, extensive research has focused on model compression techniques and hardware-aware acceleration strategies. Nonetheless, a comprehensive review that systematically categorizes these techniques and their trade-offs in accuracy, efficiency, and hardware adaptability for edge deployment remains lacking. This survey bridges this gap by providing a structured analysis of model compression techniques, software tools for inference on edge, and hardware acceleration strategies for ViTs. We discuss their impact on accuracy, efficiency, and hardware adaptability, highlighting key challenges and emerging research directions to advance ViT deployment on edge platforms, including graphics processing units (GPUs), application-specific integrated circuits (ASICs), and field-programmable gate arrays (FPGAs). The goal is to inspire further research with a contemporary guide on optimizing ViTs for efficient deployment on edge devices.
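
As a rough illustration of why treating an image as a sequence of patches becomes costly, the sketch below counts patches and the entries in a single attention head's score matrix. The 224 and 448 pixel image sizes, the 16 pixel patch size, and both function names are assumptions chosen for illustration and do not come from the survey; the class token and any model-specific details are ignored.

```python
def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


def attention_scores_per_head(seq_len: int) -> int:
    """Entries in one head's seq_len x seq_len attention score matrix."""
    return seq_len * seq_len


if __name__ == "__main__":
    # Hypothetical input sizes; the patch size of 16 is also an assumption.
    for size in (224, 448):
        n = num_patches(size, 16)
        print(size, n, attention_scores_per_head(n))
```

Under these assumptions, doubling the image side length quadruples the number of patches and multiplies the score matrix size by sixteen, which is the quadratic growth behind the memory and compute pressure on edge devices.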

arXiv information

Authors	Shaibal Saha, Lanyu Xu
Published	2025-04-30 13:55:51+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
