Accumulator-Aware Post-Training Quantization

Abstract

Several recent studies have investigated low-precision accumulation, reporting improvements in throughput, power, and area across various platforms. However, the accompanying proposals have only considered the quantization-aware training (QAT) paradigm, in which models are fine-tuned or trained from scratch with quantization in the loop. As models continue to grow in size, QAT techniques become increasingly expensive, which has motivated the recent surge in post-training quantization (PTQ) research. To the best of our knowledge, ours is the first formal study of accumulator-aware quantization in the PTQ setting. To bridge this gap, we introduce AXE, a practical framework of accumulator-aware extensions designed to endow existing layer-wise PTQ algorithms with overflow avoidance guarantees. We theoretically motivate AXE and demonstrate its flexibility by implementing it on top of two state-of-the-art PTQ algorithms: GPFQ and OPTQ. We further generalize AXE to support multi-stage accumulation for the first time, opening the door to full datapath optimization and scaling to large language models (LLMs). We evaluate AXE across image classification and language generation models, and observe significant improvements in the trade-off between accumulator bit width and model accuracy over baseline methods.
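
To make the notion of an overflow avoidance guarantee concrete, here is a minimal sketch of a standard worst-case check. It is not the AXE algorithm itself, which the abstract does not spell out. It assumes unsigned N-bit integer inputs, integer weights, and a signed P-bit accumulator; the function names are hypothetical. Under those assumptions, every partial sum of a dot product is bounded in magnitude by the l1 norm of the weights times the largest input value, so keeping that product within the accumulator range is a sufficient (conservative) condition for avoiding overflow.

```python
import numpy as np


def worst_case_magnitude(q, input_bits):
    """Upper bound on |sum_i q_i * x_i| for unsigned inputs x_i in [0, 2^N - 1].

    The bound ||q||_1 * (2^N - 1) also covers every intermediate partial sum.
    """
    q = np.asarray(q, dtype=np.int64)
    x_max = 2 ** input_bits - 1
    return int(np.abs(q).sum()) * x_max


def is_overflow_safe(q, acc_bits, input_bits):
    """True if a signed acc_bits-wide accumulator cannot overflow (sufficient test)."""
    acc_max = 2 ** (acc_bits - 1) - 1
    return worst_case_magnitude(q, input_bits) <= acc_max


def min_accumulator_bits(q, input_bits):
    """Smallest signed accumulator width that passes the l1-norm test."""
    # Need 2^(P-1) - 1 >= bound, i.e. P - 1 >= bound.bit_length().
    return worst_case_magnitude(q, input_bits).bit_length() + 1


if __name__ == "__main__":
    weights = [3, -2, 5, -1]  # illustrative integer weights, l1 norm = 11
    print(worst_case_magnitude(weights, input_bits=8))
    print(min_accumulator_bits(weights, input_bits=8))
    print(is_overflow_safe(weights, acc_bits=12, input_bits=8))
```

With these example weights and 8-bit inputs, the bound is 11 × 255 = 2805, so the check passes for a 13-bit accumulator and fails for a 12-bit one. An accumulator-aware quantizer would, in effect, have to keep the quantized weights inside such a budget for the target bit width.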

arXiv information

Authors	Ian Colbert, Fabian Grob, Giuseppe Franco, Jinjie Zhang, Rayan Saab
Published	2024-09-25 16:58:35+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
