Algorithm-Hardware Co-Design of Distribution-Aware Logarithmic-Posit Encodings for Efficient DNN Inference

Summary

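Traditional deep neural network (DNN) quantization methods that use integer, fixed-point, or floating-point data types struggle to capture the diverse distributions of DNN parameters at low precision, and they often require large silicon overhead and intensive quantization-aware training.
In this study, we introduce Logarithmic Posits (LP), an adaptive, hardware-friendly, posit-inspired data type that dynamically adapts to DNN weight and activation distributions by parameterizing the LP bit fields.
We also develop LP Quantization (LPQ), a new genetic-algorithm-based framework that finds optimal layer-wise LP parameters while reducing the representational divergence between the quantized and full-precision models through a new global-local contrastive objective.
In addition, we design a unified mixed-precision LP accelerator (LPA) architecture composed of processing elements (PEs) that incorporate LP in the computational datapath.
Across a range of CNN and ViT models, our algorithm-hardware co-design shows an average top-1 accuracy drop of less than 1%.
Compared with state-of-the-art quantization accelerators that use other data types, it also achieves about 2x higher performance per unit area and 2.2x better energy efficiency.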
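
The abstract does not spell out the LP bit layout. As a rough illustration only, the sketch below decodes an n-bit pattern under assumed posit-style conventions: a sign bit, a regime run capped at a hypothetical maximum run length `rs`, `es` exponent bits, and remaining fraction bits treated as a log-domain fraction (added to the base-2 exponent instead of forming a 1.f significand), plus a hypothetical integer scale factor `sf` added to the exponent. Negative values use sign-magnitude here for simplicity, whereas standard posits use two's complement. The function name `decode_lp` and every parameter convention are assumptions, not the paper's definition.

```python
import math


def decode_lp(bits: int, n: int, es: int, rs: int, sf: int) -> float:
    """Decode an n-bit pattern under an ASSUMED log-posit layout (illustrative only)."""
    bits &= (1 << n) - 1
    sign = bits >> (n - 1)
    body = bits & ((1 << (n - 1)) - 1)  # the n-1 bits after the sign bit
    if body == 0:
        # 0...0 is zero; 1 0...0 is treated as NaR (not a real), as in posits.
        return math.nan if sign else 0.0

    pos = n - 2  # index (within body) of the next unread bit
    first = (body >> pos) & 1
    run = 0
    # Regime: a run of identical bits, capped at rs (assumed cap parameter).
    while pos >= 0 and ((body >> pos) & 1) == first and run < rs:
        run += 1
        pos -= 1
    # If the run ended on an opposite bit (not at the cap), consume that terminator.
    if pos >= 0 and run < rs:
        pos -= 1
    k = run - 1 if first == 1 else -run

    remaining = pos + 1  # bits left for exponent and fraction
    e_bits = min(es, remaining)
    e = (body >> (remaining - e_bits)) & ((1 << e_bits) - 1) if e_bits else 0
    e <<= es - e_bits  # exponent bits cut off by the word length count as zeros
    f_bits = remaining - e_bits
    frac = body & ((1 << f_bits) - 1)
    f = frac / (1 << f_bits) if f_bits else 0.0

    # Log-domain fraction: f is added to the exponent rather than forming 1.f.
    log2_mag = k * (1 << es) + e + f + sf
    mag = 2.0 ** log2_mag
    return -mag if sign else mag
```

With `n=8, es=1, rs=7, sf=0`, the pattern `0b01001000` decodes to 2^0.5 ≈ 1.414, whereas a standard 8-bit posit with `es=1` reads the same bits as 1.5. Changing `es`, `rs`, or `sf` per layer shifts and reshapes the set of representable values, which is the kind of distribution-aware tuning the abstract describes.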

Summary (original)

Traditional Deep Neural Network (DNN) quantization methods using integer, fixed-point, or floating-point data types struggle to capture diverse DNN parameter distributions at low precision, and often require large silicon overhead and intensive quantization-aware training. In this study, we introduce Logarithmic Posits (LP), an adaptive, hardware-friendly data type inspired by posits that dynamically adapts to DNN weight/activation distributions by parameterizing LP bit fields. We also develop a novel genetic-algorithm based framework, LP Quantization (LPQ), to find optimal layer-wise LP parameters while reducing representational divergence between quantized and full-precision models through a novel global-local contrastive objective. Additionally, we design a unified mixed-precision LP accelerator (LPA) architecture comprising processing elements (PEs) incorporating LP in the computational datapath. Our algorithm-hardware co-design demonstrates on average <1% drop in top-1 accuracy across various CNN and ViT models. It also achieves ~2x improvements in performance per unit area and 2.2x gains in energy efficiency compared to state-of-the-art quantization accelerators using different data types.
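
The abstract names a genetic algorithm for layer-wise parameter search but gives no details of its genome, operators, or fitness. The sketch below is a generic elitist GA, not LPQ itself: each genome holds one hypothetical `(es, rs, sf)` tuple per layer, the search ranges are invented for illustration, and a toy distance-to-target fitness stands in for the paper's global-local contrastive objective, which would instead score how closely the quantized model's representations match those of the full-precision model.

```python
import random
from typing import Callable, List, Sequence, Tuple

Config = Tuple[int, int, int]  # hypothetical per-layer LP parameters (es, rs, sf)

# Hypothetical search space; the real LPQ ranges are not given in the abstract.
ES_CHOICES = [0, 1, 2]
RS_CHOICES = [2, 3, 4, 5, 6]
SF_CHOICES = list(range(-4, 5))


def random_config(rng: random.Random) -> Config:
    return (rng.choice(ES_CHOICES), rng.choice(RS_CHOICES), rng.choice(SF_CHOICES))


def mutate(genome: List[Config], rng: random.Random, rate: float) -> List[Config]:
    # Each layer's config is resampled with probability `rate`.
    return [random_config(rng) if rng.random() < rate else cfg for cfg in genome]


def crossover(a: List[Config], b: List[Config], rng: random.Random) -> List[Config]:
    # Uniform crossover: each layer inherits its config from one of the two parents.
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def search_layerwise(
    num_layers: int,
    fitness: Callable[[Sequence[Config]], float],
    pop_size: int = 20,
    generations: int = 30,
    elite: int = 4,
    mutation_rate: float = 0.1,
    seed: int = 0,
) -> Tuple[List[Config], float]:
    """Maximize `fitness` over per-layer configs with a simple elitist GA."""
    rng = random.Random(seed)
    population = [[random_config(rng) for _ in range(num_layers)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:elite]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng, mutation_rate)
            for _ in range(pop_size - elite)
        ]
        population = parents + children  # elites survive unchanged
    best = max(population, key=fitness)
    return best, fitness(best)


# Toy usage: a hypothetical fitness that rewards closeness to a hidden target.
TARGET: List[Config] = [(1, 3, 0), (0, 4, -2), (2, 2, 1)]


def toy_fitness(genome: Sequence[Config]) -> float:
    return -float(sum(abs(a - b) for cfg, tgt in zip(genome, TARGET) for a, b in zip(cfg, tgt)))


best, score = search_layerwise(len(TARGET), toy_fitness)
print(best, score)
```

Keeping the top `elite` genomes unchanged each generation means the best fitness never decreases. In a real quantization setting each fitness call would require running the quantized model, so population size and generation count would dominate the search cost.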

arXiv information

Authors	Akshat Ramachandran, Zishen Wan, Geonhwa Jeong, John Gustafson, Tushar Krishna
Published	2024-03-08 17:28:49+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
