Sensitivity-Aware Mixed-Precision Quantization and Width Optimization of Deep Neural Networks Through Cluster-Based Tree-Structured Parzen Estimation

Summary

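As deep learning models grow in complexity and computational demand, effective methods for optimizing neural network designs have become essential.
This work introduces a search mechanism that automatically selects the best bit-width and layer width for each individual neural network layer.
This substantially improves the efficiency of deep neural networks.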
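The summary describes a search over two per-layer variables, bit-width and layer width, without defining how they combine. The minimal sketch below shows how one such assignment translates into model size. It assumes symmetric uniform quantization, a toy fully connected network whose hidden widths are searchable, and weights-only size accounting (no biases or scale factors). The function names and layer sizes are hypothetical and not taken from the paper.

```python
# Sketch under stated assumptions: symmetric uniform quantization, a toy MLP
# with searchable hidden widths, and weights-only size accounting.

def quantize_symmetric(values, bits):
    """Map values onto a signed uniform grid with 2**(bits-1)-1 levels per side."""
    assert bits >= 2, "need at least one level on each side of zero"
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax or 1.0  # avoid a zero scale
    return [round(v / scale) * scale for v in values]


def layer_weight_counts(in_dim, hidden_widths, out_dim):
    """Weight count of each fully connected layer (biases ignored)."""
    dims = [in_dim, *hidden_widths, out_dim]
    return [dims[i] * dims[i + 1] for i in range(len(dims) - 1)]


def model_size_bits(in_dim, hidden_widths, out_dim, bit_widths):
    """Total weight storage in bits for one (width, bit-width) assignment."""
    counts = layer_weight_counts(in_dim, hidden_widths, out_dim)
    assert len(counts) == len(bit_widths), "one bit-width per layer"
    return sum(c * b for c, b in zip(counts, bit_widths))


# Hypothetical 784-input, 10-output network.
baseline = model_size_bits(784, [256, 128], 10, [8, 8, 8])  # 1878016 bits
mixed = model_size_bits(784, [192, 128], 10, [4, 8, 8])     # 808960 bits
print(f"baseline {baseline} bits, mixed {mixed} bits, ratio {mixed / baseline:.3f}")
print(quantize_symmetric([0.5, -1.0, 0.25, 0.9], bits=4))
```

Narrowing the first hidden layer and lowering its precision both shrink the same weight matrix, so the two choices interact multiplicatively. This interaction is one reason to search them jointly rather than one at a time.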
The search domain is strategically reduced by Hessian-based pruning, which removes non-critical parameters.
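The summary does not say which Hessian criterion is used. One well-known option is an Optimal Brain Damage style saliency, where the loss increase from zeroing weight w_i is approximated by 0.5 * H_ii * w_i**2 using only the Hessian diagonal. The minimal sketch below assumes that criterion and sums it per output channel. It then treats the surviving channel count as the upper bound for that layer's width search. The threshold, the toy layer, and all function names are hypothetical illustrations, not the authors' procedure.

```python
# Sketch, assuming an Optimal Brain Damage style criterion: the loss increase
# from zeroing weight w_i is approximated by 0.5 * H_ii * w_i**2 (diagonal Hessian).

def obd_saliency(weights, hessian_diag):
    """Per-weight saliency 0.5 * H_ii * w_i**2."""
    return [0.5 * h * w * w for w, h in zip(weights, hessian_diag)]


def prune_channels(channels, threshold):
    """Return indices of channels whose summed saliency reaches the threshold.

    Summing per-weight saliencies ignores off-diagonal Hessian terms.
    """
    kept = []
    for idx, (weights, hessian_diag) in enumerate(channels):
        if sum(obd_saliency(weights, hessian_diag)) >= threshold:
            kept.append(idx)
    return kept


def width_choices(candidates, max_width):
    """Shrink the width search range to widths the pruned layer still supports."""
    return [w for w in candidates if w <= max_width]


# Hypothetical layer with 4 output channels: (weights, diagonal Hessian) per channel.
layer = [
    ([1.0, -0.5], [2.0, 1.0]),
    ([0.01, 0.02], [0.5, 0.5]),
    ([0.8, 0.3], [1.0, 1.0]),
    ([0.05, 0.0], [0.1, 3.0]),
]
kept = prune_channels(layer, threshold=0.01)
print("kept channels:", kept)  # [0, 2]
print("width choices:", width_choices([1, 2, 3, 4], len(kept)))  # [1, 2]
```

Here, dropping the two low-saliency channels halves the width options the search must consider for this layer. In a real network the diagonal would come from an autodiff Hessian estimate rather than being written out by hand.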
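We then describe how surrogate models of favorable and unfavorable outcomes are built with a cluster-based tree-structured Parzen estimator.
This strategy streamlines the exploration of candidate architectures and quickly identifies the best-performing designs.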
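For reference, a standard tree-structured Parzen estimator (TPE) works as follows. It splits past trials into a "good" group (the best γ fraction by loss) and a "bad" group, fits a density l(x) to the first and g(x) to the second, and proposes the candidate that maximizes l(x)/g(x). The minimal sketch below applies that idea to per-layer bit-widths, using smoothed categorical frequencies as the Parzen estimate. It uses a plain quantile split rather than the paper's cluster-based variant, whose details are not given here. `toy_loss`, its sensitivities, and its parameter counts are hypothetical.

```python
import random

BITS = (2, 4, 8)            # hypothetical per-layer bit-width choices
SENS = (3.0, 0.2, 1.0)      # hypothetical layer sensitivities
PARAMS = (200, 3000, 100)   # hypothetical layer parameter counts


def toy_loss(cfg):
    """Hypothetical objective: quantization-error proxy plus a size penalty."""
    error = sum(s * 4.0 ** (4 - b) for s, b in zip(SENS, cfg))
    size = sum(p * b for p, b in zip(PARAMS, cfg)) / 1e4
    return error + size


def split_good_bad(history, gamma):
    """Quantile split: the best gamma fraction of trials (lowest loss) is 'good'."""
    ranked = sorted(history, key=lambda h: h[1])
    n_good = max(1, int(gamma * len(ranked)))
    return [c for c, _ in ranked[:n_good]], [c for c, _ in ranked[n_good:]]


def categorical_density(configs, layer, choices, prior_weight=1.0):
    """Parzen estimate for one categorical dimension: smoothed frequencies."""
    weights = {b: prior_weight / len(choices) for b in choices}  # uniform prior
    for cfg in configs:
        weights[cfg[layer]] += 1.0
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}


def propose(history, num_layers, rng, gamma=0.25, n_candidates=24):
    """Sample candidates from l(x) and return the one maximizing l(x)/g(x)."""
    good, bad = split_good_bad(history, gamma)
    l_dens = [categorical_density(good, i, BITS) for i in range(num_layers)]
    g_dens = [categorical_density(bad, i, BITS) for i in range(num_layers)]

    def ratio(cfg):
        r = 1.0
        for i, b in enumerate(cfg):
            r *= l_dens[i][b] / g_dens[i][b]  # independent per-layer factors
        return r

    candidates = [
        tuple(rng.choices(BITS, weights=[l_dens[i][b] for b in BITS])[0]
              for i in range(num_layers))
        for _ in range(n_candidates)
    ]
    return max(candidates, key=ratio)


def run_tpe(loss_fn, num_layers, n_init=8, n_iter=20, seed=0):
    """Random warm-up trials followed by TPE proposals; returns the best trial."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_init):
        cfg = tuple(rng.choice(BITS) for _ in range(num_layers))
        history.append((cfg, loss_fn(cfg)))
    for _ in range(n_iter):
        cfg = propose(history, num_layers, rng)
        history.append((cfg, loss_fn(cfg)))
    return min(history, key=lambda h: h[1])


best_cfg, best_loss = run_tpe(toy_loss, num_layers=len(SENS))
print(best_cfg, round(best_loss, 4))
```

Because each layer is modeled independently, l(x)/g(x) factorizes into a product of per-layer ratios. This keeps each proposal cheap even when the joint search space is large. In a real search, `toy_loss` would be replaced by an evaluation of the quantized network.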
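Rigorous tests on well-known datasets show that our method has a clear advantage over existing methods.
Compared with leading compression strategies, our approach reduces model size by 20% without compromising accuracy.
In addition, our method reduces search time by 12x compared with the best search-focused strategies currently available.
As a result, the proposed method is a significant step forward in neural network design optimization. It enables rapid model design and deployment in resource-constrained environments and thereby advances scalable deep learning solutions.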

Summary (original)

As the complexity and computational demands of deep learning models rise, the need for effective optimization methods for neural network designs becomes paramount. This work introduces an innovative search mechanism for automatically selecting the best bit-width and layer-width for individual neural network layers. This leads to a marked enhancement in deep neural network efficiency. The search domain is strategically reduced by leveraging Hessian-based pruning, ensuring the removal of non-crucial parameters. Subsequently, we detail the development of surrogate models for favorable and unfavorable outcomes by employing a cluster-based tree-structured Parzen estimator. This strategy allows for a streamlined exploration of architectural possibilities and swift pinpointing of top-performing designs. Through rigorous testing on well-known datasets, our method proves its distinct advantage over existing methods. Compared to leading compression strategies, our approach records an impressive 20% decrease in model size without compromising accuracy. Additionally, our method boasts a 12x reduction in search time relative to the best search-focused strategies currently available. As a result, our proposed method represents a leap forward in neural network design optimization, paving the way for quick model design and implementation in settings with limited resources, thereby propelling the potential of scalable deep learning solutions.

arXiv information

Authors	Seyedarmin Azizi, Mahdi Nazemi, Arash Fayyazi, Massoud Pedram
Published	2023-08-16 16:18:28+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
