Efficient Latency-Aware CNN Depth Compression via Two-Stage Dynamic Programming

Summary

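Recent work on neural network pruning argues that reducing network depth is more effective at lowering run-time memory usage and inference latency than reducing network width through channel pruning. In this regard, recent studies have proposed depth compression algorithms that merge convolution layers. However, existing algorithms have a narrow search space and rely on hand-crafted heuristics. In this paper, we propose a new depth compression algorithm that targets general convolution operations. We formulate a subset selection problem that replaces inefficient activation layers with identity functions and optimally merges consecutive convolution operations into shallow equivalent convolution operations, so as to reduce end-to-end inference latency. Because the proposed subset selection problem is NP-hard, we formulate a surrogate optimization problem that can be solved exactly by two-stage dynamic programming within a few seconds. We evaluate our method and the baselines with TensorRT for a fair comparison of inference latency. Our method outperforms the baseline with higher accuracy and faster inference on MobileNetV2 on the ImageNet dataset. Specifically, we achieve a $1.41\times$ speed-up and a $0.11$\%p accuracy gain on MobileNetV2-1.0 on ImageNet.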

Summary (original)

Recent works on neural network pruning advocate that reducing the depth of the network is more effective in reducing run-time memory usage and accelerating inference latency than reducing the width of the network through channel pruning. In this regard, some recent works propose depth compression algorithms that merge convolution layers. However, the existing algorithms have a constricted search space and rely on human-engineered heuristics. In this paper, we propose a novel depth compression algorithm which targets general convolution operations. We propose a subset selection problem that replaces inefficient activation layers with identity functions and optimally merges consecutive convolution operations into shallow equivalent convolution operations for efficient end-to-end inference latency. Since the proposed subset selection problem is NP-hard, we formulate a surrogate optimization problem that can be solved exactly via two-stage dynamic programming within a few seconds. We evaluate our methods and baselines by TensorRT for a fair inference latency comparison. Our method outperforms the baseline method with higher accuracy and faster inference speed in MobileNetV2 on the ImageNet dataset. Specifically, we achieve $1.41\times$ speed-up with $0.11$\%p accuracy gain in MobileNetV2-1.0 on the ImageNet.
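
The merging step mentioned in the abstract relies on the fact that two consecutive convolutions with no activation between them compose into a single convolution. The following is a minimal single-channel, 1-D NumPy sketch of that identity only; it is not the paper's implementation, and the helper names `conv_valid` and `merge_kernels` are hypothetical. It assumes stride 1, no padding, and no bias.

```python
import numpy as np

def conv_valid(x, kernel):
    """1-D 'valid' cross-correlation, the operation a conv layer computes
    (single channel, stride 1, no padding, no bias)."""
    return np.correlate(x, kernel, mode="valid")

def merge_kernels(k1, k2):
    """Kernel of one layer equivalent to applying k1 and then k2
    when the activation between them is the identity."""
    return np.convolve(k1, k2, mode="full")

rng = np.random.default_rng(0)
x = rng.standard_normal(32)   # toy input signal
k1 = rng.standard_normal(3)   # first layer, kernel size 3
k2 = rng.standard_normal(3)   # second layer, kernel size 3

# Two layers applied in sequence (activation replaced by identity).
two_layers = conv_valid(conv_valid(x, k1), k2)

# One merged layer; its kernel size grows to 3 + 3 - 1 = 5.
merged = merge_kernels(k1, k2)
one_layer = conv_valid(x, merged)

print(np.allclose(two_layers, one_layer))
```

The merged kernel is larger than either original kernel, so merging reduces depth but can increase the cost of the single resulting layer. This is why the choice of which layers to merge is posed as a latency-aware optimization problem rather than a fixed rule.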

arXiv information

Authors	Jinuk Kim, Yeonwoo Jeong, Deokjae Lee, Hyun Oh Song
Published	2023-06-02 15:46:38+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
