Archtree: on-the-fly tree-structured exploration for latency-aware pruning of deep neural networks

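Summary

Deep neural networks (DNNs) are now widely used to tackle many problems, particularly in computer vision.
However, DNN inference is computationally intensive, which can make it prohibitive on edge devices.
A popular remedy is DNN pruning, and structured pruning in particular, where coherent computational blocks (such as channels in convolutional networks) are removed. Because an exhaustive search over the space of pruned sub-models is intractable in practice, channels are typically removed iteratively based on an importance estimation heuristic.
Recently, promising latency-aware pruning methods have been proposed, in which channels are removed until the network reaches a target budget of wall-clock latency estimated in advance on specific hardware.
This paper presents Archtree, a novel method for latency-driven structured pruning of DNNs.
Archtree explores multiple candidate pruned sub-models in parallel in a tree-like fashion, which allows a better exploration of the search space.
It also estimates latency on the fly on the target hardware, so that the pruned model's latency comes closer to the specified budget.
Empirical results on several DNN architectures and target hardware show that Archtree preserves the original model's accuracy better than existing state-of-the-art methods while fitting the latency budget more closely.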
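
To make the idea of tree-structured, latency-driven exploration more concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not specify the algorithm, so this uses a plain beam search as a stand-in. The names `tree_prune`, `toy_latency`, `beam_width`, and the per-channel importance scores are all hypothetical, and `toy_latency` is a simple proxy that stands in for a real wall-clock measurement on the target device.

```python
from typing import Callable, Dict, List, Tuple

Widths = Tuple[int, ...]  # number of channels kept in each layer


def toy_latency(widths: Widths) -> float:
    """Hypothetical stand-in for measuring wall-clock latency on hardware."""
    # Assumption: cost of a layer chain grows with the product of adjacent widths.
    return float(sum(a * b for a, b in zip(widths, widths[1:])))


def tree_prune(
    importance: List[List[float]],
    budget: float,
    latency_fn: Callable[[Widths], float] = toy_latency,
    beam_width: int = 3,
) -> Tuple[Widths, float]:
    """Beam search over pruned sub-models until one fits the latency budget.

    Returns the selected widths and the total importance that was removed.
    """
    # Rank channels per layer from most to least important, so pruning a
    # layer always drops its least important remaining channel.
    ranked = [sorted(scores, reverse=True) for scores in importance]
    start: Widths = tuple(len(r) for r in ranked)
    frontier: Dict[Widths, float] = {start: 0.0}  # widths -> importance removed

    while True:
        # Evaluate latency for every live candidate (the "on-the-fly" step).
        fitting = {w: lost for w, lost in frontier.items() if latency_fn(w) <= budget}
        if fitting:
            best = min(fitting, key=fitting.get)
            return best, fitting[best]

        # Expand each candidate: one child per layer, removing one channel.
        children: Dict[Widths, float] = {}
        for widths, lost in frontier.items():
            for layer, kept in enumerate(widths):
                if kept <= 1:
                    continue  # never empty a layer completely
                child = widths[:layer] + (kept - 1,) + widths[layer + 1:]
                children[child] = lost + ranked[layer][kept - 1]
        if not children:
            raise ValueError("budget cannot be reached without emptying a layer")

        # Keep only the candidates that removed the least importance.
        frontier = dict(sorted(children.items(), key=lambda kv: kv[1])[:beam_width])
```

Calling `tree_prune(importance, budget)` with one list of scores per layer returns the first sub-model found that fits the budget. Setting `beam_width=1` reduces the search to the greedy, one-candidate-at-a-time removal that the summary describes as the usual approach; a wider beam keeps several candidates alive at each level.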

Abstract (original)

Deep neural networks (DNNs) have become ubiquitous in addressing a number of problems, particularly in computer vision. However, DNN inference is computationally intensive, which can be prohibitive e.g. when considering edge devices. To solve this problem, a popular solution is DNN pruning, and more so structured pruning, where coherent computational blocks (e.g. channels for convolutional networks) are removed: as an exhaustive search of the space of pruned sub-models is intractable in practice, channels are typically removed iteratively based on an importance estimation heuristic. Recently, promising latency-aware pruning methods were proposed, where channels are removed until the network reaches a target budget of wall-clock latency pre-emptively estimated on specific hardware. In this paper, we present Archtree, a novel method for latency-driven structured pruning of DNNs. Archtree explores multiple candidate pruned sub-models in parallel in a tree-like fashion, allowing for a better exploration of the search space. Furthermore, it involves on-the-fly latency estimation on the target hardware, accounting for closer latencies as compared to the specified budget. Empirical results on several DNN architectures and target hardware show that Archtree better preserves the original model accuracy while better fitting the latency budget as compared to existing state-of-the-art methods.

arXiv information

Authors	Rémi Ouazan Reboul, Edouard Yvinec, Arnaud Dapogny, Kevin Bailly
Published	2023-11-17 14:24:12+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
