Dynamic Sparse Training via More Exploration

Summary

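Over-parameterized deep neural networks (DNNs) achieve high prediction accuracy in many applications.
Although effective, their large parameter counts hinder adoption on resource-limited devices and carry an outsized environmental cost.
Sparse training, which uses a fixed number of nonzero weights in each iteration, can significantly reduce training cost by shrinking the model.
However, existing sparse training methods rely mainly on random or greedy drop-and-grow strategies, which leads to local minima and low accuracy.
This work treats dynamic sparse training as a sparse connectivity search problem and designs an acquisition function that balances exploitation and exploration in order to escape local optima and saddle points.
The authors further develop the acquisition function, provide theoretical guarantees for the proposed method, and clarify its convergence properties.
Experiments show that sparse models obtained with the proposed method (up to 98% sparsity) outperform state-of-the-art (SOTA) sparse training methods across a wide variety of deep learning tasks.
On VGG-19 / CIFAR-100, ResNet-50 / CIFAR-10, and ResNet-50 / CIFAR-100, the method achieves even higher accuracy than the dense models.
On ResNet-50 / ImageNet, it improves accuracy by up to 8.2% over SOTA sparse training methods.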
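
To make the drop-and-grow idea concrete, here is a minimal NumPy sketch of one mask update that keeps the number of nonzero weights fixed. It is an illustration only and not the authors' acquisition function: the function name `drop_and_grow`, the parameters `drop_frac` and `c`, the per-weight activity counter `counts`, and the UCB-style exploration bonus are all assumptions introduced here. The drop step removes the smallest-magnitude active weights; the grow step scores inactive weights by gradient magnitude (exploitation) plus a bonus that favors rarely activated weights (exploration).

```python
import numpy as np


def drop_and_grow(weights, grads, mask, counts, step, drop_frac=0.3, c=1.0):
    """Return an updated 0/1 mask with the same number of nonzero entries.

    weights, grads, mask, counts are arrays of identical shape.
    counts[i] is a hypothetical tally of how often weight i has been active.
    """
    flat = mask.ravel().copy()
    active = np.flatnonzero(flat)
    inactive = np.flatnonzero(flat == 0)
    k = min(int(drop_frac * active.size), inactive.size)
    if k == 0:
        return flat.reshape(mask.shape)

    # Drop: deactivate the k active weights with the smallest magnitude.
    magnitudes = np.abs(weights.ravel()[active])
    drop_idx = active[np.argsort(magnitudes)[:k]]

    # Grow: rank previously inactive weights by an assumed score,
    # gradient magnitude (exploitation) + UCB-style bonus (exploration).
    exploit = np.abs(grads.ravel()[inactive])
    explore = c * np.sqrt(np.log(step + 1.0) / (counts.ravel()[inactive] + 1.0))
    grow_idx = inactive[np.argsort(-(exploit + explore))[:k]]

    flat[drop_idx] = 0
    flat[grow_idx] = 1
    return flat.reshape(mask.shape)
```

Setting `c=0` reduces the grow step to a purely greedy, gradient-based choice, which is the kind of strategy the summary describes as prone to local minima; a larger `c` shifts the selection toward connections that have rarely been tried.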

Summary (original)

Over-parameterization of deep neural networks (DNNs) has shown high prediction accuracy for many applications. Although effective, the large number of parameters hinders its popularity on resource-limited devices and has an outsize environmental impact. Sparse training (using a fixed number of nonzero weights in each iteration) could significantly mitigate the training costs by reducing the model size. However, existing sparse training methods mainly use either random-based or greedy-based drop-and-grow strategies, resulting in local minimal and low accuracy. In this work, we consider the dynamic sparse training as a sparse connectivity search problem and design an exploitation and exploration acquisition function to escape from local optima and saddle points. We further design an acquisition function and provide the theoretical guarantees for the proposed method and clarify its convergence property. Experimental results show that sparse models (up to 98% sparsity) obtained by our proposed method outperform the SOTA sparse training methods on a wide variety of deep learning tasks. On VGG-19 / CIFAR-100, ResNet-50 / CIFAR-10, ResNet-50 / CIFAR-100, our method has even higher accuracy than dense models. On ResNet-50 / ImageNet, the proposed method has up to 8.2% accuracy improvement compared to SOTA sparse training methods.

arXiv information

Authors	Shaoyi Huang, Bowen Lei, Dongkuan Xu, Hongwu Peng, Yue Sun, Mimi Xie, Caiwen Ding
Published	2022-12-14 18:09:41+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
