Balance is Essence: Accelerating Sparse Training via Adaptive Gradient Correction

Summary

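Although deep neural networks perform well on a wide variety of tasks, their large memory and computation costs limit their use in resource-constrained scenarios. Sparse training is one of the most common techniques for reducing these costs, but the sparsity constraints make optimization harder, leading to longer training time and instability. This work aims to overcome this problem and achieve space-time co-efficiency. To accelerate and stabilize the convergence of sparse training, the authors analyze how gradients change and develop an adaptive gradient correction method. Specifically, they approximate the correlation between the current and previous gradients and use it to balance the two, obtaining a corrected gradient. The method can be used with most popular sparse training pipelines in both standard and adversarial settings. Theoretically, they prove that it can accelerate the convergence rate of sparse training. In experiments on multiple datasets, model architectures, and sparsities, it outperforms leading sparse training methods by up to 5.0% in accuracy given the same number of training epochs, and reduces the number of training epochs needed to reach the same accuracy by up to 52.1%.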
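The gradient correction step described in the summary can be illustrated with a minimal sketch. The balancing rule here is an assumption for illustration, not the authors' formula: the previous gradient is weighted by the cosine similarity of the two gradients, clipped at zero, and the two are combined as a convex combination. Gradients that agree are averaged, while conflicting ones leave the current gradient unchanged. The names `cosine_similarity` and `corrected_gradient` are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two flattened gradients (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def corrected_gradient(current: Vector, previous: Optional[Vector]) -> Tuple[Vector, float]:
    """Hypothetical balancing rule (an assumption, not the paper's exact formula).

    gamma = max(0, cos(current, previous)) weights the previous gradient, and the
    result is the convex combination (current + gamma * previous) / (1 + gamma).
    """
    if previous is None:  # first step: nothing to balance against
        return list(current), 0.0
    gamma = max(0.0, cosine_similarity(current, previous))
    combined = [(c + gamma * p) / (1.0 + gamma) for c, p in zip(current, previous)]
    return combined, gamma


if __name__ == "__main__":
    # Toy usage: SGD on f(w) = 0.5 * (w0^2 + 10 * w1^2) with corrected gradients.
    w = [1.0, 1.0]
    lr = 0.05
    prev = None
    for _ in range(50):
        grad = [w[0], 10.0 * w[1]]
        g, gamma = corrected_gradient(grad, prev)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
        prev = grad  # keep the raw gradient for the next correlation estimate
    print(w)
```

Storing the raw rather than the corrected gradient as `previous` is a design choice of this sketch; feeding back the corrected gradient instead would turn the rule into something closer to momentum.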

Summary (original)

Despite impressive performance on a wide variety of tasks, deep neural networks incur significant memory and computation costs, prohibiting their application in resource-constrained scenarios. Sparse training is one of the most common techniques to reduce these costs; however, the sparsity constraints add difficulty to the optimization, resulting in longer training time and instability. In this work, we aim to overcome this problem and achieve space-time co-efficiency. To accelerate and stabilize the convergence of sparse training, we analyze the gradient changes and develop an adaptive gradient correction method. Specifically, we approximate the correlation between the current and previous gradients, which is used to balance the two gradients to obtain a corrected gradient. Our method can be used with most popular sparse training pipelines under both standard and adversarial setups. Theoretically, we prove that our method can accelerate the convergence rate of sparse training. Extensive experiments on multiple datasets, model architectures, and sparsities demonstrate that our method outperforms leading sparse training methods by up to 5.0% in accuracy given the same number of training epochs, and reduces the number of training epochs by up to 52.1% to achieve the same accuracy.
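The abstract says the method plugs into most popular sparse training pipelines. A common building block of such pipelines is a binary mask over the weights. The sketch below assumes the simplest variant, a fixed mask that keeps the largest-magnitude weights, and restricts each SGD step to the active weights; dynamic pipelines that periodically prune and regrow connections are not modeled. The helper names `magnitude_mask`, `masked_sgd_step`, and `sparsity_of` are hypothetical, and `grad` could be either a plain or a corrected gradient.

```python
from typing import List

Vector = List[float]


def magnitude_mask(weights: Vector, sparsity: float) -> List[int]:
    """Binary mask keeping the largest-magnitude weights; `sparsity` is the fraction zeroed."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n = len(weights)
    keep = n - round(sparsity * n)  # number of weights that stay active
    order = sorted(range(n), key=lambda i: abs(weights[i]), reverse=True)
    mask = [0] * n
    for i in order[:keep]:
        mask[i] = 1
    return mask


def masked_sgd_step(weights: Vector, grad: Vector, mask: List[int], lr: float) -> Vector:
    """One SGD step on active weights only; pruned weights are held at exactly zero."""
    return [w - lr * g if m else 0.0 for w, g, m in zip(weights, grad, mask)]


def sparsity_of(weights: Vector) -> float:
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


if __name__ == "__main__":
    w = [0.5, -2.0, 0.1, 3.0]
    mask = magnitude_mask(w, sparsity=0.5)
    print(mask)  # [0, 1, 0, 1]: keeps -2.0 and 3.0
    w = masked_sgd_step(w, [1.0, 1.0, 1.0, 1.0], mask, lr=0.1)
    print(w, sparsity_of(w))
```

Because the mask is applied inside the update, the target sparsity holds after every step regardless of which gradient is passed in.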

arXiv information

Authors	Bowen Lei, Dongkuan Xu, Ruqi Zhang, Shuren He, Bani K. Mallick
Published	2023-01-09 18:50:03+00:00
arXiv page	arxiv_id (pdf)

Provider, services used

arxiv.jp, DeepL
