Balance is Essence: Accelerating Sparse Training via Adaptive Gradient Correction

Summary

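Despite their strong performance, deep neural networks demand large amounts of memory and computation, which rules them out in resource-constrained scenarios.
Sparse training is one of the most common techniques for reducing these costs, but the sparsity constraints make optimization harder, which increases training time and causes instability.
This work aims to overcome that problem and achieve space-time co-efficiency.
To accelerate and stabilize the convergence of sparse training, we analyze how the gradients change and develop an adaptive gradient correction method.
Specifically, we approximate the correlation between the current and previous gradients and use it to balance the two gradients, yielding a corrected gradient.
Our method can be used with the most popular sparse training pipelines under both standard and adversarial setups.
Theoretically, we prove that our method accelerates the convergence rate of sparse training.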
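Extensive experiments across multiple datasets, model architectures, and sparsity levels show that, given the same number of training epochs, our method outperforms leading sparse training methods by up to 5.0% in accuracy, and that it reduces the number of training epochs needed to reach the same accuracy by up to 52.1%.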
Our code is available at https://github.com/StevenBoys/AGENT.

Summary (original)

Despite impressive performance, deep neural networks require significant memory and computation costs, prohibiting their application in resource-constrained scenarios. Sparse training is one of the most common techniques to reduce these costs, however, the sparsity constraints add difficulty to the optimization, resulting in an increase in training time and instability. In this work, we aim to overcome this problem and achieve space-time co-efficiency. To accelerate and stabilize the convergence of sparse training, we analyze the gradient changes and develop an adaptive gradient correction method. Specifically, we approximate the correlation between the current and previous gradients, which is used to balance the two gradients to obtain a corrected gradient. Our method can be used with the most popular sparse training pipelines under both standard and adversarial setups. Theoretically, we prove that our method can accelerate the convergence rate of sparse training. Extensive experiments on multiple datasets, model architectures, and sparsities demonstrate that our method outperforms leading sparse training methods by up to 5.0% in accuracy given the same number of training epochs, and reduces the number of training epochs by up to 52.1% to achieve the same accuracy. Our code is available on: https://github.com/StevenBoys/AGENT.
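
The abstract says only that a correlation between the current and previous gradients is approximated and then used to balance the two. As a rough illustration of that idea, and not the authors' actual algorithm, the sketch below uses cosine similarity as a stand-in for the correlation and a simple weighted average as the balancing rule. Both choices, along with the function names `cosine_similarity` and `corrected_gradient`, are assumptions made for this example; the repository linked above is the place to look for the real method.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two flattened gradients; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def corrected_gradient(current: List[float], previous: List[float]) -> List[float]:
    """Blend the current and previous gradients (illustrative rule, not from the paper)."""
    # Assumed weighting: lean on the previous gradient only as far as it
    # agrees in direction with the current one; ignore it when it disagrees.
    weight = max(0.0, cosine_similarity(current, previous))
    # Weighted average of the two gradients; weight == 0 returns `current` unchanged.
    return [(g + weight * p) / (1.0 + weight) for g, p in zip(current, previous)]


if __name__ == "__main__":
    print(corrected_gradient([2.0, 0.0], [4.0, 0.0]))   # aligned gradients are averaged
    print(corrected_gradient([1.0, 0.0], [0.0, 1.0]))   # orthogonal: previous is ignored
    print(corrected_gradient([1.0, 0.0], [-1.0, 0.0]))  # opposed: previous is ignored
```

In this toy rule the previous gradient acts as a smoothing term whose influence shrinks as the two gradients diverge, which is one plausible reading of "balance" in the abstract.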

arXiv information

Authors	Bowen Lei, Dongkuan Xu, Ruqi Zhang, Shuren He, Bani K. Mallick
Published	2023-12-05 16:05:00+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
