Are Straight-Through gradients and Soft-Thresholding all you need for Sparse Training?

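Summary

Setting weights to zero while training a neural network helps reduce the computational cost of inference. To increase the network's sparsity ratio gradually without causing abrupt weight discontinuities during training, this work combines soft-thresholding with straight-through gradient estimation to update the raw, i.e. non-thresholded, version of the zeroed weights. The method, named ST-3 for straight-through/soft-thresholding/sparse-training, obtains state-of-the-art results in both the accuracy/sparsity and accuracy/FLOPS trade-offs when the sparsity ratio is progressively increased within a single training cycle. In particular, despite its simplicity, ST-3 compares favorably with the most recent methods, which adopt differentiable formulations or bio-inspired neuroregeneration principles. This suggests that the key ingredient for effective sparsification is mainly the ability to give the weights the freedom to evolve smoothly across the zero state while the sparsity ratio is progressively increased. Source code and weights are available at https://github.com/vanderschuea/stthree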

Summary (original)

Turning the weights to zero when training a neural network helps in reducing the computational complexity at inference. To progressively increase the sparsity ratio in the network without causing sharp weight discontinuities during training, our work combines soft-thresholding and straight-through gradient estimation to update the raw, i.e. non-thresholded, version of zeroed weights. Our method, named ST-3 for straight-through/soft-thresholding/sparse-training, obtains SoA results, both in terms of accuracy/sparsity and accuracy/FLOPS trade-offs, when progressively increasing the sparsity ratio in a single training cycle. In particular, despite its simplicity, ST-3 favorably compares to the most recent methods, adopting differentiable formulations or bio-inspired neuroregeneration principles. This suggests that the key ingredients for effective sparsification primarily lie in the ability to give the weights the freedom to evolve smoothly across the zero state while progressively increasing the sparsity ratio. Source code and weights available at https://github.com/vanderschuea/stthree
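The mechanism the abstract describes can be illustrated with a small, dependency-free sketch. It is not the authors' implementation (that lives in the linked repository); the function names, the magnitude-quantile rule for choosing the threshold, the linear sparsity ramp, the toy quadratic loss and the plain SGD update are all assumptions made for illustration. The forward pass uses soft-thresholded weights, while the gradient is applied unchanged to the raw weights, so zeroed weights keep evolving and can cross back over the threshold.

```python
# Minimal sketch of soft-thresholding with a straight-through gradient.
# All names, the threshold rule and the schedule are illustrative assumptions.

def soft_threshold(w, t):
    """Shrink w toward zero by t; values with |w| <= t become exactly 0."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def threshold_for_sparsity(raw, sparsity):
    """Pick t so that at least `sparsity` of the weights are zeroed (assumed rule)."""
    k = int(round(sparsity * len(raw)))  # number of weights to zero
    if k == 0:
        return 0.0
    return sorted(abs(w) for w in raw)[k - 1]


def st_step(raw, targets, sparsity, lr=0.1):
    """One SGD step on the toy loss 0.5 * sum((sparse_w - target)^2)."""
    t = threshold_for_sparsity(raw, sparsity)
    sparse = [soft_threshold(w, t) for w in raw]      # forward pass
    grads = [s - y for s, y in zip(sparse, targets)]  # dL/d(sparse)
    # Straight-through: treat d(sparse)/d(raw) as 1, even for zeroed weights,
    # so the raw weights keep moving and may later cross the threshold again.
    new_raw = [w - lr * g for w, g in zip(raw, grads)]
    return new_raw, sparse


raw = [0.9, -0.05, 0.4, -0.7, 0.02, 0.3, -0.15, 0.6]
targets = [1.0, 0.0, 0.0, -1.0, 0.5, 0.0, 0.0, 0.8]
steps, final_sparsity = 50, 0.5
for i in range(steps):
    sparsity = final_sparsity * (i + 1) / steps  # linear ramp (assumption)
    raw, sparse = st_step(raw, targets, sparsity)
```

After the loop, `sparse` holds the weights used in the last forward pass, at least half of which are exactly zero, while `raw` keeps the dense values that continue to receive gradient updates.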

arXiv information

Authors	Antoine Vanderschueren,Christophe De Vleeschouwer
Published	2022-12-02 10:32:44+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
