Efficient Neural Network Training via Subset Pretraining

Abstract

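When training neural networks, it is common practice to use partial gradients computed over batches, which are mostly very small subsets of the training set.
This approach is motivated by the argument that such a partial gradient is close to the true gradient, with precision growing only with the square root of the batch size.
The theoretical justification rests on stochastic approximation theory.
However, the conditions under which this theory is valid are not satisfied by the usual learning rate schedules.
Batch processing is also difficult to combine with efficient second-order optimization methods.
This proposal is based on a different hypothesis: the loss minimum of the training set can be expected to be well approximated by the minima of its subsets.
Such subset minima can be computed in a fraction of the time needed to optimize over the whole training set.
The hypothesis was tested on the MNIST, CIFAR-10, and CIFAR-100 image classification benchmarks, optionally extended by training data augmentation.
The experiments confirmed that results equivalent to conventional training can be reached.
In summary, even small subsets are representative if the overdetermination ratio for the given model parameter set sufficiently exceeds unity.
The computing expense can be reduced to a tenth or less.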
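
The subset hypothesis can be illustrated with a toy problem. The sketch below is not the authors' method or code: it swaps the neural network for linear least squares, where minima can be computed exactly, and compares the full-set loss reached by a fit on a random subset with the loss of a fit on the whole set. The abstract does not define the overdetermination ratio; the code assumes it means (training samples × output dimension) / (number of parameters). All function names and default values are hypothetical.

```python
import numpy as np


def overdetermination_ratio(n_samples, n_outputs, n_params):
    # ASSUMED definition (not given in the abstract): number of scalar
    # training constraints per free model parameter.
    return n_samples * n_outputs / n_params


def fit_least_squares(X, y):
    # Exact least-squares minimum (minimum-norm solution if underdetermined).
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))


def subset_vs_full(n_samples=2000, n_params=20, subset_size=200,
                   noise=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # Synthetic linear regression task with a single output.
    X = rng.normal(size=(n_samples, n_params))
    w_true = rng.normal(size=n_params)
    y = X @ w_true + noise * rng.normal(size=n_samples)

    # Random subset of the training set, drawn without replacement.
    idx = rng.choice(n_samples, size=subset_size, replace=False)

    w_full = fit_least_squares(X, y)           # minimum over the full set
    w_sub = fit_least_squares(X[idx], y[idx])  # minimum over the subset only

    # Both solutions are evaluated on the FULL training set.
    return {
        "ratio": overdetermination_ratio(subset_size, 1, n_params),
        "full_fit_loss": mse(X, y, w_full),
        "subset_fit_loss": mse(X, y, w_sub),
    }


if __name__ == "__main__":
    print(subset_vs_full(subset_size=200))  # subset ratio well above 1
    print(subset_vs_full(subset_size=10))   # subset ratio below 1
```

In this toy setting, a subset whose ratio is well above 1 yields a full-set loss close to that of the full fit, while a subset whose ratio is below 1 does not. Whether the same behavior carries over to nonlinear networks is the question the paper's experiments address; this sketch makes no claim about it.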

Abstract (original)

In training neural networks, it is common practice to use partial gradients computed over batches, mostly very small subsets of the training set. This approach is motivated by the argument that such a partial gradient is close to the true one, with precision growing only with the square root of the batch size. A theoretical justification is with the help of stochastic approximation theory. However, the conditions for the validity of this theory are not satisfied in the usual learning rate schedules. Batch processing is also difficult to combine with efficient second-order optimization methods. This proposal is based on another hypothesis: the loss minimum of the training set can be expected to be well-approximated by the minima of its subsets. Such subset minima can be computed in a fraction of the time necessary for optimizing over the whole training set. This hypothesis has been tested with the help of the MNIST, CIFAR-10, and CIFAR-100 image classification benchmarks, optionally extended by training data augmentation. The experiments have confirmed that results equivalent to conventional training can be reached. In summary, even small subsets are representative if the overdetermination ratio for the given model parameter set sufficiently exceeds unity. The computing expense can be reduced to a tenth or less.

arXiv information

Authors	Jan Spörer, Bernhard Bermeitinger, Tomas Hrycej, Niklas Limacher, Siegfried Handschuh
Published	2024-10-29 14:18:51+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
