When, Where and Why to Average Weights?

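Summary

Averaging checkpoints along the training trajectory is a simple yet powerful way to improve the generalization performance of machine learning models and to reduce training time.
Motivated by these potential gains, and to benchmark the technique fairly and thoroughly, we present an extensive evaluation of averaging techniques in modern deep learning, carried out with AlgoPerf \citep{dahl_benchmarking_2023}, a large-scale benchmark for optimization algorithms.
We investigate whether weight averaging can reduce training time, improve generalization, and replace learning rate decay, as recent literature suggests.
An evaluation across seven architectures and datasets shows that averaging significantly accelerates training and yields considerable efficiency gains at minimal implementation and memory cost, while mildly improving generalization on all considered workloads.
Finally, we explore the relationship between averaging and learning rate annealing and show how to combine the two optimally for the best performance.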
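
To make "averaging checkpoints along the training trajectory" concrete, here is a minimal sketch of two common forms of weight averaging: a uniform average over a sliding window of recent checkpoints, and an exponential moving average (EMA). Everything in it is an illustrative assumption rather than the paper's code: the names `uniform_average`, `ema_update`, and `CheckpointWindow` are hypothetical, weights are modeled as plain dictionaries of float lists, and the abstract does not state which averaging schemes or hyperparameters the authors evaluate.

```python
from collections import deque
from typing import Deque, Dict, List

# Hypothetical representation: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def uniform_average(checkpoints: List[Weights]) -> Weights:
    """Return the element-wise mean of a non-empty list of checkpoints."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint to average")
    n = len(checkpoints)
    return {
        name: [sum(vals) / n for vals in zip(*(c[name] for c in checkpoints))]
        for name in checkpoints[0]
    }


def ema_update(avg: Weights, new: Weights, decay: float = 0.9) -> Weights:
    """One EMA step: avg <- decay * avg + (1 - decay) * new."""
    return {
        name: [decay * a + (1.0 - decay) * w for a, w in zip(avg[name], new[name])]
        for name in avg
    }


class CheckpointWindow:
    """Keeps the most recent `window` checkpoints and averages them on demand."""

    def __init__(self, window: int) -> None:
        # deque with maxlen silently drops the oldest checkpoint when full.
        self._buffer: Deque[Weights] = deque(maxlen=window)

    def add(self, weights: Weights) -> None:
        # Copy the lists so later in-place training updates do not alter the buffer.
        self._buffer.append({name: list(v) for name, v in weights.items()})

    def average(self) -> Weights:
        return uniform_average(list(self._buffer))


if __name__ == "__main__":
    window = CheckpointWindow(window=2)
    for step_weights in ({"w": [0.0, 2.0]}, {"w": [2.0, 4.0]}, {"w": [4.0, 6.0]}):
        window.add(step_weights)
    # Only the last two checkpoints remain in the window.
    print(window.average())
    print(ema_update({"w": [1.0]}, {"w": [3.0]}, decay=0.5))
```

In both variants the averaged weights are kept separately from the weights being optimized and are used only for evaluation, so the extra cost is the memory for the stored copies. The sliding window stores several checkpoints, while the EMA stores a single running average.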

Abstract (original)

Averaging checkpoints along the training trajectory is a simple yet powerful approach to improve the generalization performance of Machine Learning models and reduce training time. Motivated by these potential gains, and in an effort to fairly and thoroughly benchmark this technique, we present an extensive evaluation of averaging techniques in modern Deep Learning, which we perform using AlgoPerf \citep{dahl_benchmarking_2023}, a large-scale benchmark for optimization algorithms. We investigate whether weight averaging can reduce training time, improve generalization, and replace learning rate decay, as suggested by recent literature. Our evaluation across seven architectures and datasets reveals that averaging significantly accelerates training and yields considerable efficiency gains, at the price of a minimal implementation and memory cost, while mildly improving generalization across all considered workloads. Finally, we explore the relationship between averaging and learning rate annealing and show how to optimally combine the two to achieve the best performances.

arXiv information

Authors	Niccolò Ajroldi,Antonio Orvieto,Jonas Geiping
Published	2025-02-10 18:40:48+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
