Overshoot: Taking advantage of future gradients in momentum-based stochastic optimization

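Summary

Overshoot is a new momentum-based stochastic gradient descent optimization method designed to improve on both standard and Nesterov momentum.
In conventional momentum methods, gradients from previous steps are aggregated with the gradient at the current model weights before a step is taken and the model is updated.
Instead of computing the gradient at the current model weights, Overshoot computes it at model weights shifted in the direction of the current momentum.
This gives up the immediate benefit of using the gradient at the exact current weights in favor of evaluating it at a point that is likely to be more relevant for future updates.
The authors show that building this principle into momentum-based optimizers (SGD with momentum, and Adam) yields faster convergence, saving at least 15% of steps on average.
Overshoot consistently outperforms both standard and Nesterov momentum across a wide range of tasks, and it integrates into popular momentum-based optimizers with zero memory overhead and small computational overhead.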

Abstract (original)

Overshoot is a novel, momentum-based stochastic gradient descent optimization method designed to enhance performance beyond standard and Nesterov’s momentum. In conventional momentum methods, gradients from previous steps are aggregated with the gradient at current model weights before taking a step and updating the model. Rather than calculating gradient at the current model weights, Overshoot calculates the gradient at model weights shifted in the direction of the current momentum. This sacrifices the immediate benefit of using the gradient w.r.t. the exact model weights now, in favor of evaluating at a point, which will likely be more relevant for future updates. We show that incorporating this principle into momentum-based optimizers (SGD with momentum and Adam) results in faster convergence (saving on average at least 15% of steps). Overshoot consistently outperforms both standard and Nesterov’s momentum across a wide range of tasks and integrates into popular momentum-based optimizers with zero memory and small computational overhead.
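The core idea in the abstract, evaluating the gradient at weights shifted along the current momentum direction, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sgd_momentum_lookahead`, the `overshoot` parameter, the exact update form, and the toy `quadratic_grad` objective are all assumptions made for illustration, and the gradient here is deterministic rather than stochastic. The Adam variant mentioned in the abstract is not covered.

```python
from typing import Callable, List

Vector = List[float]


def sgd_momentum_lookahead(
    grad_fn: Callable[[Vector], Vector],
    weights: Vector,
    lr: float = 0.1,
    beta: float = 0.9,
    overshoot: float = 0.0,  # hypothetical knob: how far ahead to look
    steps: int = 100,
) -> Vector:
    """SGD with momentum, with the gradient evaluated at shifted weights.

    Illustrative sketch only (assumed update form, not the paper's code).
    overshoot = 0.0 evaluates the gradient at the current weights,
    which is classical momentum.
    """
    w = list(weights)
    m = [0.0] * len(w)  # momentum buffer
    for _ in range(steps):
        # Shift the evaluation point along the current momentum direction.
        # A step moves the weights by -lr * m, so we look ahead that way.
        shifted = [wi - overshoot * lr * mi for wi, mi in zip(w, m)]
        g = grad_fn(shifted)
        # Aggregate the new gradient with past gradients, then step.
        m = [beta * mi + gi for mi, gi in zip(m, g)]
        w = [wi - lr * mi for wi, mi in zip(w, m)]
    return w


def quadratic_grad(w: Vector) -> Vector:
    """Gradient of the toy objective 0.5 * (1.0 * w0**2 + 4.0 * w1**2)."""
    curvatures = [1.0, 4.0]
    return [c * wi for c, wi in zip(curvatures, w)]


start = [1.0, 1.0]
plain = sgd_momentum_lookahead(quadratic_grad, start, overshoot=0.0, steps=300)
ahead = sgd_momentum_lookahead(quadratic_grad, start, overshoot=2.0, steps=300)
print("classical momentum:", plain)
print("with lookahead:    ", ahead)
```

In this particular sketch, setting `overshoot` equal to `beta` matches the common lookahead form of Nesterov momentum, and larger values evaluate the gradient further along the momentum direction. Whether a given value helps or destabilizes training depends on the learning rate and the problem, so the values above are only suitable for this toy quadratic.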

arXiv information

Authors	Jakub Kopal, Michal Gregor, Santiago de Leon-Martinez, Jakub Simko
Published	2025-01-16 14:18:10+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
