Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad

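Summary

Adaptive methods are very popular in machine learning because they make tuning the learning rate less costly.
This paper introduces a new optimization algorithm named KATE, a scale-invariant adaptation of the well-known AdaGrad algorithm.
The authors prove that KATE is scale-invariant in the case of Generalized Linear Models.
In addition, for general smooth non-convex problems, they establish a convergence rate of $O \left(\frac{\log T}{\sqrt{T}} \right)$ for KATE, which matches the best-known rates for AdaGrad and Adam.
They also compare KATE with the state-of-the-art adaptive algorithms Adam and AdaGrad in numerical experiments on a range of problems, including complex machine learning tasks such as image classification and text classification on real data.
The results indicate that KATE consistently outperforms AdaGrad and matches or surpasses the performance of Adam in all scenarios considered.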

Summary (original)

Adaptive methods are extremely popular in machine learning as they make learning rate tuning less expensive. This paper introduces a novel optimization algorithm named KATE, which presents a scale-invariant adaptation of the well-known AdaGrad algorithm. We prove the scale-invariance of KATE for the case of Generalized Linear Models. Moreover, for general smooth non-convex problems, we establish a convergence rate of $O \left(\frac{\log T}{\sqrt{T}} \right)$ for KATE, matching the best-known ones for AdaGrad and Adam. We also compare KATE to other state-of-the-art adaptive algorithms Adam and AdaGrad in numerical experiments with different problems, including complex machine learning tasks like image classification and text classification on real data. The results indicate that KATE consistently outperforms AdaGrad and matches/surpasses the performance of Adam in all considered scenarios.
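
To make the scale-invariance property concrete, here is a minimal Python sketch. It is not KATE: the abstract does not state KATE's update rule, so the sketch only contrasts a standard diagonal AdaGrad step with a simplified square-root-free step (the idea hinted at in the title) on a Generalized Linear Model, namely logistic regression. Several things are assumptions made for illustration: scale-invariance is taken to mean that multiplying each feature by a positive constant leaves the sequence of loss values unchanged, and the toy data, the step size, and the names `run`, `use_sqrt`, and `max_gap` are all hypothetical.

```python
import math

def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def loss_and_grad(w, X, y):
    # Mean logistic loss and its gradient, labels in {0, 1}, no bias term.
    n = len(X)
    loss, grad = 0.0, [0.0] * len(w)
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi))
        loss += (softplus(z) - yi * z) / n
        r = (sigmoid(z) - yi) / n
        for j, xj in enumerate(xi):
            grad[j] += r * xj
    return loss, grad

def run(X, y, use_sqrt, lr=0.5, steps=20):
    # use_sqrt=True : diagonal AdaGrad step, g / sqrt(sum of g^2).
    # use_sqrt=False: illustrative sqrt-free step, g / (sum of g^2).
    #                 This is NOT the KATE update, only a simplified stand-in.
    # No epsilon is added, so the first gradient must be nonzero in every
    # coordinate (true for the toy data below).
    w = [0.0] * len(X[0])
    b2 = [0.0] * len(w)
    losses = []
    for _ in range(steps):
        loss, g = loss_and_grad(w, X, y)
        losses.append(loss)
        for j, gj in enumerate(g):
            b2[j] += gj * gj
            denom = math.sqrt(b2[j]) if use_sqrt else b2[j]
            w[j] -= lr * gj / denom
    return losses

def max_gap(a, b):
    # Largest difference between two loss trajectories.
    return max(abs(p - q) for p, q in zip(a, b))

# Hypothetical toy data: six points, two features.
X = [[1.0, 2.0], [2.0, 0.5], [-1.0, -1.5],
     [-2.0, 1.0], [0.5, -1.0], [-0.5, 1.5]]
y = [1, 1, 0, 0, 0, 1]

# Rescale the second feature by 100, leaving the first untouched.
scale = [1.0, 100.0]
X_scaled = [[xj * sj for xj, sj in zip(xi, scale)] for xi in X]

gap_adagrad = max_gap(run(X, y, True), run(X_scaled, y, True))
gap_sqrt_free = max_gap(run(X, y, False), run(X_scaled, y, False))

print("AdaGrad loss gap under rescaling:  ", gap_adagrad)
print("sqrt-free loss gap under rescaling:", gap_sqrt_free)
```

Under these assumptions the square-root-free step produces the same loss trajectory on the original and the rescaled data, up to floating-point noise, because the rescaled gradient picks up one factor of the scale while the accumulated squared gradients pick up two. The AdaGrad trajectory, by contrast, changes noticeably after the rescaling.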

arXiv information

Authors	Sayantan Choudhury,Nazarii Tupitsa,Nicolas Loizou,Samuel Horvath,Martin Takac,Eduard Gorbunov
Published	2024-06-05 15:13:02+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
