Gauss-Newton Dynamics for Neural Networks: A Riemannian Optimization Perspective

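Summary

We analyze the convergence of Gauss-Newton dynamics for training neural networks with smooth activation functions.
In the underparameterized regime, the Gauss-Newton gradient flow induces a Riemannian gradient flow on a low-dimensional, smooth, embedded submanifold of the Euclidean output space.
Using tools from Riemannian optimization, we prove last-iterate convergence of the Riemannian gradient flow to the optimal in-class predictor at an exponential rate that is independent of the conditioning of the Gram matrix, without requiring explicit regularization.
We further characterize the critical impact of the neural network scaling factor and of the initialization on the convergence behavior.
In the overparameterized regime, we show that Levenberg-Marquardt dynamics with an appropriately chosen damping factor yield robustness to ill-conditioned kernels, analogous to the underparameterized regime.
These findings demonstrate the potential of Gauss-Newton methods for efficiently optimizing neural networks, particularly in ill-conditioned problems where the kernel and Gram matrices have small singular values.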

Abstract (original)

We analyze the convergence of Gauss-Newton dynamics for training neural networks with smooth activation functions. In the underparameterized regime, the Gauss-Newton gradient flow induces a Riemannian gradient flow on a low-dimensional, smooth, embedded submanifold of the Euclidean output space. Using tools from Riemannian optimization, we prove last-iterate convergence of the Riemannian gradient flow to the optimal in-class predictor at an exponential rate that is independent of the conditioning of the Gram matrix, without requiring explicit regularization. We further characterize the critical impacts of the neural network scaling factor and the initialization on the convergence behavior. In the overparameterized regime, we show that the Levenberg-Marquardt dynamics with an appropriately chosen damping factor yields robustness to ill-conditioned kernels, analogous to the underparameterized regime. These findings demonstrate the potential of Gauss-Newton methods for efficiently optimizing neural networks, particularly in ill-conditioned problems where kernel and Gram matrices have small singular values.
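
As a rough illustration of the dynamics the abstract describes, the sketch below discretizes a Gauss-Newton flow with explicit Euler steps on a toy underparameterized problem (20 samples, 2 trainable weights). Setting the damping to a positive value turns each step into a Levenberg-Marquardt step. Everything specific here is an assumption made for illustration and is not taken from the paper: the one-hidden-layer tanh network, the fixed output weights `c`, the 1/sqrt(m) scaling, the step size, the realizable target, and the function names `predict`, `jacobian`, `lm_step`, and `train`.

```python
import numpy as np


def predict(w, x, c):
    """Width-m network f(x; w) = (1/sqrt(m)) * sum_i c_i * tanh(w_i * x)."""
    m = w.size
    return np.tanh(np.outer(x, w)) @ c / np.sqrt(m)


def jacobian(w, x, c):
    """Jacobian of the n predictions with respect to the m weights, shape (n, m)."""
    m = w.size
    sech2 = 1.0 - np.tanh(np.outer(x, w)) ** 2  # derivative of tanh
    return sech2 * x[:, None] * c[None, :] / np.sqrt(m)


def lm_step(w, x, y, c, step=0.1, damping=0.0):
    """One explicit-Euler step of the (damped) Gauss-Newton flow.

    damping = 0 is plain Gauss-Newton (requires J to have full column rank);
    damping > 0 is a Levenberg-Marquardt step.
    """
    J = jacobian(w, x, c)
    residual = predict(w, x, c) - y
    gram = J.T @ J + damping * np.eye(w.size)
    return w - step * np.linalg.solve(gram, J.T @ residual)


def train(w0, x, y, c, steps=200, step=0.1, damping=0.0):
    """Run the discretized flow and record the squared loss at every iterate."""
    w = w0.copy()
    losses = [0.5 * np.sum((predict(w, x, c) - y) ** 2)]
    for _ in range(steps):
        w = lm_step(w, x, y, c, step=step, damping=damping)
        losses.append(0.5 * np.sum((predict(w, x, c) - y) ** 2))
    return w, np.array(losses)


if __name__ == "__main__":
    # Toy underparameterized setup: n = 20 samples, m = 2 trainable weights.
    x = np.linspace(-2.0, 2.0, 20)
    c = np.ones(2)                           # fixed output weights (assumption)
    y = predict(np.array([0.8, 1.6]), x, c)  # realizable target (assumption)
    w0 = np.array([0.5, 2.0])
    for damping in (0.0, 1e-2):
        _, losses = train(w0, x, y, c, damping=damping)
        print(f"damping={damping:g}: loss {losses[0]:.3e} -> {losses[-1]:.3e}")
```

To first order, a Gauss-Newton step moves the predictions along the projection of the residual onto the column space of the Jacobian, so the motion in output space does not depend on how well conditioned `J.T @ J` is. This is only meant to give intuition for the conditioning-independent rate stated in the abstract; the toy run is not evidence for it.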

arXiv information

Author	Semih Cayci
Published	2024-12-20 15:58:45+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
