SAD Neural Networks: Divergent Gradient Flows and Asymptotic Optimality via o-minimal Structures

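Abstract

We study gradient flows on the loss landscapes of fully connected feedforward neural networks with commonly used continuously differentiable activation functions such as the logistic, hyperbolic tangent, softplus, or GELU function.
We prove that the gradient flow either converges to a critical point or diverges to infinity, while the loss converges to an asymptotic critical value.
Moreover, we prove the existence of a threshold $\varepsilon > 0$ such that the loss value of any gradient flow initialized at most $\varepsilon$ above the optimal level converges to that optimal level.
For polynomial target functions and sufficiently large architectures and data sets, we prove that the optimal loss value is zero and can only be realized asymptotically.
From this setting, we deduce that any gradient flow with a sufficiently good initialization diverges to infinity.
Our proof relies heavily on the geometry of o-minimal structures.
We confirm these theoretical findings with numerical experiments and extend the investigation to real-world scenarios, where we observe analogous behavior.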

Abstract (original)

We study gradient flows for loss landscapes of fully connected feed forward neural networks with commonly used continuously differentiable activation functions such as the logistic, hyperbolic tangent, softplus or GELU function. We prove that the gradient flow either converges to a critical point or diverges to infinity while the loss converges to an asymptotic critical value. Moreover, we prove the existence of a threshold $\varepsilon>0$ such that the loss value of any gradient flow initialized at most $\varepsilon$ above the optimal level converges to it. For polynomial target functions and sufficiently big architecture and data set, we prove that the optimal loss value is zero and can only be realized asymptotically. From this setting, we deduce our main result that any gradient flow with sufficiently good initialization diverges to infinity. Our proof heavily relies on the geometry of o-minimal structures. We confirm these theoretical findings with numerical experiments and extend our investigation to real-world scenarios, where we observe an analogous behavior.
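
The "diverges to infinity while the loss converges" case can be illustrated with a deliberately tiny toy problem. This is not the paper's setting or its experiments. The sketch below assumes a single trainable parameter w, a logistic output sigmoid(w), and the constant target 1, so the loss (sigmoid(w) - 1)^2 has infimum zero that no finite w attains. The gradient flow is approximated by explicit Euler steps. The function names, step size, and step count are all hypothetical choices made for illustration.

```python
import math


def sigmoid(w):
    """Logistic activation."""
    return 1.0 / (1.0 + math.exp(-w))


def loss(w):
    """Squared error of the toy model sigmoid(w) against the target 1."""
    return (sigmoid(w) - 1.0) ** 2


def grad(w):
    """Derivative of loss(w) with respect to w (chain rule)."""
    s = sigmoid(w)
    return 2.0 * (s - 1.0) * s * (1.0 - s)


def euler_gradient_flow(w0=0.0, step=0.1, n_steps=100_000, record_every=20_000):
    """Explicit Euler discretization of the gradient flow w'(t) = -loss'(w(t)).

    Returns a list of (time, parameter, loss) snapshots.
    All default values are illustrative assumptions.
    """
    w = w0
    history = [(0.0, w, loss(w))]
    for k in range(1, n_steps + 1):
        w -= step * grad(w)
        if k % record_every == 0:
            history.append((k * step, w, loss(w)))
    return history


history = euler_gradient_flow()
for t, w, value in history:
    print(f"t={t:8.1f}  w={w:.4f}  loss={value:.3e}")
```

In this toy case the gradient is strictly negative for every w, so the parameter keeps growing and the loss keeps shrinking toward zero without reaching it. The growth of w is only logarithmic in time, which is why divergence of this kind can be easy to miss in short training runs.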

arXiv information

Authors	Julian Kranz, Davide Gallon, Steffen Dereich, Arnulf Jentzen
Published	2025-05-14 17:15:11+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
