Hidden Minima in Two-Layer ReLU Networks


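Two types of infinite families of spurious minima, each giving one minimum per $d$, were recently found.
The loss at minima of the first type converges to zero as $d$ increases.
For the second type, the loss remains bounded away from zero.
By existing analyses, the Hessian spectra of the two types agree modulo $O(d^{-1/2})$ terms, which makes the spectrum an unpromising way to tell them apart.
We prove that apparently far-removed group representation-theoretic considerations, concerning how subspaces invariant under the action of subgroups of $S_d$ are arranged relative to those fixed by the action, yield a precise description of all finitely many admissible types of tangency arcs.
Applied to the loss function, the general results reveal that arcs emanating from hidden minima differ characteristically in structure and symmetry, precisely on account of the $O(d^{-1/2})$ eigenvalue terms absent from previous work.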


The optimization problem associated with fitting two-layer ReLU networks with $d$ inputs, $k$ neurons, and labels generated by a target network is considered. Two types of infinite families of spurious minima, giving one minimum per $d$, were recently found. The loss at minima belonging to the first type converges to zero as $d$ increases. In the second type, the loss remains bounded away from zero. That being so, how may one avoid minima belonging to the latter type? Fortunately, such minima are never detected by standard optimization methods. Motivated by questions concerning the nature of this phenomenon, we develop methods to study distinctive analytic properties of hidden minima. By existing analyses, the Hessian spectra of the two types agree modulo $O(d^{-1/2})$ terms, which is not promising. Our investigation therefore proceeds instead by studying curves along which the loss is minimized or maximized, generally referred to as tangency arcs. We prove that apparently far-removed group representation-theoretic considerations, concerning the arrangement of subspaces invariant under the action of subgroups of $S_d$, the symmetric group on $d$ symbols, relative to those fixed by the action, yield a precise description of all finitely many admissible types of tangency arcs. The general results, applied to the loss function, reveal that arcs emanating from hidden minima differ characteristically in their structure and symmetry, precisely on account of the $O(d^{-1/2})$-eigenvalue terms absent in previous work, indicating in particular the subtlety of the analysis. The theoretical results, stated and proved for o-minimal structures, show that the set comprising all tangency arcs is topologically sufficiently tame to enable a numerical construction of tangency arcs, and so to compare how minima of both types are positioned relative to adjacent critical points.
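For readers who want something concrete to experiment with, the following is a minimal sketch of one common instance of this optimization problem. The abstract does not state the input distribution, the target weights, or the treatment of the second layer, so the sketch assumes standard Gaussian inputs, an identity target with $k = d$, second-layer weights fixed to one, and the squared loss. Under those assumptions the expected loss has a closed form through the identity $E[\mathrm{relu}(w \cdot x)\,\mathrm{relu}(v \cdot x)] = \frac{\|w\|\|v\|}{2\pi}(\sin\theta + (\pi - \theta)\cos\theta)$, where $\theta$ is the angle between $w$ and $v$. The function names are illustrative and not taken from the paper.

```python
import numpy as np


def relu_pair_expectation(A, B):
    """E[relu(a.x) * relu(b.x)] for x ~ N(0, I_d), for each row a of A and b of B."""
    norms = np.outer(np.linalg.norm(A, axis=1), np.linalg.norm(B, axis=1))
    # Cosine of the angle between rows; the floor avoids 0/0 for zero rows,
    # and the clip absorbs rounding slightly outside [-1, 1].
    cos = np.clip((A @ B.T) / np.maximum(norms, 1e-300), -1.0, 1.0)
    theta = np.arccos(cos)
    return norms * (np.sin(theta) + (np.pi - theta) * cos) / (2.0 * np.pi)


def population_loss(W, V):
    """0.5 * E[(sum_i relu(w_i.x) - sum_j relu(v_j.x))^2], assuming x ~ N(0, I_d)."""
    return 0.5 * (
        relu_pair_expectation(W, W).sum()
        - 2.0 * relu_pair_expectation(W, V).sum()
        + relu_pair_expectation(V, V).sum()
    )


d = 6
V = np.eye(d)  # assumed target weights (identity, k = d)
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))  # arbitrary student weights
rows, cols = rng.permutation(d), rng.permutation(d)

print(population_loss(V, V))                 # zero at the target
print(population_loss(W, V))                 # positive for a generic W
print(population_loss(W[rows][:, cols], V))  # matches the previous line up to rounding
```

With an identity target, permuting the rows and columns of `W` leaves this loss unchanged, which is the kind of $S_d$ symmetry the representation-theoretic arguments build on. The sketch only evaluates the loss; it does not construct spurious minima or tangency arcs.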


Author: Yossi Arjevani
Published: 2024-02-19 17:33:41+00:00

Categories: cs.LG, math.OC, stat.ML