Learning Gaussian Multi-Index Models with Gradient Flow: Time Complexity and Directional Convergence

Summary

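This work studies the gradient flow dynamics of a neural network trained with the correlation loss to approximate a multi-index function of high-dimensional standard Gaussian data.
The target is a sum of neurons, $f^*(x) = \sum_{j=1}^k \sigma^*(v_j^T x)$, where $v_1, \dots, v_k$ are unit vectors and the Hermite expansion of $\sigma^*$ contains no first- or second-order Hermite polynomial.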
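
As a concrete illustration of this target class, the sketch below takes $\sigma^*$ to be the third probabilists' Hermite polynomial $He_3(z) = z^3 - 3z$. That choice is an assumption made for the example, not one stated in the source. The code checks numerically that the Hermite coefficients of orders 0, 1, and 2 vanish, then evaluates $f^*$ on Gaussian samples. The helper names `sigma_star`, `hermite_coeff`, and `target`, and the sizes `d` and `k`, are hypothetical.

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as H

def sigma_star(z):
    """Assumed example activation: He_3(z) = z^3 - 3z."""
    return z**3 - 3.0 * z

def hermite_coeff(fn, degree, quad_points=40):
    """Coefficient of He_degree in the expansion of fn under N(0, 1)."""
    # Gauss-Hermite nodes/weights for the weight function exp(-z^2 / 2).
    nodes, weights = H.hermegauss(quad_points)
    weights = weights / math.sqrt(2.0 * math.pi)  # normalise to a probability measure
    basis = np.zeros(degree + 1)
    basis[degree] = 1.0
    he = H.hermeval(nodes, basis)  # He_degree evaluated at the nodes
    # E[He_p(z)^2] = p!, so divide by p! to obtain the expansion coefficient.
    return float(np.sum(weights * fn(nodes) * he) / math.factorial(degree))

def target(x, V):
    """f*(x) = sum_j sigma*(v_j^T x); the rows of V are unit index vectors."""
    return sigma_star(x @ V.T).sum(axis=1)

rng = np.random.default_rng(0)
d, k = 20, 3  # illustrative sizes (assumed)
# Orthonormal index vectors from a QR factorisation (an assumed construction).
V = np.linalg.qr(rng.standard_normal((d, k)))[0].T
x = rng.standard_normal((5, d))
coeffs = [hermite_coeff(sigma_star, p) for p in range(4)]
values = target(x, V)
print(np.round(coeffs, 6), values.shape)
```

Any activation whose expansion starts at order three or higher would fit the stated condition equally well; $He_3$ is simply the lowest-order example.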
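In the single-index case ($k = 1$), escaping the search phase is known to require polynomial time.
The paper first generalizes this result to multi-index functions whose index vectors point in arbitrary directions.
After the search phase, it is not clear a priori whether the neurons converge to the index vectors or get stuck at a sub-optimal solution.
When the index vectors are orthogonal, the paper fully characterizes the fixed points and proves that neurons converge to the nearest index vectors.
Consequently, $n \asymp k \log k$ neurons suffice for gradient flow to recover the full set of index vectors with high probability over the random initialization.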
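
One way to read the $k \log k$ count is as a coupon-collector effect: if every neuron ends at its nearest index vector, all $k$ vectors are found exactly when each one is nearest to at least one neuron at initialization. The sketch below estimates that coverage probability by Monte Carlo. It assumes orthogonal index vectors $e_1, \dots, e_k$, Gaussian (hence rotation-invariant) initial directions, and that "nearest" means largest inner product. The function name `coverage_probability` and the constant 3 in front of $k \log k$ are hypothetical choices for illustration.

```python
import math
import numpy as np

def coverage_probability(n, k, d, trials=2000, seed=0):
    """Estimate P(every index vector is nearest to at least one of n neurons)."""
    rng = np.random.default_rng(seed)
    covered = 0
    for _ in range(trials):
        # Gaussian rows have uniformly distributed directions on the sphere;
        # normalising would not change the argmax below, so it is skipped.
        W = rng.standard_normal((n, d))
        # Assumption: index vectors are e_1..e_k, so the inner products are
        # just the first k coordinates of each neuron.
        nearest = np.argmax(W[:, :k], axis=1)
        covered += len(np.unique(nearest)) == k
    return covered / trials

k, d = 10, 50  # illustrative sizes (assumed)
n_large = math.ceil(3 * k * math.log(k))  # constant 3 is an arbitrary choice
p_small = coverage_probability(k, k, d)        # n = k neurons
p_large = coverage_probability(n_large, k, d)  # n of order k log k
print(n_large, p_small, p_large)
```

With only $n = k$ neurons, full coverage is rare, while a modest multiple of $k \log k$ makes it very likely. This says nothing about the dynamics themselves; it only illustrates why the neuron count scales as it does once convergence to the nearest index vector is granted.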
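When $v_i^T v_j = \beta \geq 0$ for all $i \neq j$, the paper proves the existence of a sharp threshold $\beta_c = c/(c+k)$ at which the fixed point that computes the average of the index vectors changes from a saddle point to a minimum.
Numerical simulations indicate that the correlation loss with mild overparameterization suffices to learn all index vectors when they are nearly orthogonal, but that the correlation loss fails once the dot product between index vectors exceeds a certain threshold.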

Abstract (original)

This work focuses on the gradient flow dynamics of a neural network model that uses correlation loss to approximate a multi-index function on high-dimensional standard Gaussian data. Specifically, the multi-index function we consider is a sum of neurons $f^*(x) \!=\! \sum_{j=1}^k \! \sigma^*(v_j^T x)$ where $v_1, \dots, v_k$ are unit vectors, and $\sigma^*$ lacks the first and second Hermite polynomials in its Hermite expansion. It is known that, for the single-index case ($k\!=\!1$), overcoming the search phase requires polynomial time complexity. We first generalize this result to multi-index functions characterized by vectors in arbitrary directions. After the search phase, it is not clear whether the network neurons converge to the index vectors, or get stuck at a sub-optimal solution. When the index vectors are orthogonal, we give a complete characterization of the fixed points and prove that neurons converge to the nearest index vectors. Therefore, using $n \! \asymp \! k \log k$ neurons ensures finding the full set of index vectors with gradient flow with high probability over random initialization. When $ v_i^T v_j \!=\! \beta \! \geq \! 0$ for all $i \neq j$, we prove the existence of a sharp threshold $\beta_c \!=\! c/(c+k)$ at which the fixed point that computes the average of the index vectors transitions from a saddle point to a minimum. Numerical simulations show that using a correlation loss and a mild overparameterization suffices to learn all of the index vectors when they are nearly orthogonal, however, the correlation loss fails when the dot product between the index vectors exceeds a certain threshold.
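
The directional-convergence claim for orthogonal index vectors can be illustrated with a small numerical sketch. It relies on the standard Hermite identity $\mathbb{E}[He_3(u^T x)\, He_3(w^T x)] = 3!\,(u^T w)^3$ for unit vectors $u, w$ and standard Gaussian $x$. Assuming both the target and the student neuron use $He_3$, and assuming the correlation loss of a single unit-norm neuron is $-\mathbb{E}[f^*(x)\,\sigma(w^T x)]$, the population loss becomes $L(w) = -6 \sum_j (v_j^T w)^3$. The code runs an Euler discretization of gradient flow on the sphere, which is a stand-in for the continuous-time flow analysed in the paper. To keep the example simple, the initial neuron is restricted to have positive correlation with every index vector; the step size, dimensions, and function names are likewise hypothetical.

```python
import numpy as np

def loss(w, V):
    """Assumed population correlation loss for He_3 teacher and student."""
    return -6.0 * np.sum((V @ w) ** 3)

def spherical_gradient_step(w, V, lr):
    """One Euler step of gradient flow constrained to the unit sphere."""
    corr = V @ w                      # correlations v_j^T w
    grad = -18.0 * (corr ** 2) @ V    # Euclidean gradient of the loss
    grad = grad - (grad @ w) * w      # project onto the tangent space at w
    w = w - lr * grad
    return w / np.linalg.norm(w)      # retract back onto the sphere

rng = np.random.default_rng(1)
d, k = 30, 5                          # illustrative sizes (assumed)
V = np.eye(d)[:k]                     # orthogonal index vectors e_1..e_k
w = rng.standard_normal(d)
w[:k] = np.abs(w[:k])                 # assumption: positive initial correlations
w /= np.linalg.norm(w)
w0 = w.copy()
nearest = int(np.argmax(V @ w0))      # index vector with the largest overlap
for _ in range(3000):
    w = spherical_gradient_step(w, V, lr=0.01)
final_overlap = float(V[nearest] @ w)
print(nearest, final_overlap, loss(w0, V), loss(w, V))
```

In this restricted setting the ordering of the correlations is preserved at every step, so the neuron aligns with whichever index vector it started closest to. The non-orthogonal case with $\beta$ above the threshold is not covered by this sketch.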

arXiv information

Authors	Berfin Simsek, Amire Bendjeddou, Daniel Hsu
Published	2024-11-13 17:25:25+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
