Convergence of mean-field Langevin dynamics: Time and space discretization, stochastic gradient, and variance reduction

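Abstract

Mean-field Langevin dynamics (MFLD) is a nonlinear generalization of Langevin dynamics that incorporates a distribution-dependent drift, and it arises naturally from the optimization of two-layer neural networks by (noisy) gradient descent.
Recent work has shown that MFLD globally minimizes an entropy-regularized convex functional over the space of measures.
However, all previous analyses assume the infinite-particle or continuous-time limit and cannot handle stochastic gradient updates.
We provide a general framework for proving uniform-in-time propagation of chaos for MFLD that accounts for the errors due to finite-particle approximation, time discretization, and stochastic gradient approximation.
To demonstrate the wide applicability of this framework, we establish quantitative convergence rate guarantees to the regularized global optimal solution under (i) a wide range of learning problems, such as neural networks in the mean-field regime and MMD minimization, and (ii) different gradient estimators, including SGD and SVRG.
Despite the generality of the results, specializing them to standard Langevin dynamics yields improved convergence rates in both the SGD and SVRG settings.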

Abstract (original)

The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift, and it naturally arises from the optimization of two-layer neural networks via (noisy) gradient descent. Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures. However, all prior analyses assumed the infinite-particle or continuous-time limit, and cannot handle stochastic gradient updates. We provide a general framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and stochastic gradient approximation. To demonstrate the wide applicability of this framework, we establish quantitative convergence rate guarantees to the regularized global optimal solution under (i) a wide range of learning problems such as neural networks in the mean-field regime and MMD minimization, and (ii) different gradient estimators including SGD and SVRG. Despite the generality of our results, we achieve an improved convergence rate in both the SGD and SVRG settings when specialized to the standard Langevin dynamics.
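
To make the three error sources named in the abstract concrete (finitely many particles, discrete time steps, mini-batch gradients), here is a minimal sketch of noisy gradient descent on the neurons of a two-layer network. It is an illustration under stated assumptions, not the paper's algorithm or code: the function names `mfld_sgd` and `predict`, the tanh activation, the squared loss, and every hyperparameter value are hypothetical choices made for this example. It uses a plain mini-batch (SGD) gradient estimator; SVRG is not implemented.

```python
import numpy as np


def mfld_sgd(X, y, n_particles=200, steps=500, eta=0.05,
             lam=1e-3, lam1=1e-2, batch=32, seed=0):
    """Finite-particle, discrete-time, mini-batch sketch of MFLD.

    Each row of W is one particle (one neuron). The network output is the
    average of tanh(w . x) over particles (mean-field scaling).
    All names and default values are assumptions for illustration.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((n_particles, d))
    for _ in range(steps):
        # Stochastic gradient: sample a mini-batch without replacement.
        idx = rng.choice(n, size=batch, replace=False)
        Xb, yb = X[idx], y[idx]
        H = np.tanh(Xb @ W.T)            # shape (batch, n_particles)
        resid = H.mean(axis=1) - yb      # mean-field prediction minus target
        # Gradient in w of the first variation of the squared loss:
        # batch average of resid * (1 - tanh^2(w . x)) * x, per particle.
        grad = ((resid[:, None] * (1.0 - H ** 2)).T @ Xb) / batch
        grad += lam1 * W                 # gradient of (lam1 / 2) * ||w||^2
        # Langevin step: drift plus Gaussian noise of scale sqrt(2 * lam * eta),
        # where lam plays the role of the entropy regularization strength.
        noise = rng.standard_normal(W.shape)
        W = W - eta * grad + np.sqrt(2.0 * lam * eta) * noise
    return W


def predict(W, X):
    """Mean-field network output: average over particles of tanh(w . x)."""
    return np.tanh(X @ W.T).mean(axis=1)
```

In this sketch the drift of each particle depends on all the others only through the averaged prediction, which is the distribution-dependent drift that distinguishes MFLD from standard Langevin dynamics. Calling `mfld_sgd(X, y)` on an array `X` of shape `(n, d)` and targets `y` of shape `(n,)` returns the final particle positions, which `predict` turns into network outputs.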

arXiv information

Authors	Taiji Suzuki, Denny Wu, Atsushi Nitanda
Published	2023-06-12 16:28:11+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
