The Dimension Strikes Back with Gradients: Generalization of Gradient Methods in Stochastic Convex Optimization

Abstract (original)

We study the generalization performance of gradient methods in the fundamental stochastic convex optimization setting, focusing on its dimension dependence. First, for full-batch gradient descent (GD) we give a construction of a learning problem in dimension $d=O(n^2)$, where the canonical version of GD (tuned for optimal performance of the empirical risk) trained with $n$ training examples converges, with constant probability, to an approximate empirical risk minimizer with $\Omega(1)$ population excess risk. Our bound translates to a lower bound of $\Omega (\sqrt{d})$ on the number of training examples required for standard GD to reach a non-trivial test error, answering an open question raised by Feldman (2016) and Amir, Koren, and Livni (2021b) and showing that a non-trivial dimension dependence is unavoidable. Furthermore, for standard one-pass stochastic gradient descent (SGD), we show that an application of the same construction technique provides a similar $\Omega(\sqrt{d})$ lower bound for the sample complexity of SGD to reach a non-trivial empirical error, despite achieving optimal test performance. This again provides an exponential improvement in the dimension dependence compared to previous work (Koren, Livni, Mansour, and Sherman, 2022), resolving an open question left therein.
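The abstract contrasts two algorithms. Full-batch GD computes the gradient of the empirical risk over all n examples at every step. One-pass SGD uses each example for exactly one step. Below is a minimal Python sketch of both. It assumes a common textbook setup: projection onto the unit ball, iterate averaging, and step sizes 1/√T for GD and 1/√n for SGD. It also uses a hypothetical convex, 1-Lipschitz loss |⟨w, x⟩ − y| on a hypothetical toy distribution (`W_STAR`, `draw`, `population_risk` are illustrative names, not from the paper). This is not the paper's lower-bound construction. The toy problem is benign, so both methods do well on it. The paper's point is that in dimension d = O(n²) one can build a problem on which GD still reaches an approximate empirical risk minimizer, yet its output has Ω(1) excess population risk.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Sample = Tuple[Vector, float]  # hypothetical (x, y) data format


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def project_unit_ball(w: Vector) -> Vector:
    """Euclidean projection onto the unit ball {w : ||w|| <= 1}."""
    norm = math.sqrt(dot(w, w))
    return w if norm <= 1.0 else [wi / norm for wi in w]


def loss(w: Vector, z: Sample) -> float:
    """Hypothetical convex loss f(w; (x, y)) = |<w, x> - y|, Lipschitz with constant ||x||."""
    x, y = z
    return abs(dot(w, x) - y)


def subgradient(w: Vector, z: Sample) -> Vector:
    """A subgradient of the loss above: sign(<w, x> - y) * x, with sign(0) = 0."""
    x, y = z
    r = dot(w, x) - y
    s = (r > 0) - (r < 0)
    return [s * xi for xi in x]


def empirical_risk(w: Vector, samples: Sequence[Sample]) -> float:
    return sum(loss(w, z) for z in samples) / len(samples)


def full_batch_gd(samples: Sequence[Sample], d: int, T: int) -> Vector:
    """Projected full-batch GD, assumed step size eta = 1/sqrt(T); returns the average iterate."""
    eta = 1.0 / math.sqrt(T)
    w, avg = [0.0] * d, [0.0] * d
    for _ in range(T):
        avg = [a + wi / T for a, wi in zip(avg, w)]  # averages w_0, ..., w_{T-1}
        g = [0.0] * d
        for z in samples:  # every step uses all n examples
            g = [gi + si / len(samples) for gi, si in zip(g, subgradient(w, z))]
        w = project_unit_ball([wi - eta * gi for wi, gi in zip(w, g)])
    return avg


def one_pass_sgd(samples: Sequence[Sample], d: int) -> Vector:
    """Projected one-pass SGD, assumed step size eta = 1/sqrt(n); returns the average iterate."""
    n = len(samples)
    eta = 1.0 / math.sqrt(n)
    w, avg = [0.0] * d, [0.0] * d
    for z in samples:  # each example is used for exactly one step
        avg = [a + wi / n for a, wi in zip(avg, w)]
        w = project_unit_ball([wi - eta * gi for wi, gi in zip(w, subgradient(w, z))])
    return avg


# Hypothetical toy distribution: z = (e_j, W_STAR[j]) with j uniform in {0, ..., D-1}.
W_STAR = [0.4, -0.3, 0.2, 0.5, -0.1]
D = len(W_STAR)


def draw(rng: random.Random, n: int) -> List[Sample]:
    out = []
    for _ in range(n):
        j = rng.randrange(D)
        out.append(([1.0 if i == j else 0.0 for i in range(D)], W_STAR[j]))
    return out


def population_risk(w: Vector) -> float:
    """Exact population risk for the toy distribution: mean over j of |w_j - W_STAR[j]|."""
    return sum(abs(wi - si) for wi, si in zip(w, W_STAR)) / D


if __name__ == "__main__":
    S = draw(random.Random(0), 400)
    w_gd, w_sgd = full_batch_gd(S, D, T=400), one_pass_sgd(S, D)
    print("GD : empirical", empirical_risk(w_gd, S), "population", population_risk(w_gd))
    print("SGD: empirical", empirical_risk(w_sgd, S), "population", population_risk(w_sgd))
```

With these choices, the standard projected-subgradient analysis bounds GD's empirical risk on this toy problem by (‖w*‖² + 1)/(2√T), about 0.039 for T = 400. The abstract's result says that a guarantee of this kind on the empirical risk does not, in general, carry over to the population risk without roughly √d training examples.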

arXiv information

Authors	Matan Schliserman, Uri Sherman, Tomer Koren
Published	2024-01-22 15:50:32+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
