Negative Stepsizes Make Gradient-Descent-Ascent Converge

Abstract

Efficient computation of min-max problems is a central question in optimization, learning, games, and controls. Arguably the most natural algorithm is gradient-descent-ascent (GDA). However, since the 1970s, conventional wisdom has argued that GDA fails to converge even on simple problems. This failure spurred an extensive literature on modifying GDA with additional building blocks such as extragradients, optimism, momentum, anchoring, etc. In contrast, we show that GDA converges in its original form by simply using a judicious choice of stepsizes. The key innovation is the proposal of unconventional stepsize schedules (dubbed slingshot stepsize schedules) that are time-varying, asymmetric, and periodically negative. We show that all three properties are necessary for convergence, and that altogether this enables GDA to converge on the classical counterexamples (e.g., unconstrained convex-concave problems). All of our results apply to the last iterate of GDA, as is typically desired in practice. The core algorithmic intuition is that although negative stepsizes make backward progress, they de-synchronize the min and max variables (overcoming the cycling issue of GDA), and lead to a slingshot phenomenon in which the forward progress in the other iterations is overwhelmingly larger. This results in fast overall convergence. Geometrically, the slingshot dynamics leverage the non-reversibility of gradient flow: positive/negative steps cancel to first order, yielding a second-order net movement in a new direction that leads to convergence and is otherwise impossible for GDA to move in. We interpret this as a second-order finite-differencing algorithm and show that, intriguingly, it approximately implements consensus optimization, an empirically popular algorithm for min-max problems involving deep neural networks (e.g., training GANs).
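The first-order cancellation described in the abstract can be seen on the classical bilinear counterexample f(x, y) = x * y. The sketch below is only an illustration under stated assumptions: the two-step schedule `toy_schedule` is a hypothetical choice made for this example and is not the slingshot schedule from the paper. With stepsizes (a, -c) followed by (-a, c), the two update matrices multiply to (1 - a*c) times the identity, so the first-order terms cancel and the iterate shrinks by a factor of 1 - a*c every two steps, whereas a constant positive stepsize spirals outward.

```python
def gda_step(x, y, alpha, beta):
    """One simultaneous GDA step on f(x, y) = x * y.

    x takes a descent step with stepsize alpha, y an ascent step with
    stepsize beta. Either stepsize may be negative.
    """
    grad_x, grad_y = y, x  # partial derivatives of x * y
    return x - alpha * grad_x, y + beta * grad_y


def run_gda(schedule, steps, x0=1.0, y0=1.0):
    """Run GDA, cycling through a list of (alpha, beta) pairs.

    Returns the distance of the last iterate from the saddle point (0, 0).
    """
    x, y = x0, y0
    for t in range(steps):
        alpha, beta = schedule[t % len(schedule)]
        x, y = gda_step(x, y, alpha, beta)
    return (x * x + y * y) ** 0.5


# Standard GDA: constant, symmetric, positive stepsizes (diverges here).
constant_schedule = [(0.5, 0.5)]

# Hypothetical toy schedule (an assumption for illustration, NOT the
# paper's schedule): time-varying, asymmetric, and periodically negative.
toy_schedule = [(0.5, -0.5), (-0.5, 0.5)]

dist_constant = run_gda(constant_schedule, steps=40)
dist_toy = run_gda(toy_schedule, steps=40)
print(f"constant stepsizes: distance to saddle = {dist_constant:.3e}")
print(f"toy +/- schedule:   distance to saddle = {dist_toy:.3e}")
```

This toy case only shows that sign-alternating, asymmetric stepsizes can turn the rotation of GDA into a contraction on one bilinear problem; it makes no claim about rates or about the general convex-concave setting treated in the paper.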

arXiv information

Authors	Henry Shugart, Jason M. Altschuler
Published	2025-05-02 17:59:46+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
