Last-Iterate Convergence Properties of Regret-Matching Algorithms in Games

Summary

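Algorithms based on regret matching, specifically regret matching$^+$ (RM$^+$) and its variants, are the most popular approaches for solving large-scale two-player zero-sum games in practice.
Unlike algorithms such as optimistic gradient descent ascent, which have strong last-iterate and ergodic convergence properties in zero-sum games, virtually nothing is known about the last-iterate properties of regret-matching algorithms.
Because last-iterate convergence matters both for numerical optimization and for modeling real-world learning in games, this paper studies the last-iterate convergence properties of several popular variants of RM$^+$.
First, the authors show numerically that several practical variants, including simultaneous RM$^+$, alternating RM$^+$, and simultaneous predictive RM$^+$, all lack last-iterate convergence guarantees even on a simple $3\times 3$ game.
Next, they prove that recent variants of these algorithms based on a smoothing technique do enjoy last-iterate convergence: extragradient RM$^{+}$ and smooth predictive RM$^+$ enjoy asymptotic last-iterate convergence (without a rate) and $1/\sqrt{t}$ best-iterate convergence.
Finally, they introduce restarted variants of these algorithms and show that they enjoy linear-rate last-iterate convergence.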

Summary (original)

Algorithms based on regret matching, specifically regret matching$^+$ (RM$^+$) and its variants, are the most popular approaches for solving large-scale two-player zero-sum games in practice. Unlike algorithms such as optimistic gradient descent ascent, which have strong last-iterate and ergodic convergence properties for zero-sum games, virtually nothing is known about the last-iterate properties of regret-matching algorithms. Given the importance of last-iterate convergence for numerical optimization reasons and its relevance to modeling real-world learning in games, in this paper, we study the last-iterate convergence properties of various popular variants of RM$^+$. First, we show numerically that several practical variants such as simultaneous RM$^+$, alternating RM$^+$, and simultaneous predictive RM$^+$ all lack last-iterate convergence guarantees even on a simple $3\times 3$ game. We then prove that recent variants of these algorithms based on a smoothing technique do enjoy last-iterate convergence: we prove that extragradient RM$^{+}$ and smooth predictive RM$^+$ enjoy asymptotic last-iterate convergence (without a rate) and $1/\sqrt{t}$ best-iterate convergence. Finally, we introduce restarted variants of these algorithms, and show that they enjoy linear-rate last-iterate convergence.
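
To make the distinction between the last iterate and the averaged (ergodic) iterate concrete, here is a minimal Python sketch of simultaneous RM$^+$ on a $3\times 3$ zero-sum matrix game. It is only an illustration under stated assumptions, not the authors' implementation: the payoff matrix `A`, the function names, and the iteration count are all chosen for this example, and the row player is assumed to minimize $x^\top A y$ while the column player maximizes it.

```python
# Illustrative 3x3 zero-sum game (an assumption of this sketch): the row
# player picks x to minimize x^T A y, the column player picks y to maximize it.
A = [[3.0, 0.0, -3.0],
     [0.0, 3.0, -4.0],
     [0.0, 0.0, 1.0]]
N = 3


def normalize(regrets):
    """Map nonnegative accumulated regrets to a strategy (uniform if all zero)."""
    total = sum(regrets)
    if total <= 0.0:
        return [1.0 / len(regrets)] * len(regrets)
    return [r / total for r in regrets]


def duality_gap(x, y):
    """max_y' x^T A y' - min_x' x'^T A y; zero exactly at a Nash equilibrium."""
    row_losses = [sum(A[i][j] * y[j] for j in range(N)) for i in range(N)]   # A y
    col_payoffs = [sum(A[i][j] * x[i] for i in range(N)) for j in range(N)]  # A^T x
    return max(col_payoffs) - min(row_losses)


def simultaneous_rm_plus(num_iters):
    """Run simultaneous RM+ and return (last x, last y, average x, average y)."""
    rx, ry = [0.0] * N, [0.0] * N          # thresholded regrets of each player
    sum_x, sum_y = [0.0] * N, [0.0] * N    # running sums for uniform averaging
    x, y = normalize(rx), normalize(ry)
    for _ in range(num_iters):
        # Both players read off their strategies before either one updates.
        x, y = normalize(rx), normalize(ry)
        sum_x = [s + xi for s, xi in zip(sum_x, x)]
        sum_y = [s + yj for s, yj in zip(sum_y, y)]
        loss_x = [sum(A[i][j] * y[j] for j in range(N)) for i in range(N)]  # A y
        util_y = [sum(A[i][j] * x[i] for i in range(N)) for j in range(N)]  # A^T x
        value = sum(x[i] * loss_x[i] for i in range(N))                     # x^T A y
        # RM+ update: add the instantaneous regret, then clip at zero.
        rx = [max(0.0, r + (value - l)) for r, l in zip(rx, loss_x)]
        ry = [max(0.0, r + (u - value)) for r, u in zip(ry, util_y)]
    x_avg = [s / num_iters for s in sum_x]
    y_avg = [s / num_iters for s in sum_y]
    return x, y, x_avg, y_avg


if __name__ == "__main__":
    x, y, x_avg, y_avg = simultaneous_rm_plus(10000)
    print("last-iterate duality gap:   ", duality_gap(x, y))
    print("average-iterate duality gap:", duality_gap(x_avg, y_avg))
```

The standard regret bound for RM$^+$ guarantees that the duality gap of the averaged strategies shrinks at a $1/\sqrt{t}$ rate; it says nothing about the last pair of strategies, which is the quantity the paper investigates. Comparing the two printed values over different iteration counts is one way to observe that difference informally.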

arXiv information

Authors	Yang Cai, Gabriele Farina, Julien Grand-Clément, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Weiqiang Zheng
Published	2023-11-01 17:34:58+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
