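Unlike existing FTPL algorithms that rely on independent additive disturbances governed by a \textit{known} distribution, we allow for disturbances governed by an \textit{ambiguous} distribution that is only known to belong to a given set and propose a principle of optimism in the face of ambiguity.
This algorithm is up to $10^4$ times faster than standard FTRL algorithms that solve an optimization problem in every iteration.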
Follow-The-Regularized-Leader (FTRL) algorithms often enjoy optimal regret for adversarial as well as stochastic bandit problems and allow for a streamlined analysis. Nonetheless, FTRL algorithms require the solution of an optimization problem in every iteration and are thus computationally challenging. In contrast, Follow-The-Perturbed-Leader (FTPL) algorithms achieve computational efficiency by perturbing the estimates of the rewards of the arms, but their regret analysis is cumbersome. We propose a new FTPL algorithm that generates optimal policies for both adversarial and stochastic multi-armed bandits. Like FTRL, our algorithm admits a unified regret analysis, and similar to FTPL, it offers low computational costs. Unlike existing FTPL algorithms that rely on independent additive disturbances governed by a \textit{known} distribution, we allow for disturbances governed by an \textit{ambiguous} distribution that is only known to belong to a given set and propose a principle of optimism in the face of ambiguity. Consequently, our framework generalizes existing FTPL algorithms. It also encapsulates a broad range of FTRL methods as special cases, including several optimal ones, which appears to be impossible with current FTPL methods. Finally, we use techniques from discrete choice theory to devise an efficient bisection algorithm for computing the optimistic arm sampling probabilities. This algorithm is up to $10^4$ times faster than standard FTRL algorithms that solve an optimization problem in every iteration. Our results not only settle existing conjectures but also provide new insights into the impact of perturbations by mapping FTRL to FTPL.
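To make the FTRL/FTPL contrast concrete, here is a minimal Python sketch of two textbook building blocks. It is not the algorithm proposed in the paper. The function names (`ftpl_gumbel_draw`, `tsallis_probs`) and parameters (`cum_loss`, `eta`) are hypothetical choices made for this illustration. The first function is an FTPL step with i.i.d. Gumbel perturbations, whose induced choice probabilities are the softmax of the scaled negative losses (the multinomial logit model of discrete choice theory). The second computes the sampling probabilities of FTRL with the 1/2-Tsallis entropy regularizer by bisection on a scalar normalizer. Bisection is used here for simplicity and is only assumed to be similar in spirit to the paper's bisection routine.

```python
import math
import random


def ftpl_gumbel_draw(cum_loss, eta, rng=random):
    """One FTPL step: add i.i.d. Gumbel noise to -eta * loss, play the leader.

    The probability that arm i wins equals softmax(-eta * cum_loss)[i].
    """
    def gumbel():
        # If E ~ Exp(1), then -log(E) is standard Gumbel; guard against log(0).
        e = max(rng.expovariate(1.0), 1e-300)
        return -math.log(e)

    scores = [-eta * loss + gumbel() for loss in cum_loss]
    return max(range(len(scores)), key=scores.__getitem__)


def tsallis_probs(cum_loss, eta, iters=100):
    """FTRL with 1/2-Tsallis entropy: p_i = 4 / (eta * (loss_i - x))**2.

    The scalar x < min(cum_loss) is chosen so that the p_i sum to one.
    The total mass is increasing in x, so x can be found by bisection.
    """
    k = len(cum_loss)
    best = min(cum_loss)

    def total(x):
        return sum(4.0 / (eta * (loss - x)) ** 2 for loss in cum_loss)

    # At hi the best arm alone already has mass 1, so total(hi) >= 1.
    # At lo every arm has mass at most 1/k, so total(lo) <= 1.
    hi = best - 2.0 / eta
    lo = best - 2.0 * math.sqrt(k) / eta
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            hi = mid
        else:
            lo = mid
    x = 0.5 * (lo + hi)
    probs = [4.0 / (eta * (loss - x)) ** 2 for loss in cum_loss]
    s = sum(probs)  # renormalize to absorb the remaining bisection error
    return [p / s for p in probs]


if __name__ == "__main__":
    losses = [0.0, 1.0, 2.0]
    print(tsallis_probs(losses, eta=0.5))
    print(ftpl_gumbel_draw(losses, eta=0.5, rng=random.Random(0)))
```

The FTPL step needs only one noise draw per arm and an argmax, whereas the FTRL step solves a one-dimensional root-finding problem in each round. In this special case the FTRL problem reduces to a scalar search; generic FTRL regularizers may require a full convex solver.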
Authors | Mengmeng Li, Daniel Kuhn, Bahar Taşkesen |
Published | 2025-02-13 16:35:17+00:00 |