Learning to Explore with Lagrangians for Bandits under Unknown Linear Constraints


Pure exploration in bandits models many real-world problems, such as tuning hyper-parameters or conducting user studies, where different safety, resource, and fairness constraints on the decision space naturally appear. We study these problems as pure exploration in multi-armed bandits with unknown linear constraints, where the aim is to identify an $r$-good feasible policy. First, we propose a Lagrangian relaxation of the sample complexity lower bound for pure exploration under constraints. We show how this lower bound evolves with the sequential estimation of constraints. Second, we leverage the Lagrangian lower bound and the properties of convex optimisation to propose two computationally efficient extensions of Track-and-Stop and Gamified Explorer, namely LATS and LAGEX. To this end, we propose a constraint-adaptive stopping rule and, while tracking the lower bound, use a pessimistic estimate of the feasible set at each step. We show that these algorithms achieve asymptotically optimal sample complexity upper bounds up to constraint-dependent constants. Finally, we conduct numerical experiments with different reward distributions and constraints that validate the efficient performance of LAGEX and LATS relative to baselines.
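
The sketch below illustrates two ingredients the abstract names, in a deliberately simpler setting; it is not LATS, LAGEX, or the paper's Lagrangian lower bound. It assumes constraints of the form $A\pi \le c$ over policies $\pi$ in the probability simplex, builds a pessimistic feasible set by inflating estimated costs by a uniform confidence width (a hypothetical rule), and approximates the best feasible value $\max_{\pi} \mu^\top \pi$ by projected subgradient descent on the Lagrangian dual $g(\lambda) = \max_k [\mu_k - (A^\top \lambda)_k] + \lambda^\top c$. The helper names `pessimistic_costs`, `lagrangian_dual_value`, and `best_feasible_value` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def pessimistic_costs(a_hat: Sequence[Sequence[float]], width: float) -> List[List[float]]:
    """Inflate estimated constraint costs by a confidence width (hypothetical rule).

    Larger costs make A pi <= c harder to satisfy, so the resulting feasible set
    is contained in the true one whenever `width` upper-bounds the estimation error.
    """
    return [[a + width for a in row] for row in a_hat]


def lagrangian_dual_value(
    mu: Sequence[float], a: Sequence[Sequence[float]], c: Sequence[float], lam: Sequence[float]
) -> Tuple[float, int]:
    """Return g(lam) = max_k [mu_k - sum_i lam_i a_ik] + sum_i lam_i c_i and the maximising arm.

    Over the simplex the inner maximum is attained at a single arm (a vertex).
    """
    scores = [mu[k] - sum(lam[i] * a[i][k] for i in range(len(c))) for k in range(len(mu))]
    k_star = max(range(len(mu)), key=lambda k: scores[k])  # first arm wins ties
    return scores[k_star] + sum(l * ci for l, ci in zip(lam, c)), k_star


def best_feasible_value(
    mu: Sequence[float],
    a: Sequence[Sequence[float]],
    c: Sequence[float],
    steps: int = 2000,
    eta: float = 1.0,
) -> Tuple[float, List[float]]:
    """Approximate max_{pi in simplex, A pi <= c} mu^T pi by minimising the dual over lam >= 0.

    Assumes the constraint set is feasible (so strong duality holds). By weak duality
    every g(lam) upper-bounds the primal optimum, so the smallest value seen is returned.
    """
    lam = [0.0] * len(c)
    best_val, best_lam = math.inf, list(lam)
    for t in range(1, steps + 1):
        val, k_star = lagrangian_dual_value(mu, a, c, lam)
        if val < best_val:
            best_val, best_lam = val, list(lam)
        # A subgradient of g at lam has components c_i - a_{i, k*}; step against it,
        # with a diminishing step size, then project back onto lam >= 0.
        step = eta / math.sqrt(t)
        lam = [max(0.0, lam[i] - step * (c[i] - a[i][k_star])) for i in range(len(c))]
    return best_val, best_lam
```

For example, with `mu = [1.0, 0.0]`, one constraint with estimated costs `[[1.0, 0.0]]`, and `c = [0.5]`, the plug-in value is close to 0.5; inflating the costs by a width of 0.25 shrinks the feasible set to $\pi_1 \le 0.25$, and the pessimistic value drops to about 0.25. Working on the dual keeps each step to an argmax over arms, which mirrors (loosely) why a Lagrangian form can be cheaper to track than the constrained problem itself.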


Authors: Udvas Das, Debabrota Basu
Published: 2024-10-24 15:26:14+00:00

Categories: cs.AI, cs.LG, stat.ME, stat.ML