Learning to Explore with Lagrangians for Bandits under Unknown Linear Constraints

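Abstract

Pure exploration in bandits models many real-world problems, such as tuning hyper-parameters or conducting user studies, in which various safety, resource, and fairness constraints on the decision space arise naturally.
We study these problems as pure exploration in multi-armed bandits with unknown linear constraints, where the goal is to identify an $r$-good feasible policy.
First, we propose a Lagrangian relaxation of the sample complexity lower bound for pure exploration under constraints.
We show how this lower bound evolves with the sequential estimation of the constraints.
Second, we leverage the Lagrangian lower bound and the properties of convex optimisation to propose two computationally efficient extensions of Track-and-Stop and Gamified Explorer, namely LATS and LAGEX.
To this end, we propose a constraint-adaptive stopping rule and, while tracking the lower bound, use a pessimistic estimate of the feasible set at each step.
We show that these algorithms achieve asymptotically optimal sample complexity upper bounds up to constraint-dependent constants.
Finally, we conduct numerical experiments with different reward distributions and constraints that validate the efficient performance of LAGEX and LATS with respect to baselines.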

Abstract (original)

Pure exploration in bandits models multiple real-world problems, such as tuning hyper-parameters or conducting user studies, where different safety, resource, and fairness constraints on the decision space naturally appear. We study these problems as pure exploration in multi-armed bandits with unknown linear constraints, where the aim is to identify an $r$-good feasible policy. First, we propose a Lagrangian relaxation of the sample complexity lower bound for pure exploration under constraints. We show how this lower bound evolves with the sequential estimation of constraints. Second, we leverage the Lagrangian lower bound and the properties of convex optimisation to propose two computationally efficient extensions of Track-and-Stop and Gamified Explorer, namely LATS and LAGEX. To this end, we propose a constraint-adaptive stopping rule, and while tracking the lower bound, use a pessimistic estimate of the feasible set at each step. We show that these algorithms achieve asymptotically optimal sample complexity upper bounds up to constraint-dependent constants. Finally, we conduct numerical experiments with different reward distributions and constraints that validate the efficient performance of LAGEX and LATS with respect to baselines.
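
The abstract mentions using a pessimistic estimate of the feasible set and identifying an $r$-good feasible policy. The following is only a minimal illustrative sketch of those two ideas, not the LATS or LAGEX algorithms themselves. It assumes a single linear constraint `a . policy <= bound`, an estimate `a_hat` that lies within Euclidean distance `radius` of the true `a`, known mean rewards, and a finite list of candidate policies; all of these names and simplifications are hypothetical and not taken from the paper. Here "r-good" is measured against the best candidate in the pessimistic set, which is a simplification.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def pessimistic_feasible(policy, a_hat, bound, radius):
    """Return True if `policy` satisfies a . policy <= bound for every
    constraint vector a within Euclidean distance `radius` of `a_hat`.

    By Cauchy-Schwarz, the worst case over that ball is
    a_hat . policy + radius * ||policy||_2.
    """
    norm = math.sqrt(dot(policy, policy))
    return dot(a_hat, policy) + radius * norm <= bound


def r_good_feasible(candidates, means, a_hat, bound, radius, r):
    """Return the candidates that are pessimistically feasible and whose
    expected reward is within `r` of the best pessimistically feasible one."""
    feasible = [p for p in candidates
                if pessimistic_feasible(p, a_hat, bound, radius)]
    if not feasible:
        return []
    best = max(dot(means, p) for p in feasible)
    return [p for p in feasible if dot(means, p) >= best - r]


# Hypothetical three-armed instance: policies are distributions over arms.
means = [0.9, 0.5, 0.2]    # assumed mean rewards per arm
a_hat = [1.0, 0.2, 0.0]    # assumed estimate of the constraint vector
candidates = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.25, 0.75, 0.0],
    [0.5, 0.5, 0.0],
]
selected = r_good_feasible(candidates, means, a_hat,
                           bound=0.5, radius=0.1, r=0.05)
print(selected)
```

In this sketch, shrinking `radius` (as the constraint estimate improves) enlarges the pessimistic feasible set, which is the intuition behind the lower bound evolving with sequential constraint estimation.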

arXiv information

Authors	Udvas Das, Debabrota Basu
Published	2024-10-24 15:26:14+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
