Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities

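Summary

Motivated by distributed selection problems, we formulate a new variant of the multi-player multi-armed bandit (MAB) model that captures the stochastic arrival of requests at each arm and the policy for allocating those requests to players.
The challenge is to design a distributed learning algorithm under which players, without communicating with one another, select arms according to the optimal arm pulling profile (an arm pulling profile prescribes the number of players at each arm).
We first design a greedy algorithm that finds one of the optimal arm pulling profiles with polynomial computational complexity.
We also design an iterative distributed algorithm that lets players commit to an optimal arm pulling profile within a constant number of rounds in expectation.
To handle the online setting, in which the model parameters are unknown, we apply the explore-then-commit (ETC) framework.
We design an exploration strategy with which players estimate the optimal arm pulling profile.
Because these estimates can differ from player to player, committing to a single profile is difficult.
We therefore design an iterative distributed algorithm that guarantees players reach consensus on the optimal arm pulling profile in only M rounds.
We conduct experiments to validate our algorithm.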

Abstract (original)

Motivated by distributed selection problems, we formulate a new variant of multi-player multi-armed bandit (MAB) model, which captures stochastic arrival of requests to each arm, as well as the policy of allocating requests to players. The challenge is how to design a distributed learning algorithm such that players select arms according to the optimal arm pulling profile (an arm pulling profile prescribes the number of players at each arm) without communicating to each other. We first design a greedy algorithm, which locates one of the optimal arm pulling profiles with a polynomial computational complexity. We also design an iterative distributed algorithm for players to commit to an optimal arm pulling profile with a constant number of rounds in expectation. We apply the explore then commit (ETC) framework to address the online setting when model parameters are unknown. We design an exploration strategy for players to estimate the optimal arm pulling profile. Since such estimates can be different across different players, it is challenging for players to commit. We then design an iterative distributed algorithm, which guarantees that players can arrive at a consensus on the optimal arm pulling profile in only M rounds. We conduct experiments to validate our algorithm.
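The greedy step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the reward model, so the following is an assumption made for illustration only: an arm with random capacity D and per-request mean reward mu yields an expected reward of mu * E[min(n, D)] when n players pull it, so adding the n-th player contributes mu * P(D >= n), which never increases with n. Under that assumption, assigning players one at a time to the arm with the largest marginal gain produces an optimal profile. The function names `marginal_gains_from_capacity` and `greedy_profile` are hypothetical and are not taken from the paper.

```python
import heapq
from typing import List, Sequence


def marginal_gains_from_capacity(mean_reward: float,
                                 capacity_pmf: Sequence[float]) -> List[float]:
    """Marginal gain of the n-th player, n = 1, 2, ..., under the ASSUMED model.

    capacity_pmf[d] is P(D = d) for d = 0 .. len(capacity_pmf) - 1.
    The gain of the n-th player is mean_reward * P(D >= n).
    """
    gains = []
    tail = 1.0 - capacity_pmf[0]  # P(D >= 1)
    for n in range(1, len(capacity_pmf)):
        gains.append(mean_reward * tail)
        tail -= capacity_pmf[n]  # now P(D >= n + 1)
    return gains


def greedy_profile(marginal_gain: Sequence[Sequence[float]],
                   num_players: int) -> List[int]:
    """Assign players one by one to the arm with the largest marginal gain.

    marginal_gain[m][n] is the gain from adding the (n + 1)-th player to arm m.
    Each row is assumed to be nonincreasing (illustrative assumption).
    """
    profile = [0] * len(marginal_gain)
    # heapq is a min-heap, so gains are negated; ties go to the lower arm index.
    heap = [(-row[0], m) for m, row in enumerate(marginal_gain) if row]
    heapq.heapify(heap)
    for _ in range(num_players):
        if not heap:
            break  # no arm has any listed gain left
        _, m = heapq.heappop(heap)
        profile[m] += 1
        n = profile[m]
        if n < len(marginal_gain[m]):
            heapq.heappush(heap, (-marginal_gain[m][n], m))
    return profile


# Toy instance: two arms, three players.
gains = [
    marginal_gains_from_capacity(1.0, [0.0, 0.5, 0.5]),  # capacity is 1 or 2
    marginal_gains_from_capacity(0.8, [0.0, 0.0, 1.0]),  # capacity is always 2
]
print(greedy_profile(gains, num_players=3))
```

With a heap over K arms and N players, this sketch runs in O((K + N) log K) time, which is consistent with the polynomial complexity the abstract claims but should not be read as the paper's own bound.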

arXiv information

Authors	Hong Xie, Jinyu Mo, Defu Lian, Jie Wang, Enhong Chen
Published	2024-08-20 13:57:00+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
