Combinatorial Logistic Bandits

Summary

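We introduce a new framework called combinatorial logistic bandits (CLogB). In each round, a subset of base arms (called a super arm) is selected; the outcome of each base arm is binary, and its expectation follows a logistic parametric model.
Feedback is governed by a general arm triggering process.
Our study covers CLogB with reward functions that satisfy two smoothness conditions, capturing application scenarios such as online content delivery, online learning to rank, and dynamic channel allocation.
We first propose CLogUCB, a simple yet efficient algorithm that uses a variance-agnostic exploration bonus.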
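Under the 1-norm triggering probability modulated (TPM) smoothness condition, CLogUCB achieves a regret bound of $\tilde{O}(d\sqrt{\kappa KT})$, where $\tilde{O}$ ignores logarithmic factors, $d$ is the dimension of the feature vector, $\kappa$ represents the nonlinearity of the logistic model, and $K$ is the maximum number of base arms a super arm can trigger.
This result improves on prior work by a factor of $\tilde{O}(\sqrt{\kappa})$.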
We then enhance CLogUCB with a variance-adaptive version, VA-CLogUCB, which attains a regret bound of $\tilde{O}(d\sqrt{KT})$ under the same 1-norm TPM condition, an improvement by a further $\tilde{O}(\sqrt{\kappa})$ factor.
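VA-CLogUCB shows even greater promise under the stronger triggering probability and variance modulated (TPVM) condition, achieving $\tilde{O}(d\sqrt{T})$ regret and thereby removing the additional dependence on the action size $K$.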
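Furthermore, we improve the computational efficiency of VA-CLogUCB by eliminating the nonconvex optimization step when the context feature map is time-invariant, while maintaining the tight $\tilde{O}(d\sqrt{T})$ regret.
Finally, experiments on synthetic and real-world datasets show that our algorithms outperform benchmark algorithms.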
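
The outcome model described in the summary (binary base-arm outcomes whose expectation follows a logistic model) can be illustrated with a small simulation. This is only a sketch of the environment, not of CLogUCB itself: the function names, the feature vectors, the parameter `theta`, and the top-k selection rule are all hypothetical, and it assumes every selected base arm is triggered and observed, which is a simplification of the general triggering process.

```python
import math
import random


def sigmoid(z):
    """Logistic link function."""
    return 1.0 / (1.0 + math.exp(-z))


def expected_outcomes(features, theta):
    """Expected binary outcome of each base arm: sigmoid(<x_i, theta>)."""
    return [sigmoid(sum(x * t for x, t in zip(feat, theta))) for feat in features]


def play_round(features, theta, k, rng):
    """Select a super arm of k base arms and draw their binary outcomes.

    Assumption: the super arm is chosen with knowledge of the true theta
    (an oracle), purely to show the outcome model. A learning algorithm
    would have to estimate theta from past feedback instead.
    """
    means = expected_outcomes(features, theta)
    super_arm = sorted(range(len(means)), key=lambda i: means[i], reverse=True)[:k]
    # Bernoulli draw for every selected (and, by assumption, triggered) base arm.
    outcomes = {i: int(rng.random() < means[i]) for i in super_arm}
    return super_arm, outcomes


rng = random.Random(0)
theta = [1.0, -0.5]  # hypothetical unknown parameter, d = 2
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]  # hypothetical base arms
means = expected_outcomes(features, theta)
super_arm, outcomes = play_round(features, theta, k=2, rng=rng)
print(super_arm, outcomes)
```

With these illustrative values, the two base arms with the largest logistic means are arms 0 and 2, so they form the selected super arm; the drawn outcomes depend on the random seed.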

Abstract (original)

We introduce a novel framework called combinatorial logistic bandits (CLogB), where in each round, a subset of base arms (called the super arm) is selected, with the outcome of each base arm being binary and its expectation following a logistic parametric model. The feedback is governed by a general arm triggering process. Our study covers CLogB with reward functions satisfying two smoothness conditions, capturing application scenarios such as online content delivery, online learning to rank, and dynamic channel allocation. We first propose a simple yet efficient algorithm, CLogUCB, utilizing a variance-agnostic exploration bonus. Under the 1-norm triggering probability modulated (TPM) smoothness condition, CLogUCB achieves a regret bound of $\tilde{O}(d\sqrt{\kappa KT})$, where $\tilde{O}$ ignores logarithmic factors, $d$ is the dimension of the feature vector, $\kappa$ represents the nonlinearity of the logistic model, and $K$ is the maximum number of base arms a super arm can trigger. This result improves on prior work by a factor of $\tilde{O}(\sqrt{\kappa})$. We then enhance CLogUCB with a variance-adaptive version, VA-CLogUCB, which attains a regret bound of $\tilde{O}(d\sqrt{KT})$ under the same 1-norm TPM condition, improving another $\tilde{O}(\sqrt{\kappa})$ factor. VA-CLogUCB shows even greater promise under the stronger triggering probability and variance modulated (TPVM) condition, achieving a leading $\tilde{O}(d\sqrt{T})$ regret, thus removing the additional dependency on the action-size $K$. Furthermore, we enhance the computational efficiency of VA-CLogUCB by eliminating the nonconvex optimization process when the context feature map is time-invariant while maintaining the tight $\tilde{O}(d\sqrt{T})$ regret. Finally, experiments on synthetic and real-world datasets demonstrate the superior performance of our algorithms compared to benchmark algorithms.
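
To get a feel for how the three regret rates quoted in the abstract relate to one another, the leading terms can be evaluated numerically. This is a rough sketch only: it drops all constants and logarithmic factors hidden by $\tilde{O}$, the function name `regret_rates` is hypothetical, and the values chosen for $d$, $\kappa$, $K$, and $T$ are arbitrary illustrations rather than figures from the paper.

```python
import math


def regret_rates(d, kappa, K, T):
    """Leading terms of the regret bounds, ignoring constants and log factors."""
    return {
        "CLogUCB (1-norm TPM)": d * math.sqrt(kappa * K * T),
        "VA-CLogUCB (1-norm TPM)": d * math.sqrt(K * T),
        "VA-CLogUCB (TPVM)": d * math.sqrt(T),
    }


# Arbitrary illustrative values, not taken from the paper's experiments.
rates = regret_rates(d=10, kappa=100, K=4, T=10_000)
for name, value in rates.items():
    print(f"{name}: {value:.0f}")
```

For these values, moving from CLogUCB to VA-CLogUCB under the same condition shrinks the leading term by $\sqrt{\kappa} = 10$, and the TPVM condition removes a further factor of $\sqrt{K} = 2$.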

arXiv information

Authors	Xutong Liu, Xiangxiang Dai, Xuchuang Wang, Mohammad Hajiesmaili, John C. S. Lui
Published	2024-10-22 14:52:46+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
