Ensemble sampling for linear bandits: small ensembles suffice

Summary

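We provide the first useful, rigorous analysis of ensemble sampling in the stochastic linear bandit setting.
Specifically, we show that, under standard assumptions, for a $d$-dimensional stochastic linear bandit with interaction horizon $T$, ensemble sampling with an ensemble of size $m$ on the order of $d \log T$ incurs regret bounded by order $(d \log T)^{5/2} \sqrt{T}$.
Ours is the first result in any structured setting to obtain regret of order close to $\sqrt{T}$ without requiring the ensemble size to scale linearly with $T$, a requirement that would defeat the purpose of ensemble sampling.
Ours is also the first result that allows infinite action sets.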

Summary (original)

We provide the first useful, rigorous analysis of ensemble sampling for the stochastic linear bandit setting. In particular, we show that, under standard assumptions, for a $d$-dimensional stochastic linear bandit with an interaction horizon $T$, ensemble sampling with an ensemble of size $m$ on the order of $d \log T$ incurs regret bounded by order $(d \log T)^{5/2} \sqrt{T}$. Ours is the first result in any structured setting not to require the size of the ensemble to scale linearly with $T$ — which defeats the purpose of ensemble sampling — while obtaining near $\sqrt{T}$ order regret. Ours is also the first result that allows infinite action sets.
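To make the setting concrete, here is a minimal sketch of a generic ensemble sampling loop for a linear bandit. It is not the paper's exact algorithm: the Gaussian perturbation scheme, the finite action set, the constant 1 in front of $d \log T$, and the names `ensemble_size`, `ensemble_sampling`, `lam`, and `noise_sd` are all assumptions made for illustration. Each of the $m$ ensemble members is a ridge regression estimate fitted to its own randomly perturbed copy of the rewards; in every round one member is drawn uniformly at random and the learner acts greedily with respect to it.

```python
import math
import numpy as np


def ensemble_size(d, T):
    """Ensemble size on the order of d * log(T); the constant 1 is an assumption."""
    return math.ceil(d * math.log(T))


def ensemble_sampling(actions, theta_star, T, m, lam=1.0, noise_sd=0.1, seed=0):
    """Run a generic ensemble sampling loop and return the cumulative regret.

    actions:    (K, d) array, one row per available action (finite set assumed here).
    theta_star: (d,) true parameter, used only to simulate rewards and measure regret.
    """
    rng = np.random.default_rng(seed)
    _, d = actions.shape

    # Regularised design matrix, shared by all ensemble members.
    V = lam * np.eye(d)
    # Per-member response vectors, initialised with a random prior perturbation.
    b = math.sqrt(lam) * noise_sd * rng.standard_normal((m, d))

    best_mean = (actions @ theta_star).max()
    regret = 0.0
    for _ in range(T):
        # Draw one ensemble member uniformly at random and compute its estimate.
        j = rng.integers(m)
        theta_j = np.linalg.solve(V, b[j])

        # Act greedily with respect to the sampled member.
        a = actions[np.argmax(actions @ theta_j)]
        reward = a @ theta_star + noise_sd * rng.standard_normal()

        # Update every member with its own independently perturbed reward.
        V += np.outer(a, a)
        perturbed = reward + noise_sd * rng.standard_normal(m)
        b += perturbed[:, None] * a[None, :]

        regret += best_mean - a @ theta_star
    return regret
```

The point of the result is visible in the cost of this loop: each round updates $m$ response vectors, so an ensemble of size about $d \log T$ keeps the per-round work far below what an ensemble growing linearly in $T$ would require.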

arXiv information

Authors	David Janz, Alexander E. Litvak, Csaba Szepesvári
Published	2023-11-14 18:41:28+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
