Ensemble sampling for linear bandits: small ensembles suffice

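Summary

We provide the first useful and rigorous analysis of ensemble sampling for the stochastic linear bandit setting.
In particular, under standard assumptions, for a $d$-dimensional stochastic linear bandit with interaction horizon $T$, ensemble sampling with an ensemble of size of order $d \log T$ incurs regret of order at most $(d \log T)^{5/2} \sqrt{T}$.
This is the first result in any structured setting to obtain regret of near $\smash{\sqrt{T}}$ order without requiring the ensemble size to scale linearly with $T$, a requirement that would defeat the purpose of ensemble sampling.
It is also the first result to allow for infinite action sets.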

Summary (original)

We provide the first useful and rigorous analysis of ensemble sampling for the stochastic linear bandit setting. In particular, we show that, under standard assumptions, for a $d$-dimensional stochastic linear bandit with an interaction horizon $T$, ensemble sampling with an ensemble of size of order $d \log T$ incurs regret at most of the order $(d \log T)^{5/2} \sqrt{T}$. Ours is the first result in any structured setting not to require the size of the ensemble to scale linearly with $T$ — which defeats the purpose of ensemble sampling — while obtaining near $\smash{\sqrt{T}}$ order regret. Our result is also the first to allow for infinite action sets.
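
To make the ensemble-size statement concrete, here is a minimal sketch of a generic ensemble sampling loop for a linear bandit, with the ensemble size set to roughly $d \log T$. It is an illustration only and is not claimed to be the exact algorithm analysed in the paper: the function name `ensemble_sampling`, the finite action set, the Gaussian perturbations, and the parameters `lam` and `noise_sd` are all assumptions made for this example.

```python
import numpy as np

def ensemble_sampling(actions, theta_star, T, m, lam=1.0, noise_sd=1.0, seed=0):
    """Generic ensemble sampling sketch (hypothetical helper, finite action set).

    actions: (K, d) array of action feature vectors.
    theta_star: (d,) true parameter, used only to simulate rewards and regret.
    Returns the cumulative pseudo-regret after each of the T rounds.
    """
    rng = np.random.default_rng(seed)
    K, d = actions.shape
    V = lam * np.eye(d)  # regularised design matrix shared by all members
    # Each member j keeps its own perturbed target vector B[j]; the random
    # initial value acts as a prior perturbation (an assumption of this sketch).
    B = np.sqrt(lam) * rng.standard_normal((m, d))
    best = np.max(actions @ theta_star)
    regret = np.zeros(T)
    total = 0.0
    for t in range(T):
        j = rng.integers(m)                 # pick one ensemble member uniformly
        theta_j = np.linalg.solve(V, B[j])  # that member's perturbed ridge estimate
        x = actions[np.argmax(actions @ theta_j)]  # act greedily for member j
        reward = x @ theta_star + noise_sd * rng.standard_normal()
        V += np.outer(x, x)
        # Every member sees the same reward plus its own fresh perturbation.
        perturbed = reward + noise_sd * rng.standard_normal(m)
        B += np.outer(perturbed, x)
        total += best - x @ theta_star
        regret[t] = total
    return regret

# Example run: ensemble size of order d log T rather than order T.
d, T = 5, 2000
m = int(np.ceil(d * np.log(T)))
demo_actions = np.random.default_rng(1).standard_normal((50, d))
demo_theta = np.ones(d) / np.sqrt(d)
demo_regret = ensemble_sampling(demo_actions, demo_theta, T=T, m=m)
print(f"ensemble size m = {m}, final cumulative regret = {demo_regret[-1]:.2f}")
```

The point of the sketch is the cost profile: each round updates `m` target vectors, so an ensemble of order $d \log T$ keeps per-round work independent of $T$ up to the logarithmic factor, whereas an ensemble growing linearly in $T$ would not.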

arXiv information

Authors	David Janz, Alexander E. Litvak, Csaba Szepesvári
Published	2025-01-15 15:41:09+00:00
arXiv site	arxiv_id (pdf)

Source and services used

arxiv.jp, Google
