Tight Memory-Regret Lower Bounds for Streaming Bandits

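Summary

This paper studies the streaming bandits problem, in which the learner aims to minimize regret while handling arms that arrive online and only sublinear arm memory.
It establishes a tight worst-case regret lower bound of $\Omega \left( (TB)^{\alpha} K^{1-\alpha}\right)$, with $\alpha = 2^{B} / (2^{B+1}-1)$, for any algorithm with time horizon $T$, number of arms $K$, and number of passes $B$.
The result reveals a separation between the stochastic bandits problem in the classical centralized setting and the streaming setting with bounded arm memory.
In particular, compared with the well-known $\Omega(\sqrt{KT})$ lower bound, an additional double-logarithmic factor is unavoidable for any streaming bandits algorithm that is permitted only sublinear memory.
The paper also establishes the first instance-dependent lower bound for streaming bandits, $\Omega \left(T^{1/(B+1)} \sum_{\Delta_x>0} \frac{\mu^*}{\Delta_x}\right)$.
These lower bounds are derived through a novel reduction from the regret-minimization setting to the sample complexity analysis of a sequence of $\epsilon$-optimal arm identification tasks, which may be of independent interest.
To complement the lower bounds, the paper also provides a multi-pass algorithm that achieves a regret upper bound of $\tilde{O} \left( (TB)^{\alpha} K^{1-\alpha}\right)$ using constant arm memory.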
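
As a worked reading of the exponent (derived only from the formula stated above, not taken from the paper's own analysis), note that $\alpha - \frac{1}{2} = \frac{1}{2(2^{B+1}-1)}$, so the worst-case rate can be split into a $\sqrt{KTB}$ term and a correction that shrinks as the number of passes grows:

$$
(TB)^{\alpha} K^{1-\alpha}
= \sqrt{KTB} \cdot \left(\frac{TB}{K}\right)^{\alpha - \frac{1}{2}}
= \sqrt{KTB} \cdot \left(\frac{TB}{K}\right)^{\frac{1}{2\left(2^{B+1}-1\right)}},
\qquad
\alpha = \frac{2^{B}}{2^{B+1}-1}.
$$

For example, $B = 1$ gives $\alpha = 2/3$, $B = 2$ gives $\alpha = 4/7$, and $B = 3$ gives $\alpha = 8/15$, so the exponent decreases toward $1/2$ as $B$ increases.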

Abstract (original)

In this paper, we investigate the streaming bandits problem, wherein the learner aims to minimize regret by dealing with online arriving arms and sublinear arm memory. We establish the tight worst-case regret lower bound of $\Omega \left( (TB)^{\alpha} K^{1-\alpha}\right), \alpha = 2^{B} / (2^{B+1}-1)$ for any algorithm with a time horizon $T$, number of arms $K$, and number of passes $B$. The result reveals a separation between the stochastic bandits problem in the classical centralized setting and the streaming setting with bounded arm memory. Notably, in comparison to the well-known $\Omega(\sqrt{KT})$ lower bound, an additional double logarithmic factor is unavoidable for any streaming bandits algorithm with sublinear memory permitted. Furthermore, we establish the first instance-dependent lower bound of $\Omega \left(T^{1/(B+1)} \sum_{\Delta_x>0} \frac{\mu^*}{\Delta_x}\right)$ for streaming bandits. These lower bounds are derived through a unique reduction from the regret-minimization setting to the sample complexity analysis for a sequence of $\epsilon$-optimal arm identification tasks, which may be of independent interest. To complement the lower bound, we also provide a multi-pass algorithm that achieves a regret upper bound of $\tilde{O} \left( (TB)^{\alpha} K^{1-\alpha}\right)$ using constant arm memory.
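
The two lower-bound rates in the abstract can be evaluated numerically to get a feel for how they scale with the number of passes. The following is a minimal sketch, assuming all constants hidden by the $\Omega(\cdot)$ notation are dropped; the function names (`alpha`, `worst_case_rate`, `instance_dependent_rate`) and the sample values of $T$, $K$, and the arm means are hypothetical and are not taken from the paper.

```python
import math


def alpha(num_passes: int) -> float:
    """Exponent alpha = 2^B / (2^(B+1) - 1) as stated in the abstract."""
    if num_passes < 1:
        raise ValueError("num_passes must be at least 1")
    return 2 ** num_passes / (2 ** (num_passes + 1) - 1)


def worst_case_rate(horizon: int, num_arms: int, num_passes: int) -> float:
    """Leading term (TB)^alpha * K^(1 - alpha), with constants dropped."""
    a = alpha(num_passes)
    return (horizon * num_passes) ** a * num_arms ** (1 - a)


def instance_dependent_rate(horizon: int, num_passes: int, means: list[float]) -> float:
    """Leading term T^(1/(B+1)) * sum over suboptimal arms of mu* / Delta_x.

    `means` is a hypothetical list of arm means; mu* is its maximum and
    Delta_x = mu* - mu_x is the gap of arm x.
    """
    best = max(means)
    gap_sum = sum(best / (best - m) for m in means if m < best)
    return horizon ** (1 / (num_passes + 1)) * gap_sum


if __name__ == "__main__":
    T, K = 10**6, 100  # illustrative values only
    for B in (1, 2, 3, 5, 10):
        ratio = worst_case_rate(T, K, B) / math.sqrt(K * T)
        print(f"B={B:2d}  alpha={alpha(B):.4f}  rate/sqrt(KT)={ratio:.2f}")
```

The printed ratio compares the streaming rate with the classical $\sqrt{KT}$ rate for the chosen values; it is an illustration of the formulas only and says nothing about the constants in the actual bounds.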

arXiv information

Authors	Shaoang Li, Lan Zhang, Junhao Wang, Xiang-Yang Li
Published	2023-06-13 16:54:13+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
