Stochastic Contextual Bandits with Graph-based Contexts

Summary

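Title: Stochastic Contextual Bandits with Graph-based Contexts
Summary: This paper naturally generalizes the online graph prediction problem to a version of the stochastic contextual bandit problem in which contexts are vertices of a graph and the graph structure provides information about the similarity of contexts. In this stochastic contextual bandit setting, vertices with the same label share the same reward distribution. The standard notion of instance difficulty in graph label prediction is the cutsize $f$, defined as the number of edges whose endpoints have different labels. For line graphs and trees, the authors present an algorithm with a regret bound of $\tilde{O}(T^{2/3}K^{1/3}f^{1/3})$, where $K$ is the number of arms. The algorithm relies on the optimal stochastic bandit algorithm of Zimmert and Seldin [AISTATS’19, JMLR’21]. When the best arm outperforms the other arms, the regret improves to $\tilde{O}(\sqrt{KT\cdot f})$. The regret bound in the latter case is comparable to other optimal contextual bandit results in more general settings, but the algorithm is easy to analyze, runs very efficiently, and does not require an i.i.d. assumption on the input context sequence. The algorithm also handles general graphs via a standard random spanning tree reduction.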

Summary (original)

We naturally generalize the on-line graph prediction problem to a version of stochastic contextual bandit problems where contexts are vertices in a graph and the structure of the graph provides information on the similarity of contexts. More specifically, we are given a graph $G=(V,E)$, whose vertex set $V$ represents contexts with unknown vertex label $y$. In our stochastic contextual bandit setting, vertices with the same label share the same reward distribution. The standard notion of instance difficulty in graph label prediction is the cutsize $f$, defined to be the number of edges whose endpoints have different labels. For line graphs and trees we present an algorithm with regret bound of $\tilde{O}(T^{2/3}K^{1/3}f^{1/3})$, where $K$ is the number of arms. Our algorithm relies on the optimal stochastic bandit algorithm by Zimmert and Seldin [AISTATS’19, JMLR’21]. When the best arm outperforms the other arms, the regret improves to $\tilde{O}(\sqrt{KT\cdot f})$. The regret bound in the latter case is comparable to other optimal contextual bandit results in more general cases, but our algorithm is easy to analyze, runs very efficiently, and does not require an i.i.d. assumption on the input context sequence. The algorithm also works with general graphs using a standard random spanning tree reduction.
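
As a small illustration of the cutsize $f$ defined in the abstract, the sketch below counts the edges whose endpoints carry different labels. The function name `cutsize`, the 6-vertex line graph, and the label assignment are hypothetical choices made for this example; they do not come from the paper.

```python
def cutsize(edges, labels):
    """Count the edges whose two endpoints have different labels."""
    return sum(1 for u, v in edges if labels[u] != labels[v])


# Hypothetical line graph on 6 vertices: 0 - 1 - 2 - 3 - 4 - 5
edges = [(i, i + 1) for i in range(5)]

# Hypothetical labels; the label changes between vertices 1 and 2,
# and again between vertices 4 and 5.
labels = {0: "a", 1: "a", 2: "b", 3: "b", 4: "b", 5: "a"}

f = cutsize(edges, labels)
print(f)  # prints 2
```

Here two edges, (1, 2) and (4, 5), join differently labeled vertices, so $f = 2$. A labeling that is constant along the line would give $f = 0$.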

arXiv information

Authors	Jittat Fakcharoenphol, Chayutpong Prompak
Published	2023-05-02 14:51:35+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, OpenAI
