On the Minimax Regret for Online Learning with Feedback Graphs

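Summary

This work improves both the upper and lower bounds on the regret of online learning with strongly observable undirected feedback graphs.
The best known upper bound for this problem is $\mathcal{O}\bigl(\sqrt{\alpha T\ln K}\bigr)$, where $K$ is the number of actions, $\alpha$ is the independence number of the graph, and $T$ is the time horizon.
The $\sqrt{\ln K}$ factor is known to be necessary when $\alpha = 1$ (the experts case).
On the other hand, when $\alpha = K$ (the bandits case), the minimax rate is known to be $\Theta\bigl(\sqrt{KT}\bigr)$, and a lower bound of $\Omega\bigl(\sqrt{\alpha T}\bigr)$ is known to hold for any $\alpha$.
The improved upper bound $\mathcal{O}\bigl(\sqrt{\alpha T(1+\ln(K/\alpha))}\bigr)$ holds for any $\alpha$ and matches the lower bounds for both bandits and experts, while interpolating the intermediate cases.
To prove this result, the authors use FTRL with the $q$-Tsallis entropy, for a carefully chosen value of $q \in [1/2, 1)$ that varies with $\alpha$.
The analysis of this algorithm requires a new bound on the variance term in the regret.
The authors also show how to extend these techniques to time-varying graphs without requiring prior knowledge of their independence numbers.
The upper bound is complemented by an improved lower bound of $\Omega\bigl(\sqrt{\alpha T(\ln K)/(\ln\alpha)}\bigr)$ for all $\alpha > 1$, whose analysis relies on a novel reduction to multitask learning.
This shows that a logarithmic factor is necessary as soon as $\alpha < K$.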
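As a rough numerical illustration of how the bounds quoted above compare, the following Python sketch evaluates them with every hidden constant set to 1. That is an assumption made only for illustration, since the asymptotic notation does not specify the constants. The function names and the values of $K$ and $T$ are hypothetical.

```python
import math


def previous_upper(K, alpha, T):
    """Previously best known upper bound, sqrt(alpha * T * ln K), constants dropped."""
    return math.sqrt(alpha * T * math.log(K))


def improved_upper(K, alpha, T):
    """Improved upper bound, sqrt(alpha * T * (1 + ln(K / alpha))), constants dropped."""
    return math.sqrt(alpha * T * (1 + math.log(K / alpha)))


def improved_lower(K, alpha, T):
    """Improved lower bound, sqrt(alpha * T * ln K / ln alpha), stated for alpha > 1."""
    if alpha <= 1:
        raise ValueError("the lower bound is stated only for alpha > 1")
    return math.sqrt(alpha * T * math.log(K) / math.log(alpha))


# Illustrative values only; they do not come from the paper.
K, T = 1024, 10_000
for alpha in (1, 2, 32, 1024):
    lower = improved_lower(K, alpha, T) if alpha > 1 else float("nan")
    print(
        f"alpha={alpha:5d}  previous={previous_upper(K, alpha, T):9.1f}  "
        f"improved={improved_upper(K, alpha, T):9.1f}  lower={lower:9.1f}"
    )
```

With these expressions, $\alpha = K$ gives $\ln(K/\alpha) = 0$, so the improved upper bound reduces to $\sqrt{KT}$, while $\alpha = 1$ gives $\sqrt{T(1+\ln K)}$, which is of the same order as the experts rate $\sqrt{T\ln K}$.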

Abstract (original)

In this work, we improve on the upper and lower bounds for the regret of online learning with strongly observable undirected feedback graphs. The best known upper bound for this problem is $\mathcal{O}\bigl(\sqrt{\alpha T\ln K}\bigr)$, where $K$ is the number of actions, $\alpha$ is the independence number of the graph, and $T$ is the time horizon. The $\sqrt{\ln K}$ factor is known to be necessary when $\alpha = 1$ (the experts case). On the other hand, when $\alpha = K$ (the bandits case), the minimax rate is known to be $\Theta\bigl(\sqrt{KT}\bigr)$, and a lower bound $\Omega\bigl(\sqrt{\alpha T}\bigr)$ is known to hold for any $\alpha$. Our improved upper bound $\mathcal{O}\bigl(\sqrt{\alpha T(1+\ln(K/\alpha))}\bigr)$ holds for any $\alpha$ and matches the lower bounds for bandits and experts, while interpolating intermediate cases. To prove this result, we use FTRL with $q$-Tsallis entropy for a carefully chosen value of $q \in [1/2, 1)$ that varies with $\alpha$. The analysis of this algorithm requires a new bound on the variance term in the regret. We also show how to extend our techniques to time-varying graphs, without requiring prior knowledge of their independence numbers. Our upper bound is complemented by an improved $\Omega\bigl(\sqrt{\alpha T(\ln K)/(\ln\alpha)}\bigr)$ lower bound for all $\alpha > 1$, whose analysis relies on a novel reduction to multitask learning. This shows that a logarithmic factor is necessary as soon as $\alpha < K$.
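The algorithmic idea named in the abstract, FTRL with a $q$-Tsallis entropy regularizer, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract gives neither the rule for choosing $q$ nor the learning rate, so both are left as free parameters (`q`, `eta`). The loss estimator is the standard importance-weighted one for feedback graphs, and it assumes every action observes its own loss (self-loops), which the abstract does not state. All function names are hypothetical.

```python
import numpy as np


def tsallis_ftrl_distribution(cum_loss, eta, q, iters=100):
    """Minimize eta * <cum_loss, p> + (1 - sum_i p_i**q) / (1 - q) over the simplex.

    Stationarity gives p_i = ((shifted_i + mu) / c) ** (-1 / (1 - q)) with
    c = q / (1 - q); the multiplier mu is found by bisection.
    """
    cum_loss = np.asarray(cum_loss, dtype=float)
    K = cum_loss.size
    c = q / (1.0 - q)
    shifted = eta * (cum_loss - cum_loss.min())  # nonnegative, zero at the best arm
    power = -1.0 / (1.0 - q)
    # At mu = lo the best arm alone has mass 1; at mu = hi every arm has mass <= 1/K.
    lo, hi = c, c * K ** (1.0 - q)
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        if np.sum(((shifted + mu) / c) ** power) > 1.0:
            lo = mu
        else:
            hi = mu
    p = ((shifted + hi) / c) ** power
    return p / p.sum()  # remove the residual bisection error


def graph_loss_estimates(losses, played, p, adj):
    """Importance-weighted estimates; adj is a symmetric 0/1 matrix with self-loops."""
    adj = np.asarray(adj, dtype=float)
    observed = adj[played] > 0  # arms whose loss is revealed this round
    obs_prob = adj @ p          # probability that each arm's loss is revealed
    return np.where(observed, losses / obs_prob, 0.0)


def run_ftrl(loss_matrix, adj, eta, q, seed=0):
    """Play T rounds against a fixed loss matrix of shape (T, K)."""
    rng = np.random.default_rng(seed)
    T, K = loss_matrix.shape
    cum_est = np.zeros(K)
    total_loss = 0.0
    for t in range(T):
        p = tsallis_ftrl_distribution(cum_est, eta, q)
        played = rng.choice(K, p=p)
        total_loss += loss_matrix[t, played]
        cum_est += graph_loss_estimates(loss_matrix[t], played, p, adj)
    return total_loss, p
```

In this sketch an identity adjacency matrix corresponds to the bandits case ($\alpha = K$) and an all-ones matrix to the experts case ($\alpha = 1$), in which the estimates equal the true losses.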

arXiv information

Authors	Khaled Eldowa, Emmanuel Esposito, Tommaso Cesari, Nicolò Cesa-Bianchi
Published	2023-05-24 17:40:57+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
