Meta-Learning Adversarial Bandit Algorithms

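Abstract

We study online meta-learning with bandit feedback, aiming to improve performance across multiple tasks when those tasks are similar according to a natural similarity measure.
As the first work to target the adversarial online-within-online partial-information setting, we design meta-algorithms that combine outer learners to simultaneously tune the initialization and other hyperparameters of an inner learner in two important cases: multi-armed bandits (MAB) and bandit linear optimization (BLO).
For MAB, the meta-learners initialize and set the hyperparameters of the Tsallis-entropy generalization of Exp3, so that the task-averaged regret improves when the entropy of the optima-in-hindsight is small.
For BLO, we learn to initialize and tune online mirror descent (OMD) with self-concordant barrier regularizers, and show that the task-averaged regret varies directly with an action-space-dependent measure that these regularizers induce.
Our guarantees rely on proving that unregularized follow-the-leader, combined with two levels of low-dimensional hyperparameter tuning, is enough to learn a sequence of affine functions of non-Lipschitz and sometimes non-convex Bregman divergences that bound the regret of OMD.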

Abstract (original)

We study online meta-learning with bandit feedback, with the goal of improving performance across multiple tasks if they are similar according to some natural similarity measure. As the first to target the adversarial online-within-online partial-information setting, we design meta-algorithms that combine outer learners to simultaneously tune the initialization and other hyperparameters of an inner learner for two important cases: multi-armed bandits (MAB) and bandit linear optimization (BLO). For MAB, the meta-learners initialize and set hyperparameters of the Tsallis-entropy generalization of Exp3, with the task-averaged regret improving if the entropy of the optima-in-hindsight is small. For BLO, we learn to initialize and tune online mirror descent (OMD) with self-concordant barrier regularizers, showing that task-averaged regret varies directly with an action space-dependent measure they induce. Our guarantees rely on proving that unregularized follow-the-leader combined with two levels of low-dimensional hyperparameter tuning is enough to learn a sequence of affine functions of non-Lipschitz and sometimes non-convex Bregman divergences bounding the regret of OMD.
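As a rough illustration of what "tuning the initialization and other hyperparameters of an inner learner" refers to in the MAB case, the sketch below runs plain Exp3 on a single task with a configurable starting distribution and step size. It is not the paper's algorithm: it uses the standard Shannon-entropy form of Exp3 (which the Tsallis-entropy family is generally understood to recover as a limiting case), it omits the outer meta-learners entirely, and the names `exp3`, `init`, `eta`, and the toy task are assumptions introduced here for illustration.

```python
import math
import random


def exp3(losses, init, eta, rng):
    """Plain Exp3 on one task (illustrative sketch, not the paper's method).

    losses: list of per-round loss vectors with entries in [0, 1], one per arm.
    init:   starting distribution over arms, assumed to sum to 1
            (the kind of quantity a meta-learner would set).
    eta:    step size (another tunable hyperparameter).
    Returns (total loss incurred, final distribution over arms).
    """
    probs = list(init)
    total = 0.0
    for loss_vec in losses:
        # Sample an arm from the current distribution.
        arm = rng.choices(range(len(probs)), weights=probs)[0]
        loss = loss_vec[arm]  # bandit feedback: only this entry is observed
        total += loss
        # Importance-weighted loss estimate: loss / probs[arm] for the played
        # arm and 0 for the others, followed by a multiplicative update.
        probs[arm] *= math.exp(-eta * loss / probs[arm])
        norm = sum(probs)
        probs = [p / norm for p in probs]
    return total, probs


# Hypothetical toy task: arm 2 is always the best arm.
T = 200
task = [[1.0, 1.0, 0.0]] * T
uniform_loss, _ = exp3(task, [1 / 3, 1 / 3, 1 / 3], eta=0.1, rng=random.Random(0))
informed_loss, _ = exp3(task, [0.05, 0.05, 0.9], eta=0.1, rng=random.Random(0))
print("uniform init:", uniform_loss, "informed init:", informed_loss)
```

The two calls differ only in `init`. When the best arms of past tasks concentrate on a few arms, a starting distribution that favors those arms gives the inner learner less to explore, which is the intuition behind the abstract's statement that regret improves when the entropy of the optima-in-hindsight is small.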

arXiv information

Authors	Mikhail Khodak, Ilya Osadchiy, Keegan Harris, Maria-Florina Balcan, Kfir Y. Levy, Ron Meir, Zhiwei Steven Wu
Published	2023-11-01 16:15:35+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
