Hierarchical Upper Confidence Bounds for Constrained Online Learning

Summary

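The multi-armed bandit (MAB) problem is a foundational framework for sequential decision-making under uncertainty, and its applications in areas such as clinical trials, online advertising, and resource allocation have been studied extensively.
However, traditional MAB formulations do not adequately capture scenarios in which decisions are structured hierarchically, involve multi-level constraints, or feature context-dependent action spaces.
This paper introduces the hierarchical constrained bandits (HCB) framework, which extends the contextual bandit problem to incorporate hierarchical decision structures and multi-level constraints.
The authors propose the hierarchical constrained upper confidence bound (HC-UCB) algorithm, which is designed to handle the complexity of the HCB problem by using confidence bounds within a hierarchical setting.
Their theoretical analysis establishes sublinear regret bounds for HC-UCB and provides high-probability guarantees of constraint satisfaction at every level of the hierarchy.
They also derive a minimax lower bound on the regret for the HCB problem, showing that the algorithm is near-optimal.
The results matter for real-world applications in which decision-making is inherently hierarchical and constrained, offering a robust and efficient way to balance exploration and exploitation across multiple levels of decisions.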

Summary (original)

The multi-armed bandit (MAB) problem is a foundational framework in sequential decision-making under uncertainty, extensively studied for its applications in areas such as clinical trials, online advertising, and resource allocation. Traditional MAB formulations, however, do not adequately capture scenarios where decisions are structured hierarchically, involve multi-level constraints, or feature context-dependent action spaces. In this paper, we introduce the hierarchical constrained bandits (HCB) framework, which extends the contextual bandit problem to incorporate hierarchical decision structures and multi-level constraints. We propose the hierarchical constrained upper confidence bound (HC-UCB) algorithm, designed to address the complexities of the HCB problem by leveraging confidence bounds within a hierarchical setting. Our theoretical analysis establishes sublinear regret bounds for HC-UCB and provides high-probability guarantees for constraint satisfaction at all hierarchical levels. Furthermore, we derive a minimax lower bound on the regret for the HCB problem, demonstrating the near-optimality of our algorithm. The results are significant for real-world applications where decision-making processes are inherently hierarchical and constrained, offering a robust and efficient solution that balances exploration and exploitation across multiple levels of decision-making.

arXiv information

Author	Ali Baheri
Published	2024-10-22 17:41:14+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
