A Bayesian Approach to Online Learning for Contextual Restless Bandits with Applications to Public Health

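Summary

Restless multi-armed bandits (RMABs) are used to model sequential resource allocation in public health intervention programs.
In these settings, the underlying transition dynamics are often unknown a priori, so online reinforcement learning (RL) is required.
However, existing online RL methods for RMABs cannot incorporate properties that are common in real-world public health applications, such as contextual information and non-stationarity.
The authors introduce Bayesian Learning for Contextual RMABs (BCoR), an online RL approach for RMABs that combines Bayesian modeling techniques with Thompson sampling in a new way to flexibly model a wide range of complex RMAB settings, including contextual and non-stationary RMABs.
A key contribution of the approach is that it exploits information shared within and between arms to learn unknown RMAB transition dynamics quickly in budget-constrained settings with relatively short time horizons.
Empirically, the authors show that BCoR achieves substantially higher finite-sample performance than existing approaches across a range of experimental settings, including one constructed from a real-world public health campaign in India.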

Abstract (original)

Restless multi-armed bandits (RMABs) are used to model sequential resource allocation in public health intervention programs. In these settings, the underlying transition dynamics are often unknown a priori, requiring online reinforcement learning (RL). However, existing methods in online RL for RMABs cannot incorporate properties often present in real-world public health applications, such as contextual information and non-stationarity. We present Bayesian Learning for Contextual RMABs (BCoR), an online RL approach for RMABs that novelly combines techniques in Bayesian modeling with Thompson sampling to flexibly model a wide range of complex RMAB settings, such as contextual and non-stationary RMABs. A key contribution of our approach is its ability to leverage shared information within and between arms to learn unknown RMAB transition dynamics quickly in budget-constrained settings with relatively short time horizons. Empirically, we show that BCoR achieves substantially higher finite-sample performance than existing approaches over a range of experimental settings, including one constructed from a real-world public health campaign in India.
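To make the combination of Bayesian modeling and Thompson sampling more concrete, the following is a minimal sketch on a toy two-state restless bandit. It is not BCoR itself: it assumes independent Beta(1, 1) priors per arm, state, and action (so no information is shared between arms and no context is used), and it ranks arms with a simple myopic index rather than any planning method the paper may use. The function name `thompson_rmab`, the table `TRUE_P`, and the reward definition (one unit per arm in the "good" state) are all hypothetical choices made for illustration.

```python
import random

# Hypothetical ground truth: TRUE_P[i][s][a] is the probability that arm i
# moves to the good state (1) from state s under action a
# (0 = passive, 1 = intervene). These numbers are made up for illustration.
TRUE_P = [
    [[0.2, 0.7], [0.6, 0.9]],
    [[0.3, 0.4], [0.7, 0.8]],
    [[0.1, 0.8], [0.5, 0.9]],
]


def thompson_rmab(true_p, budget, horizon, seed=0):
    """Run Thompson sampling on a toy two-state restless bandit.

    Returns the total reward (count of arm-steps spent in the good state)
    and the final Beta posterior parameters.
    """
    rng = random.Random(seed)
    n = len(true_p)
    # post[i][s][a] = [alpha, beta] of a Beta posterior; Beta(1, 1) prior.
    post = [[[[1.0, 1.0] for _ in range(2)] for _ in range(2)] for _ in range(n)]
    state = [0] * n
    total_reward = 0

    for _ in range(horizon):
        # 1. Draw one plausible transition model from the current posterior.
        sample = [
            [[rng.betavariate(*post[i][s][a]) for a in range(2)] for s in range(2)]
            for i in range(n)
        ]
        # 2. Myopic index (an assumption of this sketch): the sampled gain in
        #    the probability of reaching the good state if we intervene now.
        gain = [sample[i][state[i]][1] - sample[i][state[i]][0] for i in range(n)]
        ranked = sorted(range(n), key=lambda i: gain[i], reverse=True)
        chosen = set(ranked[:budget])  # respect the per-round budget

        # 3. Apply actions, observe transitions, and update the posteriors.
        for i in range(n):
            a = 1 if i in chosen else 0
            s = state[i]
            nxt = 1 if rng.random() < true_p[i][s][a] else 0
            post[i][s][a][0 if nxt else 1] += 1  # alpha on success, beta otherwise
            state[i] = nxt
            total_reward += nxt

    return total_reward, post


if __name__ == "__main__":
    reward, posterior = thompson_rmab(TRUE_P, budget=1, horizon=200)
    print("total reward:", reward)
```

Because every arm transitions in every round whether or not it is acted on, the posterior for each arm keeps updating even when the budget is spent elsewhere. The sketch learns each arm in isolation; the abstract's point is that sharing information within and between arms can speed this learning up when horizons are short.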

arXiv information

Authors	Biyonka Liang, Lily Xu, Aparna Taneja, Milind Tambe, Lucas Janson
Published	2024-02-07 15:11:37+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
