Planning Multiple Epidemic Interventions with Reinforcement Learning

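Summary

Combating an epidemic requires a plan that specifies when and how to apply interventions such as mask mandates, vaccination, and school or workplace closures.
An optimal plan curbs the epidemic while minimizing loss of life, disease burden, and economic cost.
Finding such a plan is computationally intractable in realistic settings.
Even so, policy-makers would benefit greatly from tools that efficiently search for plans minimizing disease and economic costs, particularly when several interventions must be chosen from a continuous, complex action space given an equally continuous and complex state space.
We formulate this problem as a Markov decision process.
Our formulation is unique in that it can represent multiple continuous interventions over any disease model defined by ordinary differential equations.
We show how to apply state-of-the-art actor-critic reinforcement learning algorithms (PPO and SAC) effectively to search for plans that minimize overall cost.
We empirically evaluate the learning performance of these algorithms and compare them with hand-crafted baselines that mimic plans constructed by policy-makers.
Our method outperforms the baselines.
Our work confirms the viability of a computational approach to supporting policy-makers.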
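
The formulation described above, a Markov decision process over an ODE disease model with several continuous interventions, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the SIR compartment model, the two interventions (a contact-reduction level and a vaccination effort, each in [0, 1]), the weekly decision interval, the Euler integrator, and all parameter values and cost weights.

```python
import numpy as np

# All values below are illustrative assumptions, not the paper's settings.
BETA, GAMMA = 0.3, 0.1            # transmission and recovery rates (per day)
MAX_VACC_RATE = 0.01              # max fraction of susceptibles vaccinated per day
DT, STEPS_PER_ACTION = 0.1, 70    # Euler step (days); 70 steps = one week per decision
W_INFECT, W_CONTACT, W_VACC = 1.0, 0.05, 0.02  # hypothetical cost weights


def sir_derivatives(state, action):
    """Right-hand side of an SIR model under two continuous interventions."""
    s, i, r = state
    contact_cut, vacc_effort = action
    new_infections = BETA * (1.0 - contact_cut) * s * i
    vaccinations = MAX_VACC_RATE * vacc_effort * s
    return np.array([
        -new_infections - vaccinations,   # dS/dt
        new_infections - GAMMA * i,       # dI/dt
        GAMMA * i + vaccinations,         # dR/dt
    ])


def step(state, action):
    """One MDP transition: hold the action for a week, return (next_state, reward)."""
    action = np.clip(np.asarray(action, dtype=float), 0.0, 1.0)
    cost = 0.0
    for _ in range(STEPS_PER_ACTION):
        state = state + DT * sir_derivatives(state, action)
        # Disease cost (infected fraction) plus economic cost of each intervention.
        cost += DT * (W_INFECT * state[1] + W_CONTACT * action[0] + W_VACC * action[1])
    return state, -cost


def rollout(policy, weeks=30):
    """Run a policy (state -> action) from a fixed initial state; return final state and return."""
    state = np.array([0.99, 0.01, 0.0])
    total = 0.0
    for _ in range(weeks):
        state, reward = step(state, policy(state))
        total += reward
    return state, total


def threshold_baseline(state):
    """Hand-crafted rule: cut contacts when more than 2% are infected; always vaccinate."""
    return np.array([0.6 if state[1] > 0.02 else 0.0, 1.0])


if __name__ == "__main__":
    final_state, total_reward = rollout(threshold_baseline)
    print("final (S, I, R):", final_state, "total reward:", total_reward)
```

In this sketch a learned policy, for example one trained with PPO or SAC, would take the place of `threshold_baseline` as the function passed to `rollout`.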

Summary (original)

Combating an epidemic entails finding a plan that describes when and how to apply different interventions, such as mask-wearing mandates, vaccinations, school or workplace closures. An optimal plan will curb an epidemic with minimal loss of life, disease burden, and economic cost. Finding an optimal plan is an intractable computational problem in realistic settings. Policy-makers, however, would greatly benefit from tools that can efficiently search for plans that minimize disease and economic costs especially when considering multiple possible interventions over a continuous and complex action space given a continuous and equally complex state space. We formulate this problem as a Markov decision process. Our formulation is unique in its ability to represent multiple continuous interventions over any disease model defined by ordinary differential equations. We illustrate how to effectively apply state-of-the-art actor-critic reinforcement learning algorithms (PPO and SAC) to search for plans that minimize overall costs. We empirically evaluate the learning performance of these algorithms and compare their performance to hand-crafted baselines that mimic plans constructed by policy-makers. Our method outperforms baselines. Our work confirms the viability of a computational approach to support policy-makers.

arXiv information

Authors	Anh Mai, Nikunj Gupta, Azza Abouzied, Dennis Shasha
Published	2023-05-16 17:09:07+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
