Finite-Horizon Single-Pull Restless Bandits: An Efficient Index Policy For Scarce Resource Allocation

Summary

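Restless multi-armed bandits (RMABs) have been highly successful in optimizing sequential resource allocation across many domains.
However, the standard RMAB framework falls short in many practical settings where resources are highly scarce and each agent can receive at most one resource, such as healthcare intervention programs.
To address such scenarios, we introduce Finite-Horizon Single-Pull RMABs (SPRMABs), a novel variant in which each arm can be pulled only once.
This single-pull constraint adds further complexity and renders many existing RMAB solutions suboptimal or ineffective.
To address this shortcoming, we propose using dummy states that expand the system and enforce the one-pull constraint: once an arm is activated, it transitions exclusively within the dummy states.
We then design a lightweight index policy for this expanded system.
For the first time, we demonstrate that our index policy achieves a sub-linearly decaying average optimality gap of $\tilde{\mathcal{O}}\left(\frac{1}{\rho^{1/2}}\right)$ for a finite number of arms, where $\rho$ is the scaling factor for each arm cluster.
Extensive simulations validate the proposed method, showing robust performance across various domains compared to existing benchmarks.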

Abstract (original)

Restless multi-armed bandits (RMABs) have been highly successful in optimizing sequential resource allocation across many domains. However, in many practical settings with highly scarce resources, where each agent can only receive at most one resource, such as healthcare intervention programs, the standard RMAB framework falls short. To tackle such scenarios, we introduce Finite-Horizon Single-Pull RMABs (SPRMABs), a novel variant in which each arm can only be pulled once. This single-pull constraint introduces additional complexity, rendering many existing RMAB solutions suboptimal or ineffective. To address this shortcoming, we propose using dummy states that expand the system and enforce the one-pull constraint. We then design a lightweight index policy for this expanded system. For the first time, we demonstrate that our index policy achieves a sub-linearly decaying average optimality gap of $\tilde{\mathcal{O}}\left(\frac{1}{\rho^{1/2}}\right)$ for a finite number of arms, where $\rho$ is the scaling factor for each arm cluster. Extensive simulations validate the proposed method, showing robust performance across various domains compared to existing benchmarks.
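
The abstract does not spell out how the dummy states are built, so the following is only a minimal sketch of one plausible construction, not the authors' implementation. It assumes each arm has S original states with a passive and an active transition matrix, duplicates those states into S dummy copies, sends an arm into the dummy copy when it is pulled, and makes both actions behave like the passive one inside the dummy copy, so a second pull has no effect. The function name `expand_with_dummy_states`, the state layout, and the example matrices are all hypothetical.

```python
import numpy as np

def expand_with_dummy_states(P_passive, P_active):
    """Build 2S-state transition matrices for one arm (illustrative assumption).

    States 0..S-1 are the original states (arm not yet pulled).
    States S..2S-1 are dummy copies (arm already pulled).
    """
    P_passive = np.asarray(P_passive, dtype=float)
    P_active = np.asarray(P_active, dtype=float)
    S = P_passive.shape[0]

    P0 = np.zeros((2 * S, 2 * S))  # expanded passive transitions
    P1 = np.zeros((2 * S, 2 * S))  # expanded active transitions

    # Not yet pulled, passive action: evolve within the original states.
    P0[:S, :S] = P_passive
    # Not yet pulled, active action: the single pull moves the arm
    # into the dummy copy of whichever state it lands in.
    P1[:S, S:] = P_active
    # Already pulled: both actions follow the passive dynamics inside
    # the dummy copy, so pulling again changes nothing (assumption).
    P0[S:, S:] = P_passive
    P1[S:, S:] = P_passive
    return P0, P1

# Hypothetical two-state arm, for illustration only.
P_passive = np.array([[0.9, 0.1], [0.4, 0.6]])
P_active = np.array([[0.3, 0.7], [0.1, 0.9]])
P0, P1 = expand_with_dummy_states(P_passive, P_active)
```

Under this sketch the dummy block is closed: no transition leads from a dummy state back to an original state, which is what makes the pull effectively one-time. An index policy for the expanded system, as described in the abstract, would then operate on `P0` and `P1`.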

arXiv information

Authors	Guojun Xiong, Haichuan Wang, Yuqi Pan, Saptarshi Mandal, Sanket Shah, Niclas Boehmer, Milind Tambe
Published	2025-01-10 16:54:56+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
