Bayesian Off-Policy Evaluation and Learning for Large Action Spaces

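Summary

In interactive systems, actions are often correlated, which creates an opportunity for sample-efficient off-policy evaluation (OPE) and off-policy learning (OPL) in large action spaces.
We introduce a unified Bayesian framework that captures these correlations through structured, informative priors.
Within this framework, we propose sDM, a generic Bayesian approach to OPE and OPL grounded in both algorithmic and theoretical foundations.
Notably, sDM exploits action correlations without sacrificing computational efficiency.
Furthermore, inspired by online Bayesian bandits, we introduce Bayesian metrics that evaluate an algorithm's average performance across multiple problem instances, departing from conventional worst-case assessments.
We analyze sDM for OPE and OPL and highlight the benefits of exploiting action correlations.
Empirical evidence demonstrates the strong performance of sDM.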
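The summary says sDM captures action correlations through structured priors without sacrificing computational efficiency. The sketch below illustrates that general idea under loudly stated assumptions: a hypothetical two-level linear-Gaussian prior (a shared latent vector `psi` with per-action loadings `W`), context-free Gaussian rewards, and a direct-method value estimate that plugs in posterior means. It is not the paper's implementation, and the names `structured_posterior` and `dm_value` are invented for illustration. The posterior of `psi` is computed first with each action's mean integrated out, so the cost grows linearly in the number of actions K plus one d×d inversion, instead of requiring a K×K inversion.

```python
import numpy as np

# Hypothetical model (an assumption for illustration, not taken from the paper):
#   psi ~ N(mu0, Sigma0)                      shared latent vector, dimension d
#   theta_a | psi ~ N(W[a] @ psi, sigma_a^2)  mean reward of action a
#   r | a ~ N(theta_a, sigma^2)               logged rewards, no contexts


def structured_posterior(W, mu0, Sigma0, sigma_a, sigma, counts, sums):
    """Return posterior means and variances of theta_a for all K actions.

    W: (K, d) loadings; counts, sums: (K,) number and sum of logged rewards.
    """
    counts = np.asarray(counts, dtype=float)
    sums = np.asarray(sums, dtype=float)
    seen = counts > 0
    safe_counts = np.where(seen, counts, 1.0)  # avoid division by zero

    # Step 1: posterior of psi with each theta_a integrated out.
    # For a logged action, ybar_a | psi ~ N(W[a] @ psi, sigma_a^2 + sigma^2 / n_a).
    weights = np.where(seen, 1.0 / (sigma_a**2 + sigma**2 / safe_counts), 0.0)
    ybar = sums / safe_counts
    Sigma0_inv = np.linalg.inv(Sigma0)
    Lambda = Sigma0_inv + W.T @ (W * weights[:, None])  # O(K d^2)
    Lambda_inv = np.linalg.inv(Lambda)                  # only a d x d inverse
    m = Lambda_inv @ (Sigma0_inv @ mu0 + W.T @ (weights * ybar))

    # Step 2: theta_a given psi and its own data, then integrate psi out.
    prec = 1.0 / sigma_a**2 + counts / sigma**2
    post_mean = ((W @ m) / sigma_a**2 + sums / sigma**2) / prec
    spread = np.einsum("kd,de,ke->k", W, Lambda_inv, W)  # W[a] @ Lambda_inv @ W[a]
    post_var = 1.0 / prec + spread / (sigma_a**4 * prec**2)
    return post_mean, post_var


def dm_value(pi, post_mean):
    """Direct-method style value estimate: sum_a pi(a) * E[theta_a | data]."""
    return float(pi @ post_mean)


# Usage: actions 0 and 1 share the latent psi; action 2 has zero loading.
W = np.array([[1.0], [1.0], [0.0]])
post_mean, post_var = structured_posterior(
    W, mu0=np.zeros(1), Sigma0=np.eye(1), sigma_a=0.5, sigma=1.0,
    counts=[100, 0, 0], sums=[200.0, 0.0, 0.0],
)
value = dm_value(np.full(3, 1 / 3), post_mean)
```

In the usage lines, action 0 is heavily logged and shares `psi` with action 1, so action 1's posterior mean moves well away from its prior mean of 0 even though it was never logged, while action 2 (zero loading) stays exactly at its prior.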

Summary (original)

In interactive systems, actions are often correlated, presenting an opportunity for more sample-efficient off-policy evaluation (OPE) and learning (OPL) in large action spaces. We introduce a unified Bayesian framework to capture these correlations through structured and informative priors. In this framework, we propose sDM, a generic Bayesian approach designed for OPE and OPL, grounded in both algorithmic and theoretical foundations. Notably, sDM leverages action correlations without compromising computational efficiency. Moreover, inspired by online Bayesian bandits, we introduce Bayesian metrics that assess the average performance of algorithms across multiple problem instances, deviating from the conventional worst-case assessments. We analyze sDM in OPE and OPL, highlighting the benefits of leveraging action correlations. Empirical evidence showcases the strong performance of sDM.
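The Bayesian metrics described above average an algorithm's error over problem instances drawn from a prior rather than taking the worst case. A minimal Monte Carlo sketch of such a metric (the Bayesian mean squared error of a value estimate) follows, assuming a hypothetical 3-action problem with a correlated Gaussian prior over mean rewards, logged data for action 0 only, and a uniform target policy; none of these settings come from the paper, and `bayesian_mse` is an invented helper. It compares a posterior-mean estimator that uses the prior correlations with one that treats actions as independent.

```python
import numpy as np


def bayesian_mse(estimator, prior_cov, pi, n_logged, sigma, n_instances, rng):
    """Monte Carlo Bayesian MSE of a value estimate.

    Averages the squared error over problem instances theta ~ N(0, prior_cov)
    instead of taking the worst case over theta. Hypothetical logging setup:
    only action 0 is logged, n_logged times, with N(theta_0, sigma^2) rewards.
    """
    K = len(pi)
    theta = rng.multivariate_normal(np.zeros(K), prior_cov, size=n_instances)
    ybar = theta[:, 0] + rng.normal(0.0, sigma / np.sqrt(n_logged), size=n_instances)
    true_value = theta @ pi
    return float(np.mean((estimator(ybar) - true_value) ** 2))


# Hypothetical problem: 3 strongly correlated actions, uniform target policy.
C = np.array([[1.0, 0.9, 0.9], [0.9, 1.0, 0.9], [0.9, 0.9, 1.0]])
pi = np.full(3, 1 / 3)
n_logged, sigma = 20, 1.0
noise_var = sigma**2 / n_logged


def correlated_estimate(ybar):
    # Posterior mean of theta under the correlated prior C, then pi @ theta.
    post_theta = np.outer(ybar, C[:, 0]) / (C[0, 0] + noise_var)
    return post_theta @ pi


def independent_estimate(ybar):
    # Same data, but an independent N(0, 1) prior per action:
    # unlogged actions stay at their prior mean of 0.
    return pi[0] * ybar / (1.0 + noise_var)


rng = np.random.default_rng(0)
mse_correlated = bayesian_mse(correlated_estimate, C, pi, n_logged, sigma, 20_000, rng)
mse_independent = bayesian_mse(independent_estimate, C, pi, n_logged, sigma, 20_000, rng)
```

Under these assumptions the exact Bayesian MSEs are about 0.104 for the correlated estimator and about 0.447 for the independent one, so the Monte Carlo values should land close to those. Because the correlated estimator is the posterior mean under the same prior used to draw instances, it minimizes this Bayesian MSE among estimators based on the logged data; a worst-case analysis over theta would not credit the prior in this way.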

arXiv information

Authors	Imad Aouali, Victor-Emmanuel Brunel, David Rohde, Anna Korba
Published	2024-02-22 16:09:45+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google