Truthful mechanisms for linear bandit games with private contexts

Summary

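The contextual bandit problem, in which agents arrive sequentially with personal contexts and the system adapts its arm-allocation decisions accordingly, has recently attracted growing attention because it enables more personalized outcomes.
However, in many healthcare and recommendation applications, agents have private profiles and may misreport their contexts to gain from the system.
For example, in adaptive clinical trials, where a hospital sequentially recruits volunteers to test several new treatments and adjusts its plan based on their reported profiles (such as symptoms and interim data), participants may falsely report severe side effects such as allergies or nausea to avoid treatments they perceive as suboptimal.
We are the first to study this problem of private-context misreporting in a stochastic contextual bandit game between the system and non-repeated agents.
We show that traditional low-regret algorithms, such as UCB-family algorithms and Thompson sampling, cannot guarantee truthful reporting and can incur linear regret in the worst case.
We propose a mechanism that uses a linear program to ensure truthfulness while minimizing deviation from Thompson sampling.
Our numerical experiments further demonstrate strong performance across multiple contexts and other distribution families.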
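The failure of truthfulness described above can be made concrete with a small incentive-compatibility check: an allocation rule is truthful if no agent type obtains a higher expected (perceived) utility by reporting another type's context. The sketch below is a toy illustration, not the paper's model; the context names, arms, allocation rules and utility numbers are all hypothetical. It contrasts a greedy rule that gives each reported context its own arm with a rule that ignores the report.

```python
from typing import Dict, List, Tuple

# Hypothetical types: an allocation rule and the agents' own (perceived) utilities.
Policy = Dict[str, Dict[str, float]]   # reported context -> {arm: allocation probability}
Utility = Dict[str, Dict[str, float]]  # true context -> {arm: perceived utility}


def expected_utility(policy: Policy, utility: Utility, true_ctx: str, reported_ctx: str) -> float:
    """Perceived utility of an agent whose true context is true_ctx but who reports reported_ctx."""
    return sum(p * utility[true_ctx][arm] for arm, p in policy[reported_ctx].items())


def profitable_misreports(policy: Policy, utility: Utility,
                          tol: float = 1e-12) -> List[Tuple[str, str, float]]:
    """List every (true context, reported context, gain) where lying strictly pays.

    An empty list means no agent type gains by misreporting, i.e. the rule is
    truthful (incentive compatible) for these utilities.
    """
    violations = []
    for true_ctx in utility:
        honest = expected_utility(policy, utility, true_ctx, true_ctx)
        for reported_ctx in policy:
            if reported_ctx == true_ctx:
                continue
            gain = expected_utility(policy, utility, true_ctx, reported_ctx) - honest
            if gain > tol:
                violations.append((true_ctx, reported_ctx, gain))
    return violations


# Hypothetical toy numbers: a "mild" agent perceives treatment A as worse than B.
UTILITY = {"mild": {"A": 0.4, "B": 0.6}, "allergic": {"A": 0.1, "B": 0.9}}
# A greedy, context-dependent rule (roughly what a confident low-regret learner might play).
GREEDY = {"mild": {"A": 1.0, "B": 0.0}, "allergic": {"A": 0.0, "B": 1.0}}
# A rule that ignores the report entirely: trivially truthful, but not personalized.
POOLED = {"mild": {"A": 0.5, "B": 0.5}, "allergic": {"A": 0.5, "B": 0.5}}

print(profitable_misreports(GREEDY, UTILITY))  # mild agents gain about 0.2 by claiming "allergic"
print(profitable_misreports(POOLED, UTILITY))  # []
```

The greedy rule is exploited by mild agents who claim an allergy, mirroring the clinical-trial example; the pooled rule is truthful but throws away the context, which is exactly the personalization a contextual bandit is meant to provide.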

Summary (original)

The contextual bandit problem, where agents arrive sequentially with personal contexts and the system adapts its arm allocation decisions accordingly, has recently garnered increasing attention for enabling more personalized outcomes. However, in many healthcare and recommendation applications, agents have private profiles and may misreport their contexts to gain from the system. For example, in adaptive clinical trials, where hospitals sequentially recruit volunteers to test multiple new treatments and adjust plans based on volunteers’ reported profiles such as symptoms and interim data, participants may misreport severe side effects like allergy and nausea to avoid perceived suboptimal treatments. We are the first to study this issue of private context misreporting in a stochastic contextual bandit game between the system and non-repeated agents. We show that traditional low-regret algorithms, such as UCB family algorithms and Thompson sampling, fail to ensure truthful reporting and can result in linear regret in the worst case, while traditional truthful algorithms like explore-then-commit (ETC) and $\epsilon$-greedy algorithm incur sublinear but high regret. We propose a mechanism that uses a linear program to ensure truthfulness while minimizing deviation from Thompson sampling, yielding an $O(\ln T)$ frequentist regret. Our numerical experiments further demonstrate strong performance in multiple contexts and across other distribution families.
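The abstract describes the mechanism only as a linear program that enforces truthfulness while minimizing deviation from Thompson sampling; it does not state the program itself. One natural quantity such a program could deviate from is the probability that Thompson sampling assigns each arm to a given reported context. The sketch below estimates that distribution by Monte Carlo, assuming independent Gaussian posteriors per arm and linear expected rewards $x^\top\theta_a$; the posterior values are hypothetical, and this is the standard linear Thompson sampling rule, not the authors' mechanism.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def quad_form(x: Vector, cov: Matrix) -> float:
    """Return x^T cov x, the variance of x . theta when theta has covariance cov."""
    n = len(x)
    return sum(x[i] * cov[i][j] * x[j] for i in range(n) for j in range(n))


def ts_allocation(x: Vector, means: List[Vector], covs: List[Matrix],
                  n_samples: int = 20_000, seed: int = 0) -> List[float]:
    """Monte Carlo estimate of P(linear Thompson sampling picks arm a | context x).

    Assumes independent Gaussian posteriors theta_a ~ N(means[a], covs[a]) and a
    linear expected reward x . theta_a. Only x . theta_a affects the argmax, so it
    suffices to sample that scalar directly: N(x . mu_a, x^T Sigma_a x).
    """
    rng = random.Random(seed)
    loc = [dot(x, mu) for mu in means]
    scale = [math.sqrt(quad_form(x, cov)) for cov in covs]
    counts = [0] * len(means)
    for _ in range(n_samples):
        draws = [rng.gauss(m, s) for m, s in zip(loc, scale)]
        # Thompson sampling plays the arm with the largest sampled reward.
        counts[max(range(len(draws)), key=draws.__getitem__)] += 1
    return [c / n_samples for c in counts]


# Hypothetical posteriors for two arms in a 2-dimensional context space.
context = [1.0, 0.5]
MEANS = [[0.2, 0.4], [0.5, 0.0]]
COVS = [[[0.01, 0.0], [0.0, 0.01]], [[0.01, 0.0], [0.0, 0.01]]]
print(ts_allocation(context, MEANS, COVS))  # roughly [0.26, 0.74]
```

These probabilities depend on the reported context, so an agent who prefers a different arm can shift them by misreporting; a truthful mechanism of the kind proposed would have to adjust them, but the exact linear program is not given in this summary.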

arXiv information

Authors	Yiting Hu, Lingjie Duan
Published	2025-04-23 16:57:17+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
