Distributed Multi-Task Learning for Stochastic Bandits with Context Distribution and Stage-wise Constraints

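Summary

We present conservative distributed multi-task learning for stochastic linear contextual bandits with heterogeneous agents.
This extends conservative linear bandits to a distributed setting in which M agents work on different but related tasks while satisfying stage-wise performance constraints.
The exact context is unknown, and the agents have access only to a context distribution, as in many practical applications that rely on a prediction mechanism to infer the context, such as stock market prediction and weather forecasting.
We propose a distributed upper confidence bound (UCB) algorithm, DiSC-UCB.
In each round, the algorithm constructs a pruned action set to ensure that the constraints are met.
It also has the agents share their estimates through a central server in well-structured synchronization steps.
We prove regret and communication bounds for the algorithm.
We then extend the problem to a setting in which the agents do not know the baseline reward.
For this setting we provide a modified algorithm, DiSC-UCB2, and show that it achieves the same regret and communication bounds.
We empirically validate the performance of our algorithm on synthetic data and on real-world Movielens-100K data.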
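
To make the idea of a pruned action set concrete, here is a minimal single-agent sketch in Python. It is not the DiSC-UCB algorithm itself: the summary does not give the exact constraint or confidence set, so everything below is an assumption made for illustration. In particular, the sketch assumes the stage-wise constraint has the form "expected reward is at least (1 - alpha) times the baseline reward", that each action is represented by its expected feature vector under the context distribution, and that the confidence width is the usual linear-UCB term beta * sqrt(x^T V^{-1} x). The names `prune_actions`, `select_action`, `v_inv`, `beta`, `alpha`, and `baseline_index` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def confidence_width(x: Vector, v_inv: Sequence[Vector], beta: float) -> float:
    """Assumed exploration width: beta * sqrt(x^T V^{-1} x)."""
    quad = sum(x[i] * dot(v_inv[i], x) for i in range(len(x)))
    return beta * math.sqrt(max(quad, 0.0))


def prune_actions(
    actions: Sequence[Vector],
    theta_hat: Vector,
    v_inv: Sequence[Vector],
    beta: float,
    baseline_reward: float,
    alpha: float,
) -> List[int]:
    """Keep the actions whose pessimistic reward estimate (lower confidence
    bound) still meets the assumed constraint (1 - alpha) * baseline_reward."""
    threshold = (1.0 - alpha) * baseline_reward
    return [
        i
        for i, x in enumerate(actions)
        if dot(x, theta_hat) - confidence_width(x, v_inv, beta) >= threshold
    ]


def select_action(
    actions: Sequence[Vector],
    theta_hat: Vector,
    v_inv: Sequence[Vector],
    beta: float,
    baseline_reward: float,
    alpha: float,
    baseline_index: int,
) -> int:
    """Pick the highest-UCB action inside the pruned set; fall back to the
    baseline action (an assumption of this sketch) if the set is empty."""
    safe = prune_actions(actions, theta_hat, v_inv, beta, baseline_reward, alpha)
    if not safe:
        return baseline_index
    return max(
        safe,
        key=lambda i: dot(actions[i], theta_hat)
        + confidence_width(actions[i], v_inv, beta),
    )
```

With `actions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]`, `theta_hat = [1.0, 0.2]`, and `v_inv = [[0.01, 0.0], [0.0, 4.0]]`, the second action has the largest optimistic estimate because it is poorly explored, but its lower confidence bound violates the constraint, so pruning removes it and the first action is chosen instead. This is the intended effect of pruning before the UCB step: optimism is applied only among actions that are safe under the current estimate.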

Abstract (original)

We present conservative distributed multi-task learning in stochastic linear contextual bandits with heterogeneous agents. This extends conservative linear bandits to a distributed setting where M agents tackle different but related tasks while adhering to stage-wise performance constraints. The exact context is unknown, and only a context distribution is available to the agents as in many practical applications that involve a prediction mechanism to infer context, such as stock market prediction and weather forecast. We propose a distributed upper confidence bound (UCB) algorithm, DiSC-UCB. Our algorithm constructs a pruned action set during each round to ensure the constraints are met. Additionally, it includes synchronized sharing of estimates among agents via a central server using well-structured synchronization steps. We prove the regret and communication bounds on the algorithm. We extend the problem to a setting where the agents are unaware of the baseline reward. For this setting, we provide a modified algorithm, DiSC-UCB2, and we show that the modified algorithm achieves the same regret and communication bounds. We empirically validated the performance of our algorithm on synthetic data and real-world Movielens-100K data.

arXiv information

Authors	Jiabin Lin, Shana Moothedath
Published	2025-04-28 13:42:00+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
