Leveraging heterogeneous spillover in maximizing contextual bandit rewards

Summary

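Recommender systems that rely on contextual multi-armed bandits continuously improve their recommendations of relevant items by taking contextual information into account.
The objective of bandit algorithms is to learn the best arm (e.g., the best item to recommend) for each user and thereby maximize the cumulative reward from user engagement with the recommendations.
The context these algorithms typically consider consists of user and item attributes.
However, in social networks, where the action of one user can influence the actions and rewards of other users, neighbors' actions are also a very important context: they carry predictive power and can also affect future rewards through spillover.
Moreover, susceptibility to influence can vary from person to person depending on their preferences and the closeness of their ties to other users, which leads to heterogeneity in spillover effects.
Here, we present a framework that allows contextual multi-armed bandits to account for such heterogeneous spillovers when choosing the best arm for each user.
Experiments on several semi-synthetic and real-world datasets show that our framework yields significantly higher rewards than existing state-of-the-art solutions that ignore network information and potential spillover.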

Summary (original)

Recommender systems relying on contextual multi-armed bandits continuously improve relevant item recommendations by taking into account the contextual information. The objective of bandit algorithms is to learn the best arm (e.g., best item to recommend) for each user and thus maximize the cumulative rewards from user engagement with the recommendations. The context that these algorithms typically consider are the user and item attributes. However, in the context of social networks where $\textit{the action of one user can influence the actions and rewards of other users,}$ neighbors’ actions are also a very important context, as they can have not only predictive power but also can impact future rewards through spillover. Moreover, influence susceptibility can vary for different people based on their preferences and the closeness of ties to other users which leads to heterogeneity in the spillover effects. Here, we present a framework that allows contextual multi-armed bandits to account for such heterogeneous spillovers when choosing the best arm for each user. Our experiments on several semi-synthetic and real-world datasets show that our framework leads to significantly higher rewards than existing state-of-the-art solutions that ignore the network information and potential spillover.
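As a rough illustration of treating neighbors' actions as context, the sketch below appends a tie-weighted share of neighbors per arm to a user's feature vector. It is not the authors' framework: the function name `spillover_context`, the user identifiers, and the tie-strength weights are all hypothetical, and a real system would feed the resulting vector into a contextual bandit of its choice.

```python
from typing import Dict, List


def spillover_context(
    user_features: List[float],
    neighbor_actions: Dict[str, int],
    tie_strength: Dict[str, float],
    n_arms: int,
) -> List[float]:
    """Return user features plus the tie-weighted share of neighbors per arm.

    Hypothetical helper: the weighting scheme is an assumption made for
    illustration, not something specified in the abstract.
    """
    exposure = [0.0] * n_arms
    total = 0.0
    for neighbor, arm in neighbor_actions.items():
        # Neighbors with no recorded tie strength contribute nothing.
        weight = tie_strength.get(neighbor, 0.0)
        exposure[arm] += weight
        total += weight
    if total > 0:
        # Normalize so the exposure entries sum to 1 across arms.
        exposure = [e / total for e in exposure]
    return list(user_features) + exposure


# Made-up example: one close neighbor chose arm 0, two weaker ties chose arm 1.
context = spillover_context(
    user_features=[1.0, 0.5],
    neighbor_actions={"u2": 0, "u3": 1, "u4": 1},
    tie_strength={"u2": 0.5, "u3": 0.25, "u4": 0.25},
    n_arms=2,
)
print(context)
```

Weighting by tie strength means one close neighbor can count as much as several distant ones, which is one simple way to let the same neighborhood produce different contexts for different users.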

arXiv information

Authors	Ahmed Sayeed Faruk, Elena Zheleva
Published	2025-01-24 18:30:45+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
