Automatic Music Playlist Generation via Simulation-based Reinforcement Learning

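Summary

Playlist personalization is a common feature of music streaming services, but conventional techniques such as collaborative filtering learn to recommend on the basis of explicit assumptions about content quality.
Those assumptions often cause a mismatch between the objective a model is trained on offline and the user-satisfaction metrics measured online.
This paper presents a reinforcement learning (RL) framework that addresses this limitation by optimizing user-satisfaction metrics directly, using a simulated playlist-generation environment.
With this simulator, the authors develop and train a modified Deep Q-Network, the action head DQN (AH-DQN), in a way that copes with the large state and action spaces of the RL formulation.
The resulting policy can recommend from large, dynamic sets of candidate items with the aim of maximizing consumption metrics.
Agents are analyzed and evaluated offline through simulations whose environment models are trained on both public and proprietary streaming datasets.
In online A/B tests, these agents yield better user-satisfaction metrics than baseline methods.
Finally, the performance estimates produced by the simulator are shown to correlate strongly with the metric results observed online.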

Abstract (original)

Personalization of playlists is a common feature in music streaming services, but conventional techniques, such as collaborative filtering, rely on explicit assumptions regarding content quality to learn how to make recommendations. Such assumptions often result in misalignment between offline model objectives and online user satisfaction metrics. In this paper, we present a reinforcement learning framework that solves for such limitations by directly optimizing for user satisfaction metrics via the use of a simulated playlist-generation environment. Using this simulator we develop and train a modified Deep Q-Network, the action head DQN (AH-DQN), in a manner that addresses the challenges imposed by the large state and action space of our RL formulation. The resulting policy is capable of making recommendations from large and dynamic sets of candidate items with the expectation of maximizing consumption metrics. We analyze and evaluate agents offline via simulations that use environment models trained on both public and proprietary streaming datasets. We show how these agents lead to better user-satisfaction metrics compared to baseline methods during online A/B tests. Finally, we demonstrate that performance assessments produced from our simulator are strongly correlated with observed online metric results.
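The abstract does not describe the AH-DQN architecture, so the following is only a minimal sketch of one common way a Q-function can handle a large, changing candidate set: score each (state, candidate) pair separately, then pick the best-scoring candidate. It is not the authors' implementation. The class `ActionHeadQ`, the helpers `greedy_action` and `td_target`, the feature dimensions, and the discount `gamma` are all hypothetical names and values chosen for illustration.

```python
import random


class ActionHeadQ:
    """Toy Q-function (assumption, not the paper's AH-DQN): one hidden
    ReLU layer that scores a single (state, candidate item) pair."""

    def __init__(self, state_dim, item_dim, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        in_dim = state_dim + item_dim
        # Small random weights; a real agent would learn these.
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
                   for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [rng.gauss(0.0, 0.1) for _ in range(hidden_dim)]
        self.b2 = 0.0

    def q_value(self, state, item):
        # Concatenate state features with the candidate's features.
        x = list(state) + list(item)
        hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return sum(w * h for w, h in zip(self.w2, hidden)) + self.b2

    def q_values(self, state, candidates):
        # One score per candidate, so the candidate set may be any size.
        return [self.q_value(state, item) for item in candidates]


def greedy_action(q_fn, state, candidates):
    """Index of the candidate with the highest estimated Q-value."""
    scores = q_fn.q_values(state, candidates)
    return max(range(len(scores)), key=scores.__getitem__)


def td_target(q_fn, reward, next_state, next_candidates, gamma=0.9, done=False):
    """Standard one-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done or not next_candidates:
        return float(reward)
    return reward + gamma * max(q_fn.q_values(next_state, next_candidates))
```

Because the network takes the candidate's features as input rather than emitting one output per item, the same weights can score 5 candidates at one step and 5,000 at the next, which is the property the abstract attributes to the learned policy.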

arXiv information

Authors	Federico Tomasi, Joseph Cauteruccio, Surya Kanoria, Kamil Ciosek, Matteo Rinaldi, Zhenwen Dai
Published	2023-10-13 14:13:02+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google