RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs

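Abstract

LLM-powered personalization agent systems use Large Language Models (LLMs) to predict user behavior from past activities.
Their effectiveness, however, often depends on how well they can exploit extensive, long user histories, which are inherently noisy and lengthy.
Existing pretrained LLMs may generate summaries that are concise but lack the context that downstream tasks need, which limits their usefulness in personalization systems.
To address these challenges, we introduce Reinforcement Learning from Prediction Feedback (RLPF).
RLPF fine-tunes LLMs to generate concise, human-readable user summaries that are optimized for downstream task performance.
By maximizing the usefulness of the generated summaries, RLPF distills extensive user history data while preserving the information that downstream tasks require.
Our empirical evaluation shows significant improvements in both extrinsic downstream task utility and intrinsic summary quality: RLPF surpasses baseline methods by up to 22% on downstream task performance and achieves a win rate of up to 84.59% on Factuality, Abstractiveness, and Readability.
RLPF also reduces context length by 74% while improving performance on 16 of 19 unseen tasks and/or datasets, which demonstrates its generalizability.
By transforming long, noisy user histories into informative, human-readable representations, this approach offers a promising way to enhance LLM personalization.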
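
The prediction-feedback idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the reward, so the binary match reward, the function name `prediction_feedback_reward`, and the keyword-based `toy_predictor` below are all assumptions made for illustration. The sketch only shows how a summary could be scored by whether a downstream predictor, given nothing but the summary, recovers a held-out user activity.

```python
from typing import Callable


def prediction_feedback_reward(
    summary: str,
    predict: Callable[[str], str],
    target: str,
) -> float:
    """Hypothetical reward: 1.0 if the predictor, given only the summary,
    outputs the held-out target activity, else 0.0 (an assumed form)."""
    predicted = predict(summary).strip().lower()
    return 1.0 if predicted == target.strip().lower() else 0.0


def toy_predictor(summary: str) -> str:
    """Stand-in for a downstream LLM predictor (illustrative only)."""
    return "Dune" if "sci-fi" in summary.lower() else "Unknown"


# A summary that keeps the relevant signal earns the reward;
# one that drops it does not.
informative = "User mostly watches sci-fi films and rates them highly."
uninformative = "User watches films."

reward_informative = prediction_feedback_reward(informative, toy_predictor, "Dune")
reward_uninformative = prediction_feedback_reward(uninformative, toy_predictor, "Dune")
```

In a reinforcement learning setup, a scalar like this would be fed back to the summarizer as its training signal, so summaries are rewarded for being useful to the predictor rather than for matching a reference text.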

Abstract (original)

LLM-powered personalization agent systems employ Large Language Models (LLMs) to predict users’ behavior from their past activities. However, their effectiveness often hinges on the ability to effectively leverage extensive, long user historical data due to its inherent noise and length of such data. Existing pretrained LLMs may generate summaries that are concise but lack the necessary context for downstream tasks, hindering their utility in personalization systems. To address these challenges, we introduce Reinforcement Learning from Prediction Feedback (RLPF). RLPF fine-tunes LLMs to generate concise, human-readable user summaries that are optimized for downstream task performance. By maximizing the usefulness of the generated summaries, RLPF effectively distills extensive user history data while preserving essential information for downstream tasks. Our empirical evaluation demonstrates significant improvements in both extrinsic downstream task utility and intrinsic summary quality, surpassing baseline methods by up to 22% on downstream task performance and achieving an up to 84.59% win rate on Factuality, Abstractiveness, and Readability. RLPF also achieves a remarkable 74% reduction in context length while improving performance on 16 out of 19 unseen tasks and/or datasets, showcasing its generalizability. This approach offers a promising solution for enhancing LLM personalization by effectively transforming long, noisy user histories into informative and human-readable representations.

arXiv information

Authors	Jiaxing Wu, Lin Ning, Luyang Liu, Harrison Lee, Neo Wu, Chao Wang, Sushant Prakash, Shawn O’Banion, Bradley Green, Jun Xie
Published	2024-09-06 17:30:45+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
