UserSumBench: A Benchmark Framework for Evaluating User Summarization Approaches

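Summary

Large language models (LLMs) have shown strong capabilities in generating user summaries from long lists of raw user activity data.
Because these summaries capture key user information such as preferences and interests, they are highly valuable for LLM-based personalization applications such as explainable recommender systems.
However, the development of new summarization techniques is held back by the lack of ground-truth labels, the inherent subjectivity of user summaries, and human evaluation that is often costly and time-consuming.
To address these challenges, the authors introduce UserSumBench, a benchmark framework designed to facilitate the iterative development of LLM-based summarization approaches.
The framework offers two main components. (1) A reference-free summary quality metric.
The authors show that this metric is effective and aligned with human preferences across three diverse datasets (MovieLens, Yelp, and Amazon Review).
(2) A novel, robust summarization method that uses a time-hierarchical summarizer and a self-critique verifier to produce high-quality summaries while eliminating hallucination.
This method serves as a strong baseline for further innovation in summarization techniques.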

Summary (original)

Large language models (LLMs) have shown remarkable capabilities in generating user summaries from a long list of raw user activity data. These summaries capture essential user information such as preferences and interests, and therefore are invaluable for LLM-based personalization applications, such as explainable recommender systems. However, the development of new summarization techniques is hindered by the lack of ground-truth labels, the inherent subjectivity of user summaries, and human evaluation which is often costly and time-consuming. To address these challenges, we introduce UserSumBench, a benchmark framework designed to facilitate iterative development of LLM-based summarization approaches. This framework offers two key components: (1) A reference-free summary quality metric. We show that this metric is effective and aligned with human preferences across three diverse datasets (MovieLens, Yelp and Amazon Review). (2) A novel robust summarization method that leverages time-hierarchical summarizer and self-critique verifier to produce high-quality summaries while eliminating hallucination. This method serves as a strong baseline for further innovation in summarization techniques.
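
The abstract names a time-hierarchical summarizer but does not describe its internals. As a rough illustration only, the general idea of hierarchical summarization over time can be sketched as follows: group activities into time buckets, summarize each bucket, then summarize the bucket summaries. Everything here is an assumption made for illustration and not the authors' implementation: the monthly bucket size, the `Activity` record shape, the `hierarchical_summary` function, and the `toy_summarize` stand-in for an LLM call are all hypothetical. The self-critique verifier is not covered.

```python
from collections import defaultdict
from datetime import datetime
from typing import Callable, Iterable

# Hypothetical record shape: (timestamp, free-text description of the activity).
Activity = tuple[datetime, str]


def hierarchical_summary(
    activities: Iterable[Activity],
    summarize: Callable[[list[str]], str],
) -> str:
    """Summarize per calendar month, then summarize the monthly summaries."""
    buckets: dict[tuple[int, int], list[str]] = defaultdict(list)
    for ts, text in activities:
        buckets[(ts.year, ts.month)].append(text)
    # Visit months in chronological order so the top level sees a timeline.
    monthly = [summarize(buckets[key]) for key in sorted(buckets)]
    return summarize(monthly)


def toy_summarize(texts: list[str]) -> str:
    # Placeholder for an LLM call (assumption): it only joins its inputs.
    return " | ".join(texts)


if __name__ == "__main__":
    # Invented toy data, not taken from any dataset.
    events = [
        (datetime(2024, 2, 3), "watched Alien"),
        (datetime(2024, 1, 5), "watched Heat"),
        (datetime(2024, 1, 9), "watched Ronin"),
    ]
    print(hierarchical_summary(events, toy_summarize))
```

In a real pipeline the `summarize` argument would wrap a model call; keeping it as a plain callable lets the bucketing logic be tested without one.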

arXiv information

Authors	Chao Wang, Neo Wu, Lin Ning, Luyang Liu, Jun Xie, Shawn O’Banion, Bradley Green
Published	2024-08-30 01:56:57+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
