Benchmarking LLMs in Recommendation Tasks: A Comparative Evaluation with Conventional Recommenders

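Summary

In recent years, integrating large language models (LLMs) into recommender systems has created new opportunities to improve recommendation quality.
However, thoroughly evaluating and comparing the recommendation capabilities of LLMs against conventional recommender systems requires a comprehensive benchmark.
This paper introduces RecBench, which systematically investigates several item representation forms (unique identifiers, text, semantic embeddings, and semantic identifiers) and evaluates two primary recommendation tasks: click-through rate prediction (CTR) and sequential recommendation (SeqRec).
The experiments cover up to 17 large models and are conducted on five diverse datasets from the fashion, news, video, books, and music domains.
The findings indicate that LLM-based recommenders outperform conventional recommenders, achieving up to a 5% AUC improvement in the CTR scenario and up to a 170% NDCG@10 improvement in the SeqRec scenario.
However, these substantial performance gains come at the cost of significantly reduced inference efficiency, which makes the LLM-as-RS paradigm impractical for real-time recommendation environments.
The authors hope the findings will inspire future research, including recommendation-specific model acceleration methods.
They will release their code, data, configurations, and platform so that other researchers can reproduce and build upon the experimental results.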

Abstract (original)

In recent years, integrating large language models (LLMs) into recommender systems has created new opportunities for improving recommendation quality. However, a comprehensive benchmark is needed to thoroughly evaluate and compare the recommendation capabilities of LLMs with traditional recommender systems. In this paper, we introduce RecBench, which systematically investigates various item representation forms (including unique identifier, text, semantic embedding, and semantic identifier) and evaluates two primary recommendation tasks, i.e., click-through rate prediction (CTR) and sequential recommendation (SeqRec). Our extensive experiments cover up to 17 large models and are conducted across five diverse datasets from fashion, news, video, books, and music domains. Our findings indicate that LLM-based recommenders outperform conventional recommenders, achieving up to a 5% AUC improvement in the CTR scenario and up to a 170% NDCG@10 improvement in the SeqRec scenario. However, these substantial performance gains come at the expense of significantly reduced inference efficiency, rendering the LLM-as-RS paradigm impractical for real-time recommendation environments. We aim for our findings to inspire future research, including recommendation-specific model acceleration methods. We will release our code, data, configurations, and platform to enable other researchers to reproduce and build upon our experimental results.
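The abstract reports gains in AUC (for CTR) and NDCG@10 (for SeqRec) as relative percentages. As a reading aid, the following is a minimal sketch of how these two metrics and a relative improvement are commonly computed. It is not taken from RecBench: the function names are hypothetical, NDCG is shown for the single-target case (one held-out next item with binary relevance), and AUC uses the plain pairwise definition rather than any particular library.

```python
import math


def auc(labels, scores):
    """Pairwise AUC: probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ndcg_at_k(ranked_items, target, k=10):
    """NDCG@k when exactly one item (the held-out target) is relevant.

    The ideal DCG is 1 (target at rank 1), so NDCG equals 1 / log2(rank + 1),
    or 0 if the target is not in the top k.
    """
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item == target:
            return 1.0 / math.log2(rank + 1)
    return 0.0


def relative_improvement(new, old):
    """Relative gain of `new` over `old`, in percent."""
    return (new - old) / old * 100.0
```

For instance, with made-up scores that do not come from the paper, a baseline NDCG@10 of 0.10 and a new score of 0.27 give `relative_improvement(0.27, 0.10)` of roughly 170, which is how a figure such as "170% NDCG@10 improvement" is usually read: the metric becomes about 2.7 times the baseline value, not 170 percentage points higher.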

arXiv information

Authors	Qijiong Liu, Jieming Zhu, Lu Fan, Kun Wang, Hengchang Hu, Wei Guo, Yong Liu, Xiao-Ming Wu
Published	2025-03-07 15:05:23+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
