OPeRA: A Dataset of Observation, Persona, Rationale, and Action for Evaluating LLMs on Human Online Shopping Behavior Simulation

Abstract

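Can large language models (LLMs) accurately simulate a specific user's next web action?
LLMs have shown promise in generating "believable" human behavior, but evaluating how well they mimic real users remains an open challenge, largely because there are no high-quality, publicly available datasets that capture both the observable actions and the internal reasoning of actual human users.
To address this gap, we introduce OPeRA, a new dataset of Observation, Persona, Rationale, and Action collected from real human participants during online shopping sessions.
OPeRA is the first public dataset to comprehensively capture user personas, browser observations, fine-grained web actions, and self-reported just-in-time rationales.
We developed both an online questionnaire and a custom browser plugin to collect this dataset with high fidelity.
Using OPeRA, we establish the first benchmark to evaluate how well current LLMs can predict a specific user's next action and rationale, given that user's persona and <observation, action, rationale> history.
This dataset lays the groundwork for future research on LLM agents that aim to act as personalized digital twins of humans.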
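The next-action prediction task described above can be pictured with a minimal Python sketch. Everything here is an illustrative assumption rather than the dataset's actual schema or the benchmark's official metric: the `Step` and `Session` classes, the field names, the action strings, and the exact-match scoring are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Step:
    """One hypothetical <observation, action, rationale> triple."""
    observation: str                 # what the browser showed the user
    action: str                      # the web action the user took
    rationale: Optional[str] = None  # self-reported reason, if one was given


@dataclass
class Session:
    """A hypothetical shopping session: one persona plus ordered steps."""
    persona: str
    steps: List[Step] = field(default_factory=list)


def build_example(session: Session, t: int) -> Tuple[str, List[Step], str, Step]:
    """Return (persona, history, current observation, target step) for step t.

    The history holds only the steps before t, so the target action and
    rationale are never visible to the model being evaluated.
    """
    if not 0 <= t < len(session.steps):
        raise IndexError("step index out of range")
    target = session.steps[t]
    return session.persona, session.steps[:t], target.observation, target


def action_accuracy(predicted: List[str], actual: List[str]) -> float:
    """Exact-match accuracy over actions (an assumed, simplified metric)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical toy session; the strings are invented for illustration.
demo = Session(
    persona="budget-conscious student",
    steps=[
        Step("search results for 'desk lamp'", "click:product_2", "cheapest option"),
        Step("product page for lamp", "click:add_to_cart", "good reviews"),
    ],
)
persona, history, observation, target = build_example(demo, 1)
```

In this sketch a model would receive `persona`, `history`, and `observation`, and its predicted action and rationale would be compared against `target`.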

Abstract (original)

Can large language models (LLMs) accurately simulate the next web action of a specific user? While LLMs have shown promising capabilities in generating “believable” human behaviors, evaluating their ability to mimic real user behaviors remains an open challenge, largely due to the lack of high-quality, publicly available datasets that capture both the observable actions and the internal reasoning of an actual human user. To address this gap, we introduce OPeRA, a novel dataset of Observation, Persona, Rationale, and Action collected from real human participants during online shopping sessions. OPeRA is the first public dataset that comprehensively captures user personas, browser observations, fine-grained web actions, and self-reported just-in-time rationales. We developed both an online questionnaire and a custom browser plugin to gather this dataset with high fidelity. Using OPeRA, we establish the first benchmark to evaluate how well current LLMs can predict a specific user’s next action and rationale with a given persona and <observation, action, rationale> history. This dataset lays the groundwork for future research into LLM agents that aim to act as personalized digital twins for humans.

arXiv information

Authors	Ziyi Wang, Yuxuan Lu, Wenbo Li, Amirali Amini, Bo Sun, Yakov Bart, Weimin Lyu, Jiri Gesi, Tian Wang, Jing Huang, Yu Su, Upol Ehsan, Malihe Alikhani, Toby Jia-Jun Li, Lydia Chilton, Dakuo Wang
Published	2025-06-16 17:32:08+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
