Rewarding Chatbots for Real-World Engagement with Millions of Users

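Summary

The emergence of pretrained large language models has led to the deployment of a variety of social chatbots for chitchat.
These chatbots demonstrate language ability and fluency, but they are not guaranteed to be engaging and may struggle to retain users.
This work investigates the development of social chatbots that prioritize user engagement in order to improve retention. In particular, it examines how human feedback can be used to efficiently develop highly engaging chatbots.
The proposed approach uses automatic pseudo-labels collected from user interactions to train a reward model, which can then be used at inference time to reject low-scoring sample responses generated by the chatbot model.
Intuitive evaluation metrics, such as mean conversation length (MCL), are introduced as proxies for measuring the level of engagement of deployed chatbots.
A/B testing on groups of 10,000 new daily chatbot users on the Chai Research platform shows that this approach increases MCL by up to 70%, which translates to a more than 30% increase in user retention for a GPT-J 6B model.
Future work aims to use the reward model to realize a data flywheel, in which the latest user conversations are used to alternately fine-tune the language model and the reward model.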

Abstract (original)

The emergence of pretrained large language models has led to the deployment of a range of social chatbots for chitchat. Although these chatbots demonstrate language ability and fluency, they are not guaranteed to be engaging and can struggle to retain users. This work investigates the development of social chatbots that prioritize user engagement to enhance retention, specifically examining the use of human feedback to efficiently develop highly engaging chatbots. The proposed approach uses automatic pseudo-labels collected from user interactions to train a reward model that can be used to reject low-scoring sample responses generated by the chatbot model at inference time. Intuitive evaluation metrics, such as mean conversation length (MCL), are introduced as proxies to measure the level of engagement of deployed chatbots. A/B testing on groups of 10,000 new daily chatbot users on the Chai Research platform shows that this approach increases the MCL by up to 70%, which translates to a more than 30% increase in user retention for a GPT-J 6B model. Future work aims to use the reward model to realise a data fly-wheel, where the latest user conversations can be used to alternately fine-tune the language model and the reward model.
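
The two mechanisms named in the abstract, rejecting low-scoring samples at inference time and measuring mean conversation length, can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not confirm: rejection is implemented here as best-of-N selection (keep the single highest-scoring candidate), MCL is counted as messages per conversation, and `toy_reward` is a hypothetical stand-in for a trained reward model. None of the function names come from the paper.

```python
from typing import Callable, Sequence


def select_best_response(
    candidates: Sequence[str],
    reward_fn: Callable[[str], float],
) -> str:
    """Keep the highest-scoring candidate and reject the rest (best-of-N).

    Assumption: the abstract only says low-scoring samples are rejected;
    best-of-N is one simple way to do that.
    """
    if not candidates:
        raise ValueError("need at least one candidate response")
    return max(candidates, key=reward_fn)


def mean_conversation_length(conversations: Sequence[Sequence[str]]) -> float:
    """Average number of messages per conversation.

    Assumption: every message counts equally; the abstract does not
    specify how conversation length is counted.
    """
    if not conversations:
        raise ValueError("need at least one conversation")
    return sum(len(conv) for conv in conversations) / len(conversations)


def toy_reward(response: str) -> float:
    # Hypothetical scorer, NOT a trained reward model: it favours
    # responses that ask a question, with a small bonus for length.
    return float(response.count("?")) + 0.01 * len(response)


candidates = ["ok.", "Nice! What did you do next?", "I see."]
best = select_best_response(candidates, toy_reward)
mcl = mean_conversation_length([["hi", "hello"], ["a", "b", "c", "d"]])
```

In a deployed system the candidates would be sampled from the chatbot model and `reward_fn` would be the reward model trained on pseudo-labels; comparing `mean_conversation_length` between the two arms of an A/B test would then give the kind of engagement proxy the abstract describes.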

arXiv information

Authors	Robert Irvine, Douglas Boubert, Vyas Raina, Adian Liusie, Vineet Mudupalli, Aliaksei Korshuk, Zongyi Liu, Fritz Cremer, Valentin Assassi, Christie-Carol Beauchamp, Xiaoding Lu, Thomas Rialan, William Beauchamp
Published	2023-03-10 18:53:52+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
