Large Language Model as a User Simulator

要約

Vicuna が証明しているように、クローズドソースの ChatGPT の比類のないパフォーマンスは、その民主化に向けた取り組みを引き起こし、実際のユーザーと ChatGPT の会話を活用することで顕著な進歩を遂げました。
ただし、Baize や UltraChat などの現在の取り組みは、人間の参加を集めることが難しいため、会話データを自動生成することを目的としていますが、主に ChatGPT に依存して、人間の真の学習ではなく指示に基づいて人間の行動をシミュレートします。
その結果、範囲が制限され、多様性が減少し、真の複数ラウンドの会話ダイナミクスが欠如します。
上記の問題に対処するために、私たちは、本物の人間と機械の会話から抽出された人間の質問を学習目標として革新的にターゲットにし、ユーザーシミュレーターである UserGPT をトレーニングして、高品質の人間中心の合成会話データセットである RealChat を生成します。
その後、このデータセットはアシスタントモデルである ReaLM をトレーニングします。
実験的には、同等のトレーニングセットサイズを考慮した場合、ペアごとの比較により、Vicuna-Bench と MT-Bench の両方で RealLM がベースラインモデルを上回っています。また、手動評価でも、モデルが非常に競争力があることが示されています。
印象的なことに、最新の LLaMA 2 モデルで微調整した場合、ReaLM は MT ベンチで 6.33 というトップスコアを確保し、LLaMA-2-7B チャットモデルを含む現代の同スケールモデルを上回りました。
さらに詳細な分析により、私たちのアプローチの拡張性と移行可能性が実証されています。
トレーニングセットのデータ品質と結果として得られるモデルのパフォーマンスの間の相互作用についての予備調査も行われ、将来の調査のための強固な基礎が築かれます。
コードは https://github.com/FreedomIntelligence/ReaLM で入手できます。

要約(オリジナル)

The unparalleled performance of closed-sourced ChatGPT has sparked efforts towards its democratization, with notable strides made by leveraging real user and ChatGPT conversations, as evidenced by Vicuna. However, while current endeavors like Baize and UltraChat aim to auto-generate conversational data due to challenges in gathering human participation, they primarily rely on ChatGPT to simulate human behaviors based on directives rather than genuine human learning. This results in a limited scope, diminished diversity, and an absence of genuine multi-round conversational dynamics. To address the above issues, we innovatively target human questions extracted from genuine human-machine conversations as a learning goal and train a user simulator, UserGPT, to produce a high-quality human-centric synthetic conversation dataset, RealChat. Subsequently, this dataset trains our assistant model, ReaLM. Experimentally, ReaLM outpaces baseline models in both Vicuna-Bench and MT-Bench by pairwise comparison when considering equivalent training set sizes, and manual evaluation also shows that our model is highly competitive. Impressively, when fine-tuned with the latest LLaMA 2 model, ReaLM secured a leading score of 6.33 in the MT-Bench, outshining the contemporary same-scale models, including the LLaMA-2-7B-chat model. Further in-depth analysis demonstrates the scalability and transferability of our approach. A preliminary exploration into the interplay between training set data quality and resultant model performance is also undertaken, laying a robust groundwork for future investigations. The code is available at https://github.com/FreedomIntelligence/ReaLM.

arxiv情報

著者	Chuyi Kong,Yaxin Fan,Xiang Wan,Feng Jiang,Benyou Wang
発行日	2023-08-23 14:33:53+00:00
arxivサイト	arxiv_id(pdf)

提供元, 利用サービス

arxiv.jp, Google

Large Language Model as a User Simulator

要約

要約(オリジナル)

arxiv情報

提供元, 利用サービス

最近の投稿

最近のコメント

アーカイブ

カテゴリー