PlatoLM: Teaching LLMs via a Socratic Questioning User Simulator

要約

Vicuna が証明しているように、クローズドソースの ChatGPT の比類のないパフォーマンスは、その民主化に向けた取り組みを引き起こし、実際のユーザーと ChatGPT の会話を活用することで顕著な進歩を遂げました。
ただし、人間の参加を伴う会話を収集することには課題があるため、Baize や UltraChat などの現在の取り組みは、会話データを自動的に生成することを目的としています。
彼らは主に、人間からの真の学習ではなく、指示に基づいて人間の行動をシミュレートするロールプレイを実行する ChatGPT に依存しており、その結果、範囲が制限され、多様性が減少し、真の複数ラウンドの会話ダイナミクスが欠如します。
上記の問題に対処するために、私たちは本物の人間と機械の会話から抽出された人間の質問を学習目標としてターゲットにし、「Socratic」と呼ばれるユーザーシミュレーターをトレーニングして高品質の人間中心の合成会話データセットを生成します。
その後、このデータセットは、「PlatoLM」という名前のアシスタントモデルをトレーニングするために使用されました。
実験的には、PlatoLM は、同等のトレーニングセットサイズを考慮した場合、ペアごとの比較により、Vicuna-Bench と MT-Bench の両方でベースラインモデルを上回っています。また、手動評価でも、モデルが非常に競争力があることが示されています。
印象的なことに、最新の LLaMA 2 モデルで微調整すると、PlatoLM は MT-Bench ベンチマークおよび Alpaca-Eval ベンチマークで 7B モデル (LLaMA-2-7B-chat および Vicuna-7B を含む) の中で SOTA パフォーマンスを達成し、2 位にランクされました。
7B モデルの中でも、いくつかの大規模モデル (LLaMA-2-13B チャットや GPT-3.5 など) をも上回っています。
さらに詳細な分析により、私たちのアプローチの拡張性と移行可能性が実証されています。
コードは https://github.com/FreedomIntelligence/PlatoLM で入手できます。

要約(オリジナル)

The unparalleled performance of closed-sourced ChatGPT has sparked efforts towards its democratization, with notable strides made by leveraging real user and ChatGPT conversations, as evidenced by Vicuna. However, due to challenges in gathering conversations involving human participation, current endeavors like Baize and UltraChat aim to automatically generate conversational data. They primarily rely on ChatGPT conducting roleplay to simulate human behaviors based on instructions rather than genuine learning from humans, resulting in limited scope, diminished diversity, and an absence of genuine multi-round conversational dynamics. To address the above issues, we target human questions extracted from genuine human-machine conversations as a learning goal and train a user simulator called `Socratic’ to produce a high-quality human-centric synthetic conversation dataset. Subsequently, this dataset was used to train our assistant model, named `PlatoLM’. Experimentally, PlatoLM outpaces baseline models in both Vicuna-Bench and MT-Bench by pairwise comparison when considering equivalent training set sizes, and manual evaluation also shows that our model is highly competitive. Impressively, when fine-tuned with the latest LLaMA 2 model, PlatoLM achieves the SOTA performance among 7B models (including LLaMA-2-7B-chat and Vicuna-7B) in MT-Bench benchmark and in Alpaca-Eval benchmark, it ranks second among 7B models, even beating some larger scale models (including LLaMA-2-13B-chat and GPT-3.5). Further in-depth analysis demonstrates the scalability and transferability of our approach. The code is available at https://github.com/FreedomIntelligence/PlatoLM.

arxiv情報

著者	Chuyi Kong,Yaxin Fan,Xiang Wan,Feng Jiang,Benyou Wang
発行日	2023-10-09 15:39:21+00:00
arxivサイト	arxiv_id(pdf)

提供元, 利用サービス

arxiv.jp, Google

PlatoLM: Teaching LLMs via a Socratic Questioning User Simulator

要約

要約(オリジナル)

arxiv情報

提供元, 利用サービス

最近の投稿

最近のコメント

アーカイブ

カテゴリー