Exploring the Impact of Instruction Data Scaling on Large Language Models: An Empirical Study on Real-World Use Cases

Summary

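The success of ChatGPT has recently prompted many attempts to replicate it, and instruction tuning has emerged as a key factor in achieving strong results.
Instruction tuning not only substantially improves model performance and generalization, but also brings the model's generated output closer to human speech patterns.
However, current research rarely examines how the amount of instruction data affects model performance, especially in real-world use cases.
This paper examines the performance of instruction-tuned large language models across different scales of instruction data.
For the experiments, an evaluation dataset covering 12 major online use cases is constructed.
With Bloomz-7B1-mt as the base model, the results show that 1) simply increasing the amount of instruction data yields continuous improvement on tasks such as open-ended generation, while 2) on tasks such as math and code, the performance curve remains quite flat as the data size grows.
The authors further analyze possible causes of these phenomena and propose potential future research directions, such as effectively selecting high-quality training data, scaling the base model, and training methods specialized for hard tasks.
The training and evaluation datasets, as well as the model checkpoints, will be released.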

Abstract (original)

The success of ChatGPT has recently attracted numerous efforts to replicate it, with instruction-tuning strategies being a key factor in achieving remarkable results. Instruction-tuning not only significantly enhances the model’s performance and generalization but also makes the model’s generated results more consistent with human speech patterns. However, current research rarely studies the impact of different amounts of instruction data on model performance, especially in real-world use cases. In this paper, we explore the performance of large language models based on instruction tuning across different scales of instruction data. An evaluation dataset consisting of 12 major online use cases is constructed in the experiment. With Bloomz-7B1-mt as the base model, the results show that 1) merely increasing the amount of instruction data leads to continuous improvement in tasks such as open-ended generation, 2) in tasks such as math and code, the model performance curve remains quite flat while increasing data size. We further analyze the possible causes of these phenomena and propose potential future research directions such as effectively selecting high-quality training data, scaling base models, and training methods specialized for hard tasks. We will release our training and evaluation datasets, as well as model checkpoints.

arXiv information

Authors	Yunjie Ji, Yong Deng, Yan Gong, Yiping Peng, Qiang Niu, Lei Zhang, Baochang Ma, Xiangang Li
Published	2023-03-26 14:49:37+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
