MAmmoTH2: Scaling Instructions from the Web

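Summary

Instruction tuning improves the reasoning ability of large language models (LLMs), with data quality and scalability as the key factors.
Most instruction-tuning data comes from human crowdsourcing or GPT-4 distillation.
We propose a paradigm for efficiently harvesting 10 million naturally occurring instruction examples from pre-training web corpora to strengthen LLM reasoning.
Our approach consists of (1) recalling relevant documents, (2) extracting instruction-response pairs, and (3) refining the extracted pairs with open-source LLMs.
Fine-tuning base LLMs on this dataset yields the MAmmoTH2 models, which substantially improve performance on reasoning benchmarks.
Notably, without training on any in-domain data, MAmmoTH2-7B (Mistral) improves from 11% to 34% on MATH and from 36% to 67% on GSM8K.
Further training MAmmoTH2 on public instruction-tuning datasets produces MAmmoTH2-Plus, which achieves state-of-the-art performance on several reasoning and chatbot benchmarks.
Our work shows how to collect large-scale, high-quality instruction data without costly human annotation or GPT-4 distillation, offering a new paradigm for building better instruction-tuning data.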

Summary (original)

Instruction tuning improves the reasoning abilities of large language models (LLMs), with data quality and scalability being the crucial factors. Most instruction tuning data come from human crowd-sourcing or GPT-4 distillation. We propose a paradigm to efficiently harvest 10 million naturally existing instruction data from the pre-training web corpus to enhance LLM reasoning. Our approach involves (1) recalling relevant documents, (2) extracting instruction-response pairs, and (3) refining the extracted pairs using open-source LLMs. Fine-tuning base LLMs on this dataset, we build MAmmoTH2 models, which significantly boost performance on reasoning benchmarks. Notably, MAmmoTH2-7B’s (Mistral) performance increases from 11% to 34% on MATH and from 36% to 67% on GSM8K without training on any in-domain data. Further training MAmmoTH2 on public instruction tuning datasets yields MAmmoTH2-Plus, achieving state-of-the-art performance on several reasoning and chatbot benchmarks. Our work demonstrates how to harvest large-scale, high-quality instruction data without costly human annotation or GPT-4 distillation, providing a new paradigm for building better instruction tuning data.
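The three stages named in the abstract (recall, extract, refine) can be pictured as a simple pipeline. The following is only a toy sketch under stated assumptions: the keyword filter, the `Question:`/`Answer:` regular expression, and the whitespace cleanup are hypothetical stand-ins for the relevance model and the open-source LLMs the paper relies on, and every function name here is invented for illustration rather than taken from the authors' code.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword list standing in for a learned relevance model.
RECALL_KEYWORDS = ("solve", "prove", "calculate", "question", "answer")

# Toy pattern for "Question: ... Answer: ..." blocks; the paper uses LLMs
# for extraction, so this regex is only an illustrative assumption.
QA_PATTERN = re.compile(
    r"Question:\s*(?P<q>.+?)\s*Answer:\s*(?P<a>.+?)(?=Question:|\Z)",
    re.DOTALL,
)


@dataclass
class Pair:
    instruction: str
    response: str


def recall(documents, min_hits=2):
    """Stage 1: keep documents that contain enough instruction-like keywords."""
    kept = []
    for doc in documents:
        text = doc.lower()
        hits = sum(1 for kw in RECALL_KEYWORDS if kw in text)
        if hits >= min_hits:
            kept.append(doc)
    return kept


def extract(document):
    """Stage 2: pull candidate instruction-response pairs out of one document."""
    return [Pair(m.group("q"), m.group("a")) for m in QA_PATTERN.finditer(document)]


def refine(pair):
    """Stage 3: clean a pair (here: collapse whitespace; the paper uses LLMs)."""
    def clean(s):
        return " ".join(s.split())
    return Pair(clean(pair.instruction), clean(pair.response))


def harvest(documents):
    """Run recall -> extract -> refine over a list of raw documents."""
    pairs = []
    for doc in recall(documents):
        pairs.extend(refine(p) for p in extract(doc))
    return pairs


if __name__ == "__main__":
    docs = [
        "Question: Solve 2x = 6.\nAnswer:  x = 3.\n\nQuestion: Calculate 5 + 7.\nAnswer: 12",
        "Today the weather was nice and we went for a walk.",
    ]
    for pair in harvest(docs):
        print(pair)
```

In this sketch each stage is a plain function, so any of the three stand-ins could be swapped for a stronger component without changing `harvest`.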

arXiv information

Authors	Xiang Yue, Tuney Zheng, Ge Zhang, Wenhu Chen
Published	2024-05-15 15:37:55+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
