Xmodel-2 Technical Report

Abstract

Xmodel-2 is a 1.2-billion-parameter large language model designed specifically for reasoning tasks. Its architecture lets models of different scales share a unified set of hyperparameters, so extensive experiments can be run on smaller models and the best configurations transferred directly to larger ones. To maximize training efficiency and stability, Xmodel-2 uses the Warmup-Stable-Decay (WSD) learning rate scheduler from MiniCPM. Pretrained on 1.5 trillion tokens from diverse sources, Xmodel-2 achieves state-of-the-art performance on complex reasoning and agent-based tasks while keeping training costs low. These results highlight the potential of efficient model design and training strategies for advancing reasoning capabilities. Model checkpoints and code are publicly available on GitHub at https://github.com/XiaoduoAILab/Xmodel-2
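To make the three phases of a Warmup-Stable-Decay schedule concrete, here is a minimal sketch of a step-to-learning-rate function. It is not taken from the Xmodel-2 or MiniCPM code: the function name `wsd_lr`, its parameters, and the linear shape of the decay phase are assumptions for illustration only (MiniCPM and Xmodel-2 may use a different decay curve and different step counts).

```python
def wsd_lr(step: int, peak_lr: float, warmup_steps: int, stable_steps: int,
           decay_steps: int, final_lr: float = 0.0) -> float:
    """Learning rate at `step` under a Warmup-Stable-Decay schedule (hypothetical helper)."""
    if step < warmup_steps:
        # Warmup: ramp linearly from peak_lr / warmup_steps up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    step -= warmup_steps
    if step < stable_steps:
        # Stable: hold the peak learning rate constant.
        return peak_lr
    step -= stable_steps
    if step < decay_steps:
        # Decay: anneal linearly from peak_lr toward final_lr (decay shape is an assumption).
        frac = step / decay_steps
        return peak_lr + (final_lr - peak_lr) * frac
    # After the schedule ends, stay at final_lr.
    return final_lr


# Example: 10 warmup steps, 80 stable steps, 10 decay steps.
schedule = [wsd_lr(s, peak_lr=1e-2, warmup_steps=10, stable_steps=80, decay_steps=10)
            for s in range(101)]
```

A commonly cited appeal of this shape is that the long constant-rate stable phase does not commit training to a fixed total step count in advance: a checkpoint from the stable phase can be branched off and given a short decay phase at any point.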

arXiv information

Authors	Wang Qun, Liu Yang, Lin Qingquan, Qu Zhijiu, Jiang Ling
Published	2024-12-27 13:32:10+00:00

Provider and services used

arxiv.jp, Google