MiMo: Unlocking the Reasoning Potential of Language Model — From Pretraining to Posttraining

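Abstract

We present MiMo-7B, a large language model built for reasoning tasks and optimized across both the pre-training and post-training stages.
During pre-training, we enhance the data preprocessing pipeline and adopt a three-stage data mixing strategy to strengthen the base model's reasoning potential.
MiMo-7B-Base is pre-trained on 25 trillion tokens, with an additional Multi-Token Prediction objective for better performance and faster inference.
During post-training, we curate a dataset of 130K verifiable mathematics and programming problems for reinforcement learning, integrate a test-difficulty-driven code-reward scheme to alleviate sparse-reward issues, and employ strategic data resampling to stabilize training.
Extensive evaluations show that MiMo-7B-Base has exceptional reasoning potential, outperforming even much larger 32B models.
The final RL-tuned model, MiMo-7B-RL, achieves superior performance on mathematics, code, and general reasoning tasks, surpassing OpenAI o1-mini.
The model checkpoints are available at https://github.com/xiaomimimo/MiMo.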

Abstract (original)

We present MiMo-7B, a large language model born for reasoning tasks, with optimization across both pre-training and post-training stages. During pre-training, we enhance the data preprocessing pipeline and employ a three-stage data mixing strategy to strengthen the base model’s reasoning potential. MiMo-7B-Base is pre-trained on 25 trillion tokens, with additional Multi-Token Prediction objective for enhanced performance and accelerated inference speed. During post-training, we curate a dataset of 130K verifiable mathematics and programming problems for reinforcement learning, integrating a test-difficulty-driven code-reward scheme to alleviate sparse-reward issues and employing strategic data resampling to stabilize training. Extensive evaluations show that MiMo-7B-Base possesses exceptional reasoning potential, outperforming even much larger 32B models. The final RL-tuned model, MiMo-7B-RL, achieves superior performance on mathematics, code and general reasoning tasks, surpassing the performance of OpenAI o1-mini. The model checkpoints are available at https://github.com/xiaomimimo/MiMo.
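The abstract names a test-difficulty-driven code reward but does not spell out its formula. As a rough illustration of how such a scheme could ease sparse rewards, the sketch below gives partial credit per test case and weights harder tests more heavily. Everything here is an assumption made for illustration, not the authors' method: the function name `difficulty_weighted_reward`, the use of historical pass rates as a difficulty signal, and the weighting rule `1 + (1 - pass_rate)`.

```python
from typing import Sequence


def difficulty_weighted_reward(
    passed: Sequence[bool],
    pass_rates: Sequence[float],
) -> float:
    """Return a reward in [0, 1] for one generated program.

    passed[i]     -- whether the program passed test case i
    pass_rates[i] -- hypothetical historical pass rate of test case i
                     (lower means harder); an assumed difficulty signal
    """
    if len(passed) != len(pass_rates):
        raise ValueError("passed and pass_rates must have the same length")
    if not passed:
        return 0.0

    # Assumed weighting rule: every test is worth at least 1,
    # and harder tests (lower pass rate) are worth up to 2.
    weights = [1.0 + (1.0 - rate) for rate in pass_rates]

    earned = sum(w for w, ok in zip(weights, passed) if ok)
    return earned / sum(weights)


def binary_reward(passed: Sequence[bool]) -> float:
    """All-or-nothing baseline: 1.0 only if every test passes."""
    return 1.0 if passed and all(passed) else 0.0


if __name__ == "__main__":
    results = [True, True, False]
    rates = [0.9, 0.5, 0.1]
    print(difficulty_weighted_reward(results, rates))
    print(binary_reward(results))
```

With an all-or-nothing reward, a program that fails a single hard test receives the same signal as one that fails every test. The weighted variant still separates the two cases, which is the kind of denser signal the abstract alludes to.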

arXiv information

Authors	Xiaomi LLM-Core Team, Bingquan Xia, Bowen Shen, Cici, Dawei Zhu, Di Zhang, Gang Wang, Hailin Zhang, Huaqiu Liu, Jiebao Xiao, Jinhao Dong, Liang Zhao, Peidian Li, Peng Wang, Shihua Yu, Shimao Chen, Weikun Wang, Wenhan Ma, Xiangwei Deng, Yi Huang, Yifan Song, Zihan Jiang, Bowen Ye, Can Cai, Chenhong He, Dong Zhang, Duo Zhang, Guoan Wang, Hao Tian, Haochen Zhao, Heng Qu, Hongshen Xu, Jun Shi, Kainan Bao, QingKai Fang, Kang Zhou, Kangyang Zhou, Lei Li, Menghang Zhu, Nuo Chen, Qiantong Wang, Shaohui Liu, Shicheng Li, Shuhao Gu, Shuhuai Ren, Shuo Liu, Sirui Deng, Weiji Zhuang, Weiwei Lv, Wenyu Yang, Xin Zhang, Xing Yong, Xing Zhang, Xingchen Song, Xinzhe Xu, Xu Wang, Yihan Yan, Yu Tu, Yuanyuan Tian, Yudong Wang, Yue Yu, Zhenru Lin, Zhichao Song, Zihao Yue
Published	2025-05-12 14:30:11+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
