What is Xiaomi MiMo?
Xiaomi MiMo is Xiaomi’s first open-source reasoning large language model (LLM), designed to enhance performance in complex reasoning tasks. The model adopts a joint pretraining and post-training approach, leveraging large-scale reasoning-rich corpora and innovative reinforcement learning algorithms to significantly boost its capabilities in mathematical reasoning and code generation.
Despite having only 7 billion parameters, MiMo is reported by Xiaomi to surpass larger models such as OpenAI’s o1-mini and Alibaba’s Qwen QwQ-32B-Preview on public mathematical reasoning and code generation benchmarks.
Xiaomi MiMo includes four model variants, all open-sourced on Hugging Face:
- MiMo-7B-Base (Pretrained model)
- MiMo-7B-SFT (Supervised fine-tuned model)
- MiMo-7B-RL (Reinforcement learning model, trained from the SFT model)
- MiMo-7B-RL-Zero (Reinforcement learning model, trained directly from the base model without an SFT stage)
These models provide developers with powerful tools for advanced reasoning tasks.
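As a rough illustration of how the four variants might be used, the sketch below builds a Hugging Face repository id for a variant and runs a single generation with the `transformers` library. The helper names `repo_id` and `generate` are hypothetical, and the repository ids assume the pattern `XiaomiMiMo/<variant>` under the organization linked in the resources section. Passing `trust_remote_code=True` is also an assumption (it is only needed if the repository ships custom model code), so check the model card before relying on it.

```python
# Minimal sketch, not an official usage guide.
# Assumption: each variant lives at "XiaomiMiMo/<variant>" on Hugging Face.
HF_ORG = "XiaomiMiMo"
VARIANTS = ("MiMo-7B-Base", "MiMo-7B-SFT", "MiMo-7B-RL", "MiMo-7B-RL-Zero")


def repo_id(variant: str) -> str:
    """Return the assumed Hugging Face repository id for a MiMo variant."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    return f"{HF_ORG}/{variant}"


def generate(variant: str, prompt: str, max_new_tokens: int = 256) -> str:
    """Download a variant and generate a completion for `prompt`."""
    # Imported lazily so repo_id() works even without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = repo_id(variant)
    # Assumption: the repository may require custom model code.
    tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(name, trust_remote_code=True)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Downloads several gigabytes of weights on first run.
    print(generate("MiMo-7B-RL", "Prove that the sum of two even numbers is even."))
```

Reasoning models tend to produce long outputs, so `max_new_tokens` will likely need to be raised well above the default used here for harder problems.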
Key Features of Xiaomi MiMo
- Powerful Mathematical Reasoning: Solves complex math problems with accurate reasoning paths and solutions.
- Efficient Code Generation: Produces high-quality code suitable for various programming tasks.
- Optimized Reasoning Performance: Achieves high reasoning efficiency through joint pretraining and post-training, outperforming larger models with just 7B parameters.
Technical Principles of Xiaomi MiMo
- Pretraining Stage
  - Focuses on mining reasoning-rich corpora.
  - Synthesizes around 200 billion tokens of reasoning data.
  - Trains using a three-phase curriculum learning strategy with a total of 25 trillion tokens, gradually increasing task difficulty to improve model capability.
- Post-Training Stage
  - Reinforcement Learning Algorithm: Introduces the Test Difficulty Driven Reward algorithm to mitigate sparse reward issues in hard tasks, improving performance on complex problems.
  - Data Resampling Strategy: Implements an Easy Data Re-Sampling strategy to stabilize the RL training process.
  - Efficient Training Framework: Designs a Seamless Rollout system to accelerate RL training (by 2.29×) and evaluation (by 1.96×), improving training efficiency.
- Model Architecture Optimization: The model is tailored for reasoning tasks, ensuring high performance with a compact parameter size.
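The two post-training ideas above can be illustrated with a small sketch. It is not Xiaomi's implementation. It assumes that test cases for a coding problem are grouped into difficulty tiers and that each tier's weight is shared evenly among its tests, so a partially correct solution earns partial credit instead of the all-or-nothing reward that makes hard problems sparse. It also assumes that a separate pool of already-solved "easy" problems is mixed into each batch with a fixed probability. The tier weights, the `easy_prob` value, and all names (`UnitTestResult`, `difficulty_weighted_reward`, `sample_batch`) are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class UnitTestResult:
    name: str
    tier: int      # hypothetical difficulty tier, 0 = easiest
    passed: bool


def binary_reward(results):
    """Sparse baseline: reward 1.0 only if every test passes."""
    return 1.0 if all(r.passed for r in results) else 0.0


def difficulty_weighted_reward(results, tier_weights):
    """Partial credit: each tier's weight is split evenly among its tests.

    Returns a value in [0, 1], normalized over the tiers that have tests.
    """
    earned, available = 0.0, 0.0
    for tier, weight in tier_weights.items():
        group = [r for r in results if r.tier == tier]
        if group:
            available += weight
            earned += weight * sum(r.passed for r in group) / len(group)
    return earned / available if available else 0.0


def sample_batch(hard_pool, easy_pool, batch_size, easy_prob, rng):
    """Draw a batch, taking each item from the easy pool with prob. easy_prob."""
    batch = []
    for _ in range(batch_size):
        use_easy = bool(easy_pool) and rng.random() < easy_prob
        batch.append(rng.choice(easy_pool if use_easy else hard_pool))
    return batch


if __name__ == "__main__":
    results = [
        UnitTestResult("basic_1", 0, True),
        UnitTestResult("basic_2", 0, True),
        UnitTestResult("edge_1", 1, True),
        UnitTestResult("edge_2", 1, False),
        UnitTestResult("stress_1", 2, False),
    ]
    weights = {0: 1.0, 1: 2.0, 2: 3.0}  # hypothetical: harder tiers count more
    print(binary_reward(results))
    print(difficulty_weighted_reward(results, weights))
    print(sample_batch(["h1", "h2"], ["e1"], 4, 0.1, random.Random(0)))
```

In this example the binary reward is zero because two tests fail, while the weighted reward still gives the policy a nonzero signal for passing the easier tiers. Mixing in easy problems keeps some positive-reward samples in each batch, which is one plausible reading of how re-sampling could stabilize training.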
Xiaomi MiMo Project Resources
- GitHub Repository: https://github.com/XiaomiMiMo
- Hugging Face Model Hub: https://huggingface.co/XiaomiMiMo
- Technical Report: MiMo-7B Technical Report (PDF)
Application Scenarios of Xiaomi MiMo
- Education: Assists with math problem-solving and programming learning, offering solution steps and code examples.
- Scientific Research: Supports logical reasoning and algorithm development, helping verify hypotheses and design experiments.
- Software Development: Generates and optimizes code, and assists with debugging and problem-solving.
- Intelligent Customer Service: Answers complex queries and improves the efficiency of Q&A systems.
- Gaming and Entertainment: Provides strategy suggestions and puzzle-solving help, making games more engaging.