Xiaomi MiMo – Xiaomi’s first open-source reasoning large language model

What is Xiaomi MiMo?

Xiaomi MiMo is Xiaomi’s first open-source reasoning large language model (LLM), designed to enhance performance in complex reasoning tasks. The model adopts a joint pretraining and post-training approach, leveraging large-scale reasoning-rich corpora and innovative reinforcement learning algorithms to significantly boost its capabilities in mathematical reasoning and code generation.

Despite having only 7 billion parameters, MiMo outperforms larger models such as OpenAI’s o1-mini and Alibaba’s Qwen QwQ-32B-Preview on public mathematical reasoning (AIME 24-25) and competitive coding (LiveCodeBench v5) benchmarks.

Xiaomi MiMo includes four model variants, all open-sourced on Hugging Face:

MiMo-7B-Base (Pretrained model)
MiMo-7B-SFT (Supervised fine-tuned model)
MiMo-7B-RL (Reinforcement learning model trained from the SFT model)
MiMo-7B-RL-Zero (Reinforcement learning model trained directly from the base model, without supervised fine-tuning)

These models provide developers with powerful tools for advanced reasoning tasks.
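
As a rough sketch of how one of these checkpoints might be loaded with the Hugging Face `transformers` library: the repository IDs below are assumed from the variant names and the organization URL, so they should be checked against the model hub. Passing `trust_remote_code=True` is likewise an assumption, needed only if the checkpoint ships custom model code. The helper names `repo_id` and `generate` are illustrative.

```python
# Assumed Hugging Face repo IDs, built from the variant names listed above.
MIMO_ORG = "XiaomiMiMo"
MIMO_VARIANTS = ("Base", "SFT", "RL", "RL-Zero")


def repo_id(variant: str) -> str:
    """Return the (assumed) Hugging Face repo ID for a MiMo-7B variant."""
    if variant not in MIMO_VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {MIMO_VARIANTS}")
    return f"{MIMO_ORG}/MiMo-7B-{variant}"


def generate(prompt: str, variant: str = "RL", max_new_tokens: int = 512) -> str:
    """Download the checkpoint and generate a completion for `prompt`."""
    # Imported lazily so repo_id() works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = repo_id(variant)
    tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        name, torch_dtype="auto", trust_remote_code=True
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("Prove that the sum of two even integers is even.")` would download the full 7B-parameter weights on first use, so it is best tried on a machine with enough memory and disk space.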

Key Features of Xiaomi MiMo

Powerful Mathematical Reasoning: Solves complex math problems with accurate reasoning paths and solutions.
Efficient Code Generation: Produces high-quality code suitable for various programming tasks.
Optimized Reasoning Performance: Achieves high reasoning efficiency through joint pretraining and post-training, outperforming larger models with just 7B parameters.

Technical Principles of Xiaomi MiMo

Pretraining Stage
- Focuses on mining reasoning-rich corpora.
- Synthesizes around 200 billion tokens of reasoning data.
- Trains using a three-phase curriculum learning strategy with a total of 25 trillion tokens, gradually increasing task difficulty to improve model capability.
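
To make the three-phase schedule concrete, here is a minimal sketch of how a data loader might decide which phase is active from the number of tokens seen so far. Only the 25-trillion-token total and the count of three phases come from the text above; the 60/30/10 split and the function names are hypothetical placeholders, not the actual phase sizes.

```python
from bisect import bisect_right
from itertools import accumulate

TOTAL_TOKENS = 25_000_000_000_000  # 25 trillion, as stated above

# Hypothetical share of the token budget per phase, in percent.
# The real split is not given in this article.
PHASE_PERCENT = (60, 30, 10)


def phase_boundaries(total: int, percents: tuple[int, ...]) -> list[int]:
    """Cumulative token counts at which each phase ends."""
    return [total * p // 100 for p in accumulate(percents)]


def phase_for(tokens_seen: int, boundaries: list[int]) -> int:
    """Zero-based index of the phase active after `tokens_seen` tokens."""
    if not 0 <= tokens_seen < boundaries[-1]:
        raise ValueError("tokens_seen is outside the training budget")
    # A phase covers [previous boundary, its own boundary).
    return bisect_right(boundaries, tokens_seen)
```

A training loop could call `phase_for` before each batch and switch to a harder data mixture whenever the returned index changes.
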
Post-Training Stage
- Reinforcement Learning Algorithm: Introduces the Test Difficulty Driven Reward algorithm to mitigate sparse reward issues in hard tasks, improving performance on complex problems.
- Data Resampling Strategy: Implements an Easy Data Re-Sampling strategy to stabilize the RL training process.
- Efficient Training Framework: Designs a Seamless Rollout system to accelerate RL training (by 2.29×) and evaluation (by 1.96×), improving training efficiency.
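
The article does not give formulas for the reward or the resampling strategy, so the following is only an illustrative sketch of the two ideas, not Xiaomi's implementation. It assumes that a code solution is scored per test case with harder tests weighted more heavily (which gives partial credit instead of an all-or-nothing signal), and that problems already solved by every rollout are moved to an "easy pool" that is sampled with a small probability. The weights, the `easy_prob` value, and all names are hypothetical.

```python
import random


def difficulty_weighted_reward(passed: list[bool], difficulty: list[float]) -> float:
    """Partial-credit reward in [0, 1]; harder test cases carry more weight.

    `difficulty[i]` is a hypothetical non-negative weight for test case i.
    """
    if not passed or len(passed) != len(difficulty):
        raise ValueError("need one difficulty weight per test case")
    total = sum(difficulty)
    if total <= 0:
        raise ValueError("difficulty weights must sum to a positive value")
    earned = sum(w for ok, w in zip(passed, difficulty) if ok)
    return earned / total


class EasyDataResampler:
    """Keeps fully solved problems in a separate pool that is rarely sampled."""

    def __init__(self, problems, easy_prob=0.1, seed=None):
        self.problems = list(problems)
        if not self.problems:
            raise ValueError("need at least one problem")
        self.easy_pool = []
        self.easy_prob = easy_prob  # hypothetical value
        self.rng = random.Random(seed)

    def record(self, problem, pass_rate: float) -> None:
        """Move a problem to the easy pool once every rollout solves it."""
        if pass_rate >= 1.0 and problem in self.problems:
            self.problems.remove(problem)
            self.easy_pool.append(problem)

    def sample(self):
        """Pick the next problem, occasionally drawing from the easy pool."""
        use_easy = bool(self.easy_pool) and self.rng.random() < self.easy_prob
        pool = self.easy_pool if use_easy or not self.problems else self.problems
        return self.rng.choice(pool)
```

With an all-or-nothing reward, a solution that fails one hard test out of three would score 0; under this weighting, `difficulty_weighted_reward([True, True, False], [1.0, 2.0, 5.0])` still returns a non-zero signal for the tests it passed.
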
Model Architecture Optimization
The model is tailored for reasoning tasks, ensuring high performance with a compact parameter size.

Xiaomi MiMo Project Resources

GitHub Repository: https://github.com/XiaomiMiMo
Hugging Face Model Hub: https://huggingface.co/XiaomiMiMo
Technical Report: MiMo-7B Technical Report (PDF)

Application Scenarios of Xiaomi MiMo

Education: Assists with math problem-solving and programming learning, offering solution steps and code examples.
Scientific Research: Supports logical reasoning and algorithm development, helping verify hypotheses and design experiments.
Software Development: Generates and optimizes code, assists with debugging and problem-solving.
Intelligent Customer Service: Answers complex queries and improves the efficiency of Q&A systems.
Gaming and Entertainment: Provides strategy suggestions and puzzle-solving help, making games more engaging.