The lightweight AI model “Phi-4-mini-flash-reasoning” has been released
Microsoft has released Phi-4-mini-flash-reasoning, a lightweight AI model with 3.8 billion parameters and a 64K-token context length. It is designed for resource-constrained environments such as single-GPU setups, edge devices, and mobile platforms. Compared with its predecessor, the model delivers up to a 10-fold increase in inference throughput and cuts average latency by a factor of 2 to 3, making it especially well suited to latency-sensitive, long-text generation tasks.

At its core is a hybrid decoder design that combines gated memory units with YOCO ("You Only Cache Once") caching, substantially reducing computational cost on long sequences. The model was pretrained on 5 trillion tokens and then put through multiple fine-tuning stages to strengthen its reasoning capabilities.

On mathematical reasoning benchmarks its performance approaches, and in some cases surpasses, that of considerably larger models, making it a strong fit for edtech applications, edge computing, and automated evaluation.
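To make the gated-memory idea concrete, here is a minimal PyTorch sketch of an element-wise gated memory unit. It assumes a generic GLU-style gating formulation, and the class and parameter names (GatedMemoryUnit, gate_proj, out_proj) are hypothetical; Microsoft's actual layer may be parameterized differently.

```python
import torch
import torch.nn as nn

class GatedMemoryUnit(nn.Module):
    """Illustrative sketch of an element-wise gated memory unit.

    Rather than recomputing attention, the layer reuses a "memory"
    tensor produced by an earlier decoder layer and gates it with the
    current hidden states. This is a generic GLU-style formulation,
    not Microsoft's exact parameterization.
    """

    def __init__(self, d_model: int):
        super().__init__()
        self.gate_proj = nn.Linear(d_model, d_model, bias=False)
        self.out_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, hidden: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # hidden: current layer input, shape (batch, seq, d_model)
        # memory: representations cached from an earlier layer, same shape
        gate = torch.sigmoid(self.gate_proj(hidden))  # element-wise gate in [0, 1]
        return self.out_proj(gate * memory)           # no quadratic attention step

# The gating costs O(seq * d_model^2), versus O(seq^2 * d_model) for full
# attention, which is where much of the claimed complexity reduction comes from.
x = torch.randn(1, 16, 512)
m = torch.randn(1, 16, 512)
print(GatedMemoryUnit(512)(x, m).shape)  # torch.Size([1, 16, 512])
```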
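For readers who want to try the model, a minimal inference sketch with Hugging Face transformers might look like the following. The repo id microsoft/Phi-4-mini-flash-reasoning, the dtype, and the generation settings are assumptions here, so consult the official model card before use.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-mini-flash-reasoning"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 3.8B parameters fit on a single modern GPU
    device_map="auto",
)

# Math reasoning is the model's headline use case.
messages = [{"role": "user", "content": "Solve 3x^2 - 12x + 9 = 0 step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```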