Ling-V2 – A large language model series launched by Ant Bailian


What is Ling-V2?

Ling-V2 is a family of large language models launched by Ant Bailian, built on a Mixture-of-Experts (MoE) architecture. Its first release, Ling-mini-2.0, has 16 billion total parameters, of which only 1.4 billion are activated per input token. The model is trained on more than 20 trillion high-quality tokens and further refined with multi-stage supervised fine-tuning and reinforcement learning, achieving strong performance in complex reasoning and instruction following. With its 1/32 activation-ratio MoE design, Ling-mini-2.0 delivers roughly 7x the effective performance of an equivalent dense model, enabling fast generation and high training and inference efficiency. The team has also open-sourced its efficient FP8 training solution and released multiple pretraining checkpoints that support continual training, making Ling-V2 an ideal starting point for MoE research.


Main Features of Ling-V2

  • Powerful reasoning ability: Excels at coding, mathematics, and knowledge-intensive cross-domain reasoning tasks, outperforming dense models below 10B parameters and even larger-scale MoE models.

  • High efficiency: Uses a 1/32 activation-ratio MoE design, delivering roughly 7x the effective performance of an equivalent dense model; with only 1.4B active parameters it performs on par with a 7–8B dense model. In simple Q&A scenarios, generation speed reaches 300+ tokens/s, and when processing 128K context lengths the relative speedup exceeds 7x.

  • Efficient training solution: Employs FP8 mixed-precision training throughout. The team has open-sourced its FP8 training solution, which uses tile/blockwise FP8 scaling and introduces FP8 optimizers for aggressive memory savings. On 8/16/32 × 80GB GPUs, training throughput significantly exceeds that of LLaMA 3.1 8B and Qwen3 8B.

  • Open-source strategy: In addition to the final trained versions, five pretraining checkpoints are released, enabling deeper research and broader applications (see the loading sketch after this list).
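
For readers who want to try the released checkpoints, the sketch below shows one plausible way to load and query the model with Hugging Face transformers. The repository id inclusionAI/Ling-mini-2.0 and the trust_remote_code flag are assumptions for illustration, not details stated in this article; check the official release for the exact model names and usage instructions.

```python
# Hypothetical loading sketch: the repo id and trust_remote_code flag are
# assumptions, not details confirmed by this article.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "inclusionAI/Ling-mini-2.0"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # keep memory reasonable for the 16B-parameter MoE
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Explain what a Mixture-of-Experts model is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```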


Technical Principles of Ling-V2

  • MoE architecture: Built on a Mixture-of-Experts structure, decomposing the model into multiple expert networks. For each input token, only a subset of experts is activated, achieving sparsity while maintaining high performance and computational efficiency.

  • Optimization design: Incorporates empirically validated design choices such as expert granularity, shared-expert ratio, attention ratio, an auxiliary-loss-free sigmoid routing strategy, MTP loss, QK-Norm, and semi-RoPE, further enhancing performance and efficiency (see the routing sketch after this list).

  • FP8 mixed-precision training: Uses FP8 training instead of BF16. In experiments on more than 1 trillion tokens, the loss curves and downstream benchmark results are nearly identical to those of BF16 training. The open-sourced FP8 solution enables the community to run efficient continued pretraining and fine-tuning on limited compute (see the blockwise scaling sketch after this list).

  • Multi-stage training: Trained on 20+ trillion high-quality tokens, with multi-stage supervised fine-tuning and reinforcement learning, delivering significant improvements in complex reasoning and instruction adherence.
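
As a concrete illustration of the sparse routing described above, the toy layer below scores experts with a sigmoid router and activates only the top-k experts for each token. It is a minimal sketch for intuition only; the expert count, dimensions, and routing details are illustrative and do not reproduce the actual Ling-V2 configuration.

```python
import torch
import torch.nn as nn

class SparseMoELayer(nn.Module):
    """Toy Mixture-of-Experts feed-forward layer with sigmoid top-k routing.

    Illustrative only: real MoE models add shared experts, load balancing,
    and fused expert kernels that this sketch omits.
    """

    def __init__(self, d_model: int = 64, d_ff: int = 128,
                 num_experts: int = 32, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = torch.sigmoid(self.router(x))                 # sigmoid scoring, no softmax
        weights, expert_ids = scores.topk(self.top_k, dim=-1)  # keep top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize kept weights

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = expert_ids[:, slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

if __name__ == "__main__":
    layer = SparseMoELayer()
    tokens = torch.randn(8, 64)
    print(layer(tokens).shape)  # torch.Size([8, 64])
```

Because each token passes through only a few of the many experts, the activated parameter count stays small relative to the total, which is the mechanism behind Ling-mini-2.0's 1/32 activation ratio.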

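The snippet below sketches the tile/blockwise scaling idea mentioned above: every tile of a tensor gets its own scale so that a local outlier does not wipe out precision elsewhere before the cast to FP8. It is an illustrative quantize/dequantize round trip using PyTorch's float8_e4m3fn dtype, not the open-sourced Ling-V2 training code; the block size and scaling scheme are assumptions.

```python
import torch

FP8_MAX = 448.0  # largest magnitude representable in float8_e4m3fn

def quantize_blockwise_fp8(x: torch.Tensor, block: int = 128):
    """Quantize a 2-D tensor to FP8 with one scale per (block x block) tile.

    Illustrative sketch of tile/blockwise FP8 scaling; not the Ling-V2 code.
    """
    rows, cols = x.shape
    assert rows % block == 0 and cols % block == 0
    # View as (row_tiles, block, col_tiles, block) so per-tile maxima are easy to take.
    tiles = x.reshape(rows // block, block, cols // block, block)
    amax = tiles.abs().amax(dim=(1, 3), keepdim=True).clamp(min=1e-12)
    scale = FP8_MAX / amax                        # one scale factor per tile
    q = (tiles * scale).to(torch.float8_e4m3fn)   # cast the scaled tiles to FP8
    return q.reshape(rows, cols), scale

def dequantize_blockwise_fp8(q: torch.Tensor, scale: torch.Tensor, block: int = 128):
    rows, cols = q.shape
    tiles = q.to(torch.float32).reshape(rows // block, block, cols // block, block)
    return (tiles / scale).reshape(rows, cols)

if __name__ == "__main__":
    w = torch.randn(256, 256)
    q, s = quantize_blockwise_fp8(w)
    w_hat = dequantize_blockwise_fp8(q, s)
    print("max abs round-trip error:", (w - w_hat).abs().max().item())
```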

Project Links


Application Scenarios of Ling-V2

  • Natural Language Processing (NLP) tasks: Efficiently handles text classification, sentiment analysis, machine translation, and more—delivering accurate and high-performance solutions.

  • Intelligent customer service: Serves as the core engine of smart customer service systems, quickly responding to queries with precise answers, enhancing user experience and service efficiency.

  • Content creation: Assists in generating high-quality written content such as news reports, creative writing, and advertising copy, boosting both efficiency and quality for creators.

  • Education: Applied in education for intelligent tutoring, automated grading, and personalized learning plans, offering tailored support for students and teachers.

  • Healthcare: Processes medical text data to assist doctors with case analysis, medical literature retrieval, and more—improving accuracy and efficiency in medical decision-making.
