Ling-V2 – A large language model series launched by Ant Bailian


What is Ling-V2?

Ling-V2 is a family of large language models launched by Ant Bailian, built on a Mixture-of-Experts (MoE) architecture. Its first release, Ling-mini-2.0, has 16 billion total parameters, of which only 1.4 billion are activated per input token. The model is trained on more than 20 trillion high-quality tokens and further refined with multi-stage supervised fine-tuning and reinforcement learning, achieving strong performance in complex reasoning and instruction following. With its 1/32 activation-ratio MoE design, Ling-mini-2.0 delivers roughly 7x the effective performance of an equivalent dense model, enabling fast generation and high training and inference efficiency. The team has also open-sourced its efficient FP8 training solution and released multiple pretraining checkpoints that support continual training, making Ling-V2 an ideal starting point for MoE research.


Main Features of Ling-V2

  • Powerful reasoning ability: Excels at coding, mathematics, and knowledge-intensive cross-domain reasoning tasks, outperforming dense models below 10B parameters and even larger-scale MoE models.

  • High efficiency: Uses a 1/32 activation-ratio MoE design, delivering roughly 7x the effective performance of an equivalent dense model; with only 1.4B active parameters it performs on par with a 7–8B dense model. In simple Q&A scenarios, generation speed reaches 300+ tokens/s, and when processing 128K context lengths the relative speedup exceeds 7x.

  • Efficient training solution: Employs FP8 mixed-precision training throughout. The team has open-sourced its FP8 training solution, which uses tile/blockwise FP8 scaling and introduces FP8 optimizers for aggressive memory savings. On 8/16/32 × 80GB GPUs, training throughput significantly exceeds that of LLaMA 3.1 8B and Qwen3 8B.

  • Open-source strategy: In addition to the final trained versions, five pretraining checkpoints are released, enabling deeper research and broader applications (see the loading sketch after this list).
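
For readers who want to try the released checkpoints, the sketch below shows one plausible way to load and query the model with Hugging Face transformers. The repository id inclusionAI/Ling-mini-2.0 and the trust_remote_code flag are assumptions for illustration, not details stated in this article; check the official release for the exact model names and usage instructions.

```python
# Hypothetical loading sketch: the repo id and trust_remote_code flag are
# assumptions, not details confirmed by this article.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "inclusionAI/Ling-mini-2.0"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # keep memory reasonable for the 16B-parameter MoE
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Explain what a Mixture-of-Experts model is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```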


Technical Principles of Ling-V2

  • MoE architecture: Built on a Mixture-of-Experts structure, decomposing the model into multiple expert networks. For each input token, only a subset of experts is activated, achieving sparsity while maintaining high performance and computational efficiency.

  • Optimization design: Incorporates empirically validated design choices such as expert granularity, shared-expert ratio, attention ratio, an auxiliary-loss-free sigmoid routing strategy, MTP loss, QK-Norm, and semi-RoPE, further enhancing performance and efficiency (see the routing sketch after this list).

  • FP8 mixed-precision training: Uses FP8 training instead of BF16. In experiments on more than 1 trillion tokens, the loss curves and downstream benchmark results are nearly identical to those of BF16 training. The open-sourced FP8 solution enables the community to run efficient continued pretraining and fine-tuning on limited compute (see the blockwise scaling sketch after this list).

  • Multi-stage training: Trained on 20+ trillion high-quality tokens, with multi-stage supervised fine-tuning and reinforcement learning, delivering significant improvements in complex reasoning and instruction adherence.
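
As a concrete illustration of the sparse routing described above, the toy layer below scores experts with a sigmoid router and activates only the top-k experts for each token. It is a minimal sketch for intuition only; the expert count, dimensions, and routing details are illustrative and do not reproduce the actual Ling-V2 configuration.

```python
import torch
import torch.nn as nn

class SparseMoELayer(nn.Module):
    """Toy Mixture-of-Experts feed-forward layer with sigmoid top-k routing.

    Illustrative only: real MoE models add shared experts, load balancing,
    and fused expert kernels that this sketch omits.
    """

    def __init__(self, d_model: int = 64, d_ff: int = 128,
                 num_experts: int = 32, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = torch.sigmoid(self.router(x))                 # sigmoid scoring, no softmax
        weights, expert_ids = scores.topk(self.top_k, dim=-1)  # keep top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize kept weights

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = expert_ids[:, slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

if __name__ == "__main__":
    layer = SparseMoELayer()
    tokens = torch.randn(8, 64)
    print(layer(tokens).shape)  # torch.Size([8, 64])
```

Because each token passes through only a few of the many experts, the activated parameter count stays small relative to the total, which is the mechanism behind Ling-mini-2.0's 1/32 activation ratio.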

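The snippet below sketches the tile/blockwise scaling idea mentioned above: every tile of a tensor gets its own scale so that a local outlier does not wipe out precision elsewhere before the cast to FP8. It is an illustrative quantize/dequantize round trip using PyTorch's float8_e4m3fn dtype, not the open-sourced Ling-V2 training code; the block size and scaling scheme are assumptions.

```python
import torch

FP8_MAX = 448.0  # largest magnitude representable in float8_e4m3fn

def quantize_blockwise_fp8(x: torch.Tensor, block: int = 128):
    """Quantize a 2-D tensor to FP8 with one scale per (block x block) tile.

    Illustrative sketch of tile/blockwise FP8 scaling; not the Ling-V2 code.
    """
    rows, cols = x.shape
    assert rows % block == 0 and cols % block == 0
    # View as (row_tiles, block, col_tiles, block) so per-tile maxima are easy to take.
    tiles = x.reshape(rows // block, block, cols // block, block)
    amax = tiles.abs().amax(dim=(1, 3), keepdim=True).clamp(min=1e-12)
    scale = FP8_MAX / amax                        # one scale factor per tile
    q = (tiles * scale).to(torch.float8_e4m3fn)   # cast the scaled tiles to FP8
    return q.reshape(rows, cols), scale

def dequantize_blockwise_fp8(q: torch.Tensor, scale: torch.Tensor, block: int = 128):
    rows, cols = q.shape
    tiles = q.to(torch.float32).reshape(rows // block, block, cols // block, block)
    return (tiles / scale).reshape(rows, cols)

if __name__ == "__main__":
    w = torch.randn(256, 256)
    q, s = quantize_blockwise_fp8(w)
    w_hat = dequantize_blockwise_fp8(q, s)
    print("max abs round-trip error:", (w - w_hat).abs().max().item())
```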

Project Links


Application Scenarios of Ling-V2

  • Natural Language Processing (NLP) tasks: Efficiently handles text classification, sentiment analysis, machine translation, and more—delivering accurate and high-performance solutions.

  • Intelligent customer service: Serves as the core engine of smart customer service systems, quickly responding to queries with precise answers, enhancing user experience and service efficiency.

  • Content creation: Assists in generating high-quality written content such as news reports, creative writing, and advertising copy, boosting both efficiency and quality for creators.

  • Education: Applied in education for intelligent tutoring, automated grading, and personalized learning plans, offering tailored support for students and teachers.

  • Healthcare: Processes medical text data to assist doctors with case analysis, medical literature retrieval, and more—improving accuracy and efficiency in medical decision-making.
