Qwen3 is Alibaba’s next-generation large language model family. It supports two modes of operation: in “thinking mode” the model reasons step by step before answering, making it suitable for complex problems, while in “non-thinking mode” it returns fast, near-instant responses, ideal for simple queries. Qwen3 supports 119 languages and dialects, a major expansion over the previous generation’s 29. It also strengthens coding and agent capabilities, supporting the MCP protocol for better integration with external tools and data sources. Qwen3 was pretrained on about 36 trillion tokens, double the size of Qwen2.5’s corpus, and post-trained with a four-phase process: long chain-of-thought cold start, reasoning-based reinforcement learning, thinking mode fusion, and general reinforcement learning. The Qwen3 series models are open-sourced under the Apache 2.0 license, allowing developers, research institutions, and enterprises worldwide to download them and use them commercially.
Main Features of Qwen3
Hybrid Reasoning Modes: Qwen3 supports two modes: “thinking mode” for complex problems and “non-thinking mode” for fast, near-instant responses to simple queries. Users can flexibly switch between them based on task complexity, trading off cost against reasoning quality (see the toggle sketch after this list).
Multilingual Support: Qwen3 supports 119 languages and dialects, including English, French, Simplified and Traditional Chinese, Cantonese, and more, vastly expanding its global application.
Enhanced Agent Capabilities: Qwen3 optimizes coding and agent capabilities and supports the MCP protocol, enabling efficient interaction with external tools. Combined with the Qwen-Agent framework, it significantly reduces the coding effort for building agents, enabling tasks such as efficient smartphone and computer operation (see the MCP sketch after this list).
Multiple Model Configurations: Qwen3 offers a range of model configurations, including two MoE models (Qwen3-235B-A22B and Qwen3-30B-A3B) and six dense models (Qwen3-32B, Qwen3-14B, Qwen3-8B, Qwen3-4B, Qwen3-1.7B, and Qwen3-0.6B), covering scenarios from small devices to large-scale enterprise deployments.
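To make the mode switch concrete, here is a minimal sketch of toggling thinking mode through the Hugging Face transformers chat template, as documented for the open-source Qwen3 checkpoints (the model name, prompt, and generation settings are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-8B"  # any open-source Qwen3 checkpoint; chosen for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "How many prime numbers are there below 100?"}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # True: step-by-step reasoning; False: fast, near-instant answers
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=2048)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```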
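On the agent side, a hedged sketch of declaring MCP servers in Qwen-Agent, following the pattern shown in the Qwen3 release materials (the endpoint, model name, and the specific MCP servers are assumptions for illustration):

```python
from qwen_agent.agents import Assistant

# A Qwen3 model served behind an OpenAI-compatible endpoint (e.g. via vLLM).
llm_cfg = {
    "model": "Qwen3-30B-A3B",
    "model_server": "http://localhost:8000/v1",
    "api_key": "EMPTY",
}

# MCP servers are declared alongside built-in tools; Qwen-Agent launches and
# brokers them for the model.
tools = [
    {"mcpServers": {
        "time": {"command": "uvx", "args": ["mcp-server-time"]},
        "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]},
    }},
    "code_interpreter",  # built-in Qwen-Agent tool
]

bot = Assistant(llm=llm_cfg, function_list=tools)
messages = [{"role": "user", "content": "What time is it in Berlin right now?"}]
for responses in bot.run(messages=messages):  # streams intermediate tool calls and replies
    pass
print(responses)  # final message list for the turn
```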
Technical Principles of Qwen3
Large-Scale Pretraining: Qwen3’s pretraining dataset is about 36 trillion tokens, double the size of Qwen2.5, covering 119 languages and dialects. The pretraining process includes three stages:
Stage 1 (S1): Pretrained on over 30 trillion tokens with a context length of 4K tokens, providing the model with basic language skills and general knowledge.
Stage 2 (S2): Increased the proportion of knowledge-intensive data (such as STEM, programming, and reasoning tasks) and continued pretraining on an additional 5 trillion tokens.
Stage 3 (S3): Utilized high-quality long-context data to extend the context length to 32K tokens, ensuring the model can effectively handle longer inputs.
Optimized Post-Training: To develop a hybrid model that combines both reasoning and fast response capabilities, Qwen3 implements a four-phase training process:
Long Chain-of-Thought Cold Start: Fine-tunes the model on diverse long chain-of-thought data covering math, code, logical reasoning, and STEM tasks.
Reasoning-Based Reinforcement Learning: Uses rule-based rewards to strengthen the model’s exploration and exploitation during reasoning.
Thinking Mode Fusion: Fine-tunes the model on a mix of long chain-of-thought data and standard instruction-tuning data, folding the non-thinking mode into the thinking model so that a single checkpoint serves both (see the soft-switch sketch after this list).
General Reinforcement Learning: Applies reinforcement learning on over 20 general domain tasks (including instruction following, format following, and agent capabilities) to further enhance the model’s general capabilities and correct undesirable behaviors.
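A practical payoff of thinking mode fusion is that a single checkpoint can switch modes per turn at inference time. Besides the enable_thinking template flag shown earlier, Qwen3 documents /think and /no_think directives placed inside user messages; a minimal sketch of a multi-turn exchange (the prompts are illustrative):

```python
# The most recent /think or /no_think directive in the conversation controls
# the mode for the next assistant reply.
messages = [
    {"role": "user", "content": "Prove that the square root of 2 is irrational. /think"},
    # ... assistant answers with a full step-by-step reasoning trace ...
    {"role": "user", "content": "Now give just the one-line conclusion. /no_think"},
    # ... assistant answers directly, skipping the reasoning trace ...
]
```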
Performance Optimization: Qwen3 delivers significantly improved performance at greatly reduced deployment cost: the full-size flagship model can be deployed on just four H20 GPUs, using roughly one-third of the memory required by models of comparable performance.
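For a sense of what deployment looks like in practice, here is a minimal sketch of serving a Qwen3 checkpoint with vLLM, one of the inference engines recommended for Qwen3 (the model choice, GPU count, and sampling settings are illustrative):

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-30B-A3B",  # smaller MoE variant; the 235B flagship needs a multi-GPU node
    tensor_parallel_size=4,      # shard across 4 GPUs (the cited 4x H20 flagship deployment works the same way)
)
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=1024)
outputs = llm.generate(["Explain mixture-of-experts routing in two sentences."], params)
print(outputs[0].outputs[0].text)
```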
Qwen3 has shown impressive results in various benchmark tests:
AIME'25: Qwen3 scored 81.5, setting a new record among open-source models.
LiveCodeBench: Qwen3 scored over 70, outperforming Grok 3.
ArenaHard: Qwen3 scored 95.6, surpassing OpenAI o1 and DeepSeek-R1.
Use Cases of Qwen3
Text Generation: Qwen3 can generate coherent, natural long texts, suitable for tasks like automated writing, news generation, and blog article creation. It can generate full articles or stories based on a given prompt.
Machine Translation: With support for 119 languages and dialects, Qwen3 excels at multilingual translation, handling a wide range of language pairs with high-quality results (see the API sketch at the end of this section).
Legal Document Generation: Qwen3 can draft contracts, legal opinions, litigation documents, and other legal texts. Fine-tuned on legal-domain data such as case law and regulations, it can produce correctly formatted, legally compliant documents.
Technical Documentation Writing: Qwen3 can generate detailed technical documents, product specifications, user manuals, and more. Fine-tuned on technical domain data, it helps developers and tech support teams automatically generate industry-standard documentation.
Medical Field: Qwen3 can be used to generate medical reports, diagnostic suggestions, and other healthcare-related documents. Fine-tuned on medical literature and medical records, it can assist doctors in generating case notes during diagnosis.
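As referenced in the machine translation use case above, here is a hedged sketch of calling a self-hosted Qwen3 model through an OpenAI-compatible endpoint, such as the one vLLM exposes (the URL, model name, and prompt are assumptions for illustration):

```python
from openai import OpenAI

# Point the OpenAI client at the local OpenAI-compatible server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen/Qwen3-8B",
    messages=[{
        "role": "user",
        # /no_think: translation is a simple query, so skip the reasoning trace
        "content": "Translate into French: 'The model supports 119 languages.' /no_think",
    }],
)
print(resp.choices[0].message.content)
```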